Server Admin Log/Archive 71

From Wikitech

2023-09-30

  • 19:44 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2172 (T343198)', diff saved to https://phabricator.wikimedia.org/P52795 and previous config saved to /var/cache/conftool/dbconfig/20230930-194448-arnaudb.json
  • 19:44 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2172.codfw.wmnet with reason: Maintenance
  • 19:44 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2172.codfw.wmnet with reason: Maintenance
  • 19:44 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T343198)', diff saved to https://phabricator.wikimedia.org/P52794 and previous config saved to /var/cache/conftool/dbconfig/20230930-194427-arnaudb.json
  • 19:29 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P52793 and previous config saved to /var/cache/conftool/dbconfig/20230930-192920-arnaudb.json
  • 19:14 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P52792 and previous config saved to /var/cache/conftool/dbconfig/20230930-191414-arnaudb.json
  • 18:59 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T343198)', diff saved to https://phabricator.wikimedia.org/P52791 and previous config saved to /var/cache/conftool/dbconfig/20230930-185908-arnaudb.json
  • 14:20 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2155 (T343198)', diff saved to https://phabricator.wikimedia.org/P52790 and previous config saved to /var/cache/conftool/dbconfig/20230930-142054-arnaudb.json
  • 14:20 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 14:20 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 14:20 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2155.codfw.wmnet with reason: Maintenance
  • 14:20 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2155.codfw.wmnet with reason: Maintenance
  • 14:20 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2147 (T343198)', diff saved to https://phabricator.wikimedia.org/P52789 and previous config saved to /var/cache/conftool/dbconfig/20230930-142017-arnaudb.json
  • 14:05 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2147', diff saved to https://phabricator.wikimedia.org/P52788 and previous config saved to /var/cache/conftool/dbconfig/20230930-140510-arnaudb.json
  • 13:50 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2147', diff saved to https://phabricator.wikimedia.org/P52787 and previous config saved to /var/cache/conftool/dbconfig/20230930-135004-arnaudb.json
  • 13:35 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2147 (T343198)', diff saved to https://phabricator.wikimedia.org/P52786 and previous config saved to /var/cache/conftool/dbconfig/20230930-133458-arnaudb.json
  • 09:46 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on ldap-rw[1001,2001].wikimedia.org with reason: setup in progress
  • 09:46 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on ldap-rw[1001,2001].wikimedia.org with reason: setup in progress
  • 08:47 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2147 (T343198)', diff saved to https://phabricator.wikimedia.org/P52785 and previous config saved to /var/cache/conftool/dbconfig/20230930-084720-arnaudb.json
  • 08:47 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2147.codfw.wmnet with reason: Maintenance
  • 08:47 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2147.codfw.wmnet with reason: Maintenance
  • 08:47 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2140 (T343198)', diff saved to https://phabricator.wikimedia.org/P52784 and previous config saved to /var/cache/conftool/dbconfig/20230930-084658-arnaudb.json
  • 08:31 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2140', diff saved to https://phabricator.wikimedia.org/P52783 and previous config saved to /var/cache/conftool/dbconfig/20230930-083152-arnaudb.json
  • 08:16 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2140', diff saved to https://phabricator.wikimedia.org/P52782 and previous config saved to /var/cache/conftool/dbconfig/20230930-081645-arnaudb.json
  • 08:01 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2140 (T343198)', diff saved to https://phabricator.wikimedia.org/P52781 and previous config saved to /var/cache/conftool/dbconfig/20230930-080139-arnaudb.json
  • 02:56 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2140 (T343198)', diff saved to https://phabricator.wikimedia.org/P52780 and previous config saved to /var/cache/conftool/dbconfig/20230930-025624-arnaudb.json
  • 02:56 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2140.codfw.wmnet with reason: Maintenance
  • 02:56 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2140.codfw.wmnet with reason: Maintenance

2023-09-29

  • 23:41 fabfur@cumin1001: END (PASS) - Cookbook sre.cdn.roll-restart-varnish (exit_code=0) rolling restart of Varnish on 8 hosts matching query A:cp-text_esams
  • 23:40 fabfur@cumin1001: END (PASS) - Cookbook sre.cdn.roll-restart-varnish (exit_code=0) rolling restart of Varnish on 8 hosts matching query A:cp-upload_esams
  • 22:44 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance
  • 22:44 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance
  • 22:44 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3314 (T343198)', diff saved to https://phabricator.wikimedia.org/P52779 and previous config saved to /var/cache/conftool/dbconfig/20230929-224409-arnaudb.json
  • 22:29 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3314', diff saved to https://phabricator.wikimedia.org/P52778 and previous config saved to /var/cache/conftool/dbconfig/20230929-222902-arnaudb.json
  • 22:13 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3314', diff saved to https://phabricator.wikimedia.org/P52777 and previous config saved to /var/cache/conftool/dbconfig/20230929-221356-arnaudb.json
  • 21:58 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3314 (T343198)', diff saved to https://phabricator.wikimedia.org/P52776 and previous config saved to /var/cache/conftool/dbconfig/20230929-215849-arnaudb.json
  • 21:00 fabfur@cumin1001: START - Cookbook sre.cdn.roll-restart-varnish rolling restart of Varnish on 8 hosts matching query A:cp-text_esams
  • 20:59 fabfur@cumin1001: START - Cookbook sre.cdn.roll-restart-varnish rolling restart of Varnish on 8 hosts matching query A:cp-upload_esams
  • 20:35 fabfur@cumin1001: END (PASS) - Cookbook sre.cdn.roll-restart-varnish (exit_code=0) rolling restart of Varnish on 8 hosts matching query A:cp-text_drmrs
  • 20:34 fabfur@cumin1001: END (PASS) - Cookbook sre.cdn.roll-restart-varnish (exit_code=0) rolling restart of Varnish on 8 hosts matching query A:cp-upload_drmrs
  • 19:46 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on wdqs1016.eqiad.wmnet with reason: jnl compression
  • 19:46 bking@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on wdqs1016.eqiad.wmnet with reason: jnl compression
  • 19:36 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-master1003.eqiad.wmne']
  • 19:36 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-master1004.eqiad.wmne']
  • 19:36 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-master1003.eqiad.wmne']
  • 19:36 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-master1004.eqiad.wmne']
  • 18:55 eevans@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts restbase1023.eqiad.wmnet
  • 18:54 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1023.eqiad.wmnet
  • 18:43 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1023.eqiad.wmnet
  • 18:42 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts restbase1023.eqiad.wmnet
  • 18:19 eevans@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts restbase1022.eqiad.wmnet
  • 18:19 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1022.eqiad.wmnet
  • 17:54 fabfur@cumin1001: START - Cookbook sre.cdn.roll-restart-varnish rolling restart of Varnish on 8 hosts matching query A:cp-upload_drmrs
  • 17:54 fabfur@cumin1001: START - Cookbook sre.cdn.roll-restart-varnish rolling restart of Varnish on 8 hosts matching query A:cp-text_drmrs
  • 17:53 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1022.eqiad.wmnet
  • 17:53 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts restbase1022.eqiad.wmnet
  • 17:08 fabfur@cumin1001: END (PASS) - Cookbook sre.cdn.roll-restart-varnish (exit_code=0) rolling restart of Varnish on 8 hosts matching query A:cp-upload_eqiad
  • 17:06 fabfur@cumin1001: END (PASS) - Cookbook sre.cdn.roll-restart-varnish (exit_code=0) rolling restart of Varnish on 8 hosts matching query A:cp-text_eqiad
  • 16:53 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2138:3314 (T343198)', diff saved to https://phabricator.wikimedia.org/P52774 and previous config saved to /var/cache/conftool/dbconfig/20230929-165347-arnaudb.json
  • 16:53 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2138.codfw.wmnet with reason: Maintenance
  • 16:53 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2138.codfw.wmnet with reason: Maintenance
  • 16:53 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3314 (T343198)', diff saved to https://phabricator.wikimedia.org/P52773 and previous config saved to /var/cache/conftool/dbconfig/20230929-165326-arnaudb.json
  • 16:38 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3314', diff saved to https://phabricator.wikimedia.org/P52772 and previous config saved to /var/cache/conftool/dbconfig/20230929-163819-arnaudb.json
  • 16:28 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on wdqs1016.eqiad.wmnet with reason: jnl compression
  • 16:27 bking@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on wdqs1016.eqiad.wmnet with reason: jnl compression
  • 16:23 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3314', diff saved to https://phabricator.wikimedia.org/P52771 and previous config saved to /var/cache/conftool/dbconfig/20230929-162313-arnaudb.json
  • 16:22 inflatador: bking@wdqs1016 depooling to compress JNL file T347605
  • 16:16 jiji@deploy2002: helmfile [codfw] DONE helmfile.d/services/tegola-vector-tiles: apply
  • 16:15 jiji@deploy2002: helmfile [codfw] START helmfile.d/services/tegola-vector-tiles: apply
  • 16:15 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/services/tegola-vector-tiles: apply
  • 16:14 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/tegola-vector-tiles: apply
  • 16:13 jiji@deploy2002: helmfile [staging] DONE helmfile.d/services/tegola-vector-tiles: apply
  • 16:13 jiji@deploy2002: helmfile [staging] START helmfile.d/services/tegola-vector-tiles: apply
  • 16:11 aborrero@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:11 aborrero@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudgw codfw - aborrero@cumin1001"
  • 16:08 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for restbase1031.eqiad.wmnet
  • 16:08 eevans@cumin1001: START - Cookbook sre.hosts.remove-downtime for restbase1031.eqiad.wmnet
  • 16:08 aborrero@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudgw codfw - aborrero@cumin1001"
  • 16:08 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3314 (T343198)', diff saved to https://phabricator.wikimedia.org/P52770 and previous config saved to /var/cache/conftool/dbconfig/20230929-160807-arnaudb.json
  • 16:06 aborrero@cumin1001: START - Cookbook sre.dns.netbox
  • 15:54 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on restbase1031.eqiad.wmnet with reason: Upgrading BIOS
  • 15:54 eevans@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on restbase1031.eqiad.wmnet with reason: Upgrading BIOS
  • 15:49 eevans@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts restbase1031.eqiad.wmnet
  • 15:49 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts restbase1031.eqiad.wmnet
  • 15:49 eevans@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts restbase1031.eqiad.wmnet
  • 15:49 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts restbase1031.eqiad.wmnet
  • 15:48 eevans@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts restbase1028.eqiad.wmnet
  • 15:47 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1028.eqiad.wmnet
  • 15:35 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1028.eqiad.wmnet
  • 15:35 bking@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 15:35 bking@deploy2002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 15:34 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts restbase1028.eqiad.wmnet
  • 15:34 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-master1003.eqiad.wmne']
  • 15:34 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-master1004.eqiad.wmne']
  • 15:33 eevans@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts restbase1021.eqiad.wmnet
  • 15:33 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1021.eqiad.wmnet
  • 15:25 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-master1003.eqiad.wmne']
  • 15:25 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-master1004.eqiad.wmne']
  • 15:25 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-master1004.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:24 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-master1003.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:23 dcausse@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 15:23 dcausse@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 15:22 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-master1004.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:21 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1021.eqiad.wmnet
  • 15:20 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts restbase1021.eqiad.wmnet
  • 15:19 eevans@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts restbase1020.eqiad.wmnet
  • 15:19 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1020.eqiad.wmnet
  • 15:16 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-master1004.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:14 vriley@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp1100.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:08 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1020.eqiad.wmnet
  • 15:07 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts restbase1020.eqiad.wmnet
  • 14:55 jiji@deploy2002: helmfile [codfw] DONE helmfile.d/services/push-notifications: apply
  • 14:55 vriley@cumin1001: START - Cookbook sre.hosts.provision for host cp1100.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:54 jiji@deploy2002: helmfile [codfw] START helmfile.d/services/push-notifications: apply
  • 14:54 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/services/push-notifications: apply
  • 14:54 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/push-notifications: apply
  • 14:54 jiji@deploy2002: helmfile [staging] DONE helmfile.d/services/push-notifications: apply
  • 14:54 jiji@deploy2002: helmfile [staging] START helmfile.d/services/push-notifications: apply
  • 14:53 vriley@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp1100.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:53 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for restbase2027.codfw.wmnet
  • 14:53 eevans@cumin1001: START - Cookbook sre.hosts.remove-downtime for restbase2027.codfw.wmnet
  • 14:50 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-master1004.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:49 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-master1003.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:40 vriley@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp1101.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:40 vriley@cumin1001: START - Cookbook sre.hosts.provision for host cp1101.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:38 vriley@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp1101.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:38 vriley@cumin1001: START - Cookbook sre.hosts.provision for host cp1101.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:27 fabfur@cumin1001: START - Cookbook sre.cdn.roll-restart-varnish rolling restart of Varnish on 8 hosts matching query A:cp-upload_eqiad
  • 14:27 fabfur@cumin1001: START - Cookbook sre.cdn.roll-restart-varnish rolling restart of Varnish on 8 hosts matching query A:cp-text_eqiad
  • 14:23 fabfur@cumin1001: END (PASS) - Cookbook sre.cdn.roll-restart-varnish (exit_code=0) rolling restart of Varnish on 8 hosts matching query A:cp-upload_eqsin
  • 14:21 fabfur@cumin1001: END (PASS) - Cookbook sre.cdn.roll-restart-varnish (exit_code=0) rolling restart of Varnish on 8 hosts matching query A:cp-text_eqsin
  • 14:20 vriley@cumin1001: START - Cookbook sre.hosts.provision for host cp1101.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:12 vriley@cumin1001: START - Cookbook sre.hosts.provision for host cp1100.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:07 dcausse@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 14:07 dcausse@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 12:54 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host backup1010.eqiad.wmnet with OS bookworm
  • 12:39 jelto@cumin1001: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1004.wikimedia.org with reason: GitLab version upgrade
  • 12:37 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on backup1010.eqiad.wmnet with reason: host reimage
  • 12:34 jynus@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on backup1010.eqiad.wmnet with reason: host reimage
  • 12:18 jynus@cumin1001: START - Cookbook sre.hosts.reimage for host backup1010.eqiad.wmnet with OS bookworm
  • 11:59 topranks: adjusting evpn_db BGP export filter lsw1-f3-eqiad
  • 11:46 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1085.eqiad.wmnet
  • 11:40 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1085.eqiad.wmnet
  • 11:40 fabfur@cumin1001: START - Cookbook sre.cdn.roll-restart-varnish rolling restart of Varnish on 8 hosts matching query A:cp-upload_eqsin
  • 11:40 fabfur@cumin1001: START - Cookbook sre.cdn.roll-restart-varnish rolling restart of Varnish on 8 hosts matching query A:cp-text_eqsin
  • 11:35 btullis@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts an-worker1086.eqiad.wmnet
  • 11:34 btullis@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts an-worker1086.eqiad.wmnet
  • 11:13 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2137:3314 (T343198)', diff saved to https://phabricator.wikimedia.org/P52767 and previous config saved to /var/cache/conftool/dbconfig/20230929-111353-arnaudb.json
  • 11:13 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2137.codfw.wmnet with reason: Maintenance
  • 11:13 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2137.codfw.wmnet with reason: Maintenance
  • 11:13 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2136 (T343198)', diff saved to https://phabricator.wikimedia.org/P52766 and previous config saved to /var/cache/conftool/dbconfig/20230929-111331-arnaudb.json
  • 11:09 jelto@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: GitLab version upgrade
  • 10:58 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:20:00 on an-worker1085.eqiad.wmnet with reason: Cold booting to see if it helps with RAID BBU
  • 10:58 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 0:20:00 on an-worker1085.eqiad.wmnet with reason: Cold booting to see if it helps with RAID BBU
  • 10:58 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2136', diff saved to https://phabricator.wikimedia.org/P52765 and previous config saved to /var/cache/conftool/dbconfig/20230929-105825-arnaudb.json
  • 10:58 jelto@cumin1001: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1003.wikimedia.org with reason: GitLab version upgrade
  • 10:52 jelto@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: GitLab version upgrade
  • 10:49 jelto@cumin1001: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab2002.wikimedia.org with reason: GitLab version upgrade
  • 10:43 jelto@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: GitLab version upgrade
  • 10:43 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2136', diff saved to https://phabricator.wikimedia.org/P52764 and previous config saved to /var/cache/conftool/dbconfig/20230929-104318-arnaudb.json
  • 10:35 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/services/push-notifications: apply
  • 10:35 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/push-notifications: apply
  • 10:28 arnaudb@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host backup1010.eqiad.wmnet with OS bookworm
  • 10:28 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2136 (T343198)', diff saved to https://phabricator.wikimedia.org/P52763 and previous config saved to /var/cache/conftool/dbconfig/20230929-102812-arnaudb.json
  • 10:19 jiji@deploy2002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply
  • 10:19 jiji@deploy2002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply
  • 10:18 jiji@deploy2002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
  • 10:18 jiji@deploy2002: helmfile [staging] START helmfile.d/services/machinetranslation: apply
  • 10:09 jiji@deploy2002: helmfile [codfw] DONE helmfile.d/services/push-notifications: apply
  • 10:09 elukey@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: sync
  • 10:09 elukey@deploy2002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: sync
  • 10:09 jiji@deploy2002: helmfile [codfw] START helmfile.d/services/push-notifications: apply
  • 09:08 elukey@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: sync
  • 09:08 elukey@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: sync
  • 05:32 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2136 (T343198)', diff saved to https://phabricator.wikimedia.org/P52760 and previous config saved to /var/cache/conftool/dbconfig/20230929-053158-arnaudb.json
  • 05:31 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2136.codfw.wmnet with reason: Maintenance
  • 05:31 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2136.codfw.wmnet with reason: Maintenance
  • 05:31 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2119 (T343198)', diff saved to https://phabricator.wikimedia.org/P52759 and previous config saved to /var/cache/conftool/dbconfig/20230929-053136-arnaudb.json
  • 05:16 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2119', diff saved to https://phabricator.wikimedia.org/P52758 and previous config saved to /var/cache/conftool/dbconfig/20230929-051630-arnaudb.json
  • 05:01 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2119', diff saved to https://phabricator.wikimedia.org/P52757 and previous config saved to /var/cache/conftool/dbconfig/20230929-050123-arnaudb.json
  • 04:46 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2119 (T343198)', diff saved to https://phabricator.wikimedia.org/P52756 and previous config saved to /var/cache/conftool/dbconfig/20230929-044617-arnaudb.json
  • 02:57 andrew@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudservices2005-dev.codfw.wmnet with OS bookworm
  • 01:48 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180 (T343198)', diff saved to https://phabricator.wikimedia.org/P52755 and previous config saved to /var/cache/conftool/dbconfig/20230929-014825-arnaudb.json
  • 01:40 ejegg: payments-wiki upgraded from c4c9b938 to d6ad0376
  • 01:33 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P52754 and previous config saved to /var/cache/conftool/dbconfig/20230929-013319-arnaudb.json
  • 01:18 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P52753 and previous config saved to /var/cache/conftool/dbconfig/20230929-011813-arnaudb.json
  • 01:03 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180 (T343198)', diff saved to https://phabricator.wikimedia.org/P52752 and previous config saved to /var/cache/conftool/dbconfig/20230929-010306-arnaudb.json
  • 00:26 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1227.mgmt.eqiad.wmnet with reboot policy FORCED
  • 00:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host db1227.mgmt.eqiad.wmnet with reboot policy FORCED

2023-09-28

  • 23:50 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2180 (T343198)', diff saved to https://phabricator.wikimedia.org/P52751 and previous config saved to /var/cache/conftool/dbconfig/20230928-235053-arnaudb.json
  • 23:50 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2180.codfw.wmnet with reason: Maintenance
  • 23:50 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2180.codfw.wmnet with reason: Maintenance
  • 23:50 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316 (T343198)', diff saved to https://phabricator.wikimedia.org/P52750 and previous config saved to /var/cache/conftool/dbconfig/20230928-235032-arnaudb.json
  • 23:42 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2119 (T343198)', diff saved to https://phabricator.wikimedia.org/P52749 and previous config saved to /var/cache/conftool/dbconfig/20230928-234246-arnaudb.json
  • 23:42 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2119.codfw.wmnet with reason: Maintenance
  • 23:42 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2119.codfw.wmnet with reason: Maintenance
  • 23:42 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2110 (T343198)', diff saved to https://phabricator.wikimedia.org/P52748 and previous config saved to /var/cache/conftool/dbconfig/20230928-234224-arnaudb.json
  • 23:35 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316', diff saved to https://phabricator.wikimedia.org/P52747 and previous config saved to /var/cache/conftool/dbconfig/20230928-233525-arnaudb.json
  • 23:27 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2110', diff saved to https://phabricator.wikimedia.org/P52746 and previous config saved to /var/cache/conftool/dbconfig/20230928-232718-arnaudb.json
  • 23:20 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316', diff saved to https://phabricator.wikimedia.org/P52745 and previous config saved to /var/cache/conftool/dbconfig/20230928-232019-arnaudb.json
  • 23:12 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2110', diff saved to https://phabricator.wikimedia.org/P52744 and previous config saved to /var/cache/conftool/dbconfig/20230928-231211-arnaudb.json
  • 23:05 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316 (T343198)', diff saved to https://phabricator.wikimedia.org/P52743 and previous config saved to /var/cache/conftool/dbconfig/20230928-230512-arnaudb.json
  • 22:57 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2110 (T343198)', diff saved to https://phabricator.wikimedia.org/P52742 and previous config saved to /var/cache/conftool/dbconfig/20230928-225705-arnaudb.json
  • 22:40 wfan: payments-wiki change from c4c9b938 to 20828b07
  • 22:03 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudservices2005-dev.codfw.wmnet with reason: host reimage
  • 22:02 bking@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudelastic1007.eqiad.wmnet with OS bullseye
  • 22:00 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudservices2005-dev.codfw.wmnet with reason: host reimage
  • 21:58 eevans@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts restbase1033.eqiad.wmnet
  • 21:58 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts restbase1033.eqiad.wmnet
  • 21:58 eevans@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts restbase1030.eqiad.wmnet
  • 21:57 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts restbase1030.eqiad.wmnet
  • 21:57 eevans@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts restbase1027.eqiad.wmnet
  • 21:57 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts restbase1027.eqiad.wmnet
  • 21:57 eevans@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts restbase1026.eqiad.wmnet
  • 21:57 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts restbase1026.eqiad.wmnet
  • 21:56 eevans@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts restbase1025.eqiad.wmnet
  • 21:56 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts restbase1025.eqiad.wmnet
  • 21:56 eevans@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts restbase1032.eqiad.wmnet
  • 21:56 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts restbase1032.eqiad.wmnet
  • 21:55 eevans@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts restbase1029.eqiad.wmnet
  • 21:55 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts restbase1029.eqiad.wmnet
  • 21:55 eevans@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts restbase1029.eqiad.wmnet
  • 21:55 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts restbase1029.eqiad.wmnet
  • 21:54 eevans@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts restbase1024.eqiad.wmnet
  • 21:54 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts restbase1024.eqiad.wmnet
  • 21:54 eevans@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts restbase1023.eqiad.wmnet
  • 21:54 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts restbase1023.eqiad.wmnet
  • 21:54 eevans@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts restbase1022.eqiad.wmnet
  • 21:54 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts restbase1022.eqiad.wmnet
  • 21:53 wfan: payments-wiki change from 505a616d to 20828b07
  • 21:52 bking@cumin1001: START - Cookbook sre.hosts.reimage for host cloudelastic1007.eqiad.wmnet with OS bullseye
  • 21:42 eevans@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts restbase1031.eqiad.wmnet
  • 21:42 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts restbase1031.eqiad.wmnet
  • 21:40 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudservices2005-dev.codfw.wmnet with OS bookworm
  • 21:30 eevans@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts restbase1028.eqiad.wmnet
  • 21:30 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts restbase1028.eqiad.wmnet
  • 21:28 bking@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudelastic1007.wikimedia.org']
  • 21:25 bking@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudelastic1007.wikimedia.org']
  • 21:25 bking@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudelastic1007.wikimedia.org']
  • 21:14 bking@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudelastic1007.wikimedia.org']
  • 21:13 bking@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudelastic1007.eqiad.wmnet with OS bullseye
  • 21:07 thcipriani@deploy2002: Finished scap: Backport for Drop the desktop improvements dblist group (T347444) (duration: 11m 22s)
  • 21:00 thcipriani@deploy2002: jdlrobson and thcipriani: Continuing with sync
  • 20:59 bking@cumin1001: START - Cookbook sre.hosts.reimage for host cloudelastic1007.eqiad.wmnet with OS bullseye
  • 20:57 thcipriani@deploy2002: jdlrobson and thcipriani: Backport for Drop the desktop improvements dblist group (T347444) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:55 thcipriani@deploy2002: Started scap: Backport for Drop the desktop improvements dblist group (T347444)
  • 20:55 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudelastic1007.eqiad.wmnet with OS bullseye
  • 20:55 bking@cumin1001: START - Cookbook sre.hosts.reimage for host cloudelastic1007.eqiad.wmnet with OS bullseye
  • 20:52 thcipriani@deploy2002: Finished scap: Backport for update sawikiquote logos (T341260), Wikimedia special project logo updates (duration: 16m 32s)
  • 20:45 thcipriani@deploy2002: anzx and jdlrobson and thcipriani: Continuing with sync
  • 20:36 thcipriani@deploy2002: anzx and jdlrobson and thcipriani: Backport for update sawikiquote logos (T341260), Wikimedia special project logo updates synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:35 thcipriani@deploy2002: Started scap: Backport for update sawikiquote logos (T341260), Wikimedia special project logo updates
  • 20:28 thcipriani@deploy2002: Finished scap: Backport for Add 'confirmed' to Wikifunctions sysop add and remove (T344261), add 'autopatrol' to Wikifunctions' functioneer group (T344085), add autopatrolled group with autopatrol right for Wikifunctions (T343946) (duration: 10m 06s)
  • 20:21 thcipriani@deploy2002: mdaniels5757 and thcipriani and terasail: Continuing with sync
  • 20:19 thcipriani@deploy2002: mdaniels5757 and thcipriani and terasail: Backport for Add 'confirmed' to Wikifunctions sysop add and remove (T344261), add 'autopatrol' to Wikifunctions' functioneer group (T344085), add autopatrolled group with autopatrol right for Wikifunctions (T343946) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:18 thcipriani@deploy2002: Started scap: Backport for Add 'confirmed' to Wikifunctions sysop add and remove (T344261), add 'autopatrol' to Wikifunctions' functioneer group (T344085), add autopatrolled group with autopatrol right for Wikifunctions (T343946)
  • 20:11 taavi: create new oathauth tables on labtestwikitech and run `taavi@cloudweb2002-dev ~ $ mwscript extensions/OATHAuth/maintenance/UpdateForMultipleDevicesSupport.php labtestwiki`, fixes T347627
  • 20:04 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T347624, testing new cookbook changes) xfer categories => wdqs2024.codfw.wmnet -> wdqs2025.codfw.wmnet, repooling both afterwards
  • 20:03 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudelastic1007.eqiad.wmnet with OS bullseye
  • 20:03 bking@cumin1001: START - Cookbook sre.hosts.reimage for host cloudelastic1007.eqiad.wmnet with OS bullseye
  • 20:02 bking@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudelastic1007.eqiad.wmnet with OS bullseye
  • 19:56 brennen@deploy2002: Finished scap: Backport for Handle SpecialPage::getDescription() returning a Message (T347620) (duration: 09m 53s)
  • 19:55 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer (T347624, testing new cookbook changes) xfer categories => wdqs2024.codfw.wmnet -> wdqs2025.codfw.wmnet, repooling both afterwards
  • 19:50 brennen@deploy2002: matmarex and brennen: Continuing with sync
  • 19:48 brennen@deploy2002: matmarex and brennen: Backport for Handle SpecialPage::getDescription() returning a Message (T347620) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 19:46 brennen@deploy2002: Started scap: Backport for Handle SpecialPage::getDescription() returning a Message (T347620)
  • 19:27 bking@cumin1001: START - Cookbook sre.hosts.reimage for host cloudelastic1007.eqiad.wmnet with OS bullseye
  • 19:24 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudelastic1007.eqiad.wmnet with OS bullseye
  • 19:24 bking@cumin1001: START - Cookbook sre.hosts.reimage for host cloudelastic1007.eqiad.wmnet with OS bullseye
  • 19:23 bking@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudelastic1007.wikimedia.org']
  • 19:14 bking@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudelastic1007.wikimedia.org']
  • 19:14 bking@cumin1001: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts ['cloudelastic1007.wikimedia.org']
  • 19:13 bking@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudelastic1007.wikimedia.org']
  • 19:13 robh@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudelastic1007']
  • 19:12 robh@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudelastic1007']
  • 19:06 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2171:3316 (T343198)', diff saved to https://phabricator.wikimedia.org/P52737 and previous config saved to /var/cache/conftool/dbconfig/20230928-190216-arnaudb.json
  • 19:06 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2171.codfw.wmnet with reason: Maintenance
  • 19:06 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2171.codfw.wmnet with reason: Maintenance
  • 19:01 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316 (T343198)', diff saved to https://phabricator.wikimedia.org/P52736 and previous config saved to /var/cache/conftool/dbconfig/20230928-190154-arnaudb.json
  • 19:00 fabfur@cumin1001: END (PASS) - Cookbook sre.cdn.roll-restart-varnish (exit_code=0) rolling restart of Varnish on 7 hosts matching query A:cp-upload_codfw and not P{cp2028*}
  • 19:00 fabfur@cumin1001: END (PASS) - Cookbook sre.cdn.roll-restart-varnish (exit_code=0) rolling restart of Varnish on 7 hosts matching query A:cp-text_codfw and not P{cp2027*}
  • 18:46 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316', diff saved to https://phabricator.wikimedia.org/P52735 and previous config saved to /var/cache/conftool/dbconfig/20230928-184648-arnaudb.json
  • 18:33 dduvall@deploy2002: rebuilt and synchronized wikiversions files: group2 wikis to 1.41.0-wmf.28 refs T345889
  • 18:31 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316', diff saved to https://phabricator.wikimedia.org/P52734 and previous config saved to /var/cache/conftool/dbconfig/20230928-183141-arnaudb.json
  • 18:24 topranks: renaming cloud-hosts1-codfw vlan to cloud-hosts1-b1-codfw on cloudsw1-b1-codfw
  • 18:21 tchin@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 18:21 tchin@deploy2002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 18:16 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316 (T343198)', diff saved to https://phabricator.wikimedia.org/P52733 and previous config saved to /var/cache/conftool/dbconfig/20230928-181635-arnaudb.json
  • 18:09 bking@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudelastic1007.eqiad.wmnet with OS bullseye
  • 17:51 brett: Imported acme-chief from Gerrit into Gitlab
  • 17:42 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2110 (T343198)', diff saved to https://phabricator.wikimedia.org/P52732 and previous config saved to /var/cache/conftool/dbconfig/20230928-174251-arnaudb.json
  • 17:42 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2110.codfw.wmnet with reason: Maintenance
  • 17:42 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2110.codfw.wmnet with reason: Maintenance
  • 17:42 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2106 (T343198)', diff saved to https://phabricator.wikimedia.org/P52731 and previous config saved to /var/cache/conftool/dbconfig/20230928-174230-arnaudb.json
  • 17:39 joal@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 17:39 joal@deploy2002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 17:27 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2106', diff saved to https://phabricator.wikimedia.org/P52730 and previous config saved to /var/cache/conftool/dbconfig/20230928-172719-arnaudb.json
  • 17:14 bking@cumin1001: START - Cookbook sre.hosts.reimage for host cloudelastic1007.eqiad.wmnet with OS bullseye
  • 17:12 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2106', diff saved to https://phabricator.wikimedia.org/P52729 and previous config saved to /var/cache/conftool/dbconfig/20230928-171212-arnaudb.json
  • 16:57 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2106 (T343198)', diff saved to https://phabricator.wikimedia.org/P52728 and previous config saved to /var/cache/conftool/dbconfig/20230928-165706-arnaudb.json
  • 16:42 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifeeds: apply
  • 16:42 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/wikifeeds: apply
  • 16:42 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host eventlog1003.eqiad.wmnet
  • 16:42 fabfur@cumin1001: START - Cookbook sre.cdn.roll-restart-varnish rolling restart of Varnish on 7 hosts matching query A:cp-text_codfw and not P{cp2027*}
  • 16:42 fabfur@cumin1001: START - Cookbook sre.cdn.roll-restart-varnish rolling restart of Varnish on 7 hosts matching query A:cp-upload_codfw and not P{cp2028*}
  • 16:41 fabfur@cumin1001: END (ERROR) - Cookbook sre.cdn.roll-restart-varnish (exit_code=97) rolling restart of Varnish on 7 hosts matching query A:cp-upload_codfw and not P{cp2028*}
  • 16:41 fabfur@cumin1001: START - Cookbook sre.cdn.roll-restart-varnish rolling restart of Varnish on 7 hosts matching query A:cp-upload_codfw and not P{cp2028*}
  • 16:41 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifeeds: apply
  • 16:41 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/wikifeeds: apply
  • 16:39 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifeeds: apply
  • 16:39 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/wikifeeds: apply
  • 16:37 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host eventlog1003.eqiad.wmnet
  • 16:35 fabfur@cumin1001: END (ERROR) - Cookbook sre.cdn.roll-restart-varnish (exit_code=97) rolling restart of Varnish on 8 hosts matching query A:cp-upload_codfw
  • 16:35 fabfur@cumin1001: END (ERROR) - Cookbook sre.cdn.roll-restart-varnish (exit_code=97) rolling restart of Varnish on 8 hosts matching query A:cp-text_codfw
  • 16:26 joal@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 16:26 joal@deploy2002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 16:23 fabfur@cumin1001: START - Cookbook sre.cdn.roll-restart-varnish rolling restart of Varnish on 8 hosts matching query A:cp-upload_codfw
  • 16:23 fabfur@cumin1001: START - Cookbook sre.cdn.roll-restart-varnish rolling restart of Varnish on 8 hosts matching query A:cp-text_codfw
  • 16:14 hnowlan: enabling puppet on A:cp, routing mediarequests API via rest-gateway
  • 16:03 hnowlan: disabled puppet on A:cp
  • 15:54 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/media-analytics: apply
  • 15:53 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/media-analytics: apply
  • 15:49 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/media-analytics: apply
  • 15:49 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/media-analytics: apply
  • 15:49 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/media-analytics: apply
  • 15:48 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/media-analytics: apply
  • 15:48 brennen@deploy2002: Sync cancelled.
  • 15:48 fabfur@cumin1001: END (PASS) - Cookbook sre.cdn.roll-restart-varnish (exit_code=0) rolling restart of Varnish on 8 hosts matching query A:cp-text_ulsfo
  • 15:47 brennen@deploy2002: brennen: Backport for Revert "NostalgiaTemplate.php: Fix array illegal offset error" synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 15:47 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 15:47 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 15:46 brennen@deploy2002: Started scap: Backport for Revert "NostalgiaTemplate.php: Fix array illegal offset error"
  • 15:39 brennen@deploy2002: Sync cancelled.
  • 15:38 brennen@deploy2002: krinkle and brennen: Backport for NostalgiaTemplate.php: Fix array illegal offset error synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 15:36 brennen@deploy2002: Started scap: Backport for NostalgiaTemplate.php: Fix array illegal offset error
  • 15:27 fabfur@cumin1001: END (PASS) - Cookbook sre.cdn.roll-restart-varnish (exit_code=0) rolling restart of Varnish on 7 hosts matching query A:cp-upload_ulsfo and not P{cp4052*}
  • 15:06 tchin@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 15:05 tchin@deploy2002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 15:03 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1229.eqiad.wmnet with OS bullseye
  • 14:49 inflatador: bking@wdqs1016 shutting down services to compress a 1.2 TB jnl file
  • 14:43 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1119', diff saved to https://phabricator.wikimedia.org/P52725 and previous config saved to /var/cache/conftool/dbconfig/20230928-144338-root.json
  • 14:35 moritzm: installing ghostscript security updates
  • 14:33 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on wdqs1016.eqiad.wmnet with reason: jnl compression
  • 14:32 bking@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on wdqs1016.eqiad.wmnet with reason: jnl compression
  • 14:13 klausman: restarting pybal on lvs1019 and lvs2013 (LVS low-traffic actives) for T347278 (ORES turndown)
  • 14:11 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2169:3316 (T343198)', diff saved to https://phabricator.wikimedia.org/P52723 and previous config saved to /var/cache/conftool/dbconfig/20230928-141140-arnaudb.json
  • 14:11 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2169.codfw.wmnet with reason: Maintenance
  • 14:11 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2169.codfw.wmnet with reason: Maintenance
  • 14:11 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158 (T343198)', diff saved to https://phabricator.wikimedia.org/P52722 and previous config saved to /var/cache/conftool/dbconfig/20230928-141118-arnaudb.json
  • 14:08 cdanis: repooling cp5030 after haproxy upgrade & config deploy T317799
  • 14:02 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1228.eqiad.wmnet with OS bullseye
  • 14:02 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host db1228.eqiad.wmnet with OS bullseye
  • 14:02 cdanis: depooling cp5030 for haproxy upgrade & testing T317799
  • 14:01 moritzm: installing gsl security updates
  • 14:00 klausman: restarted pybal on lvs1020 and lvs2014 (LVS low-traffic backups) for T347278 (ORES turndown)
  • 13:57 taavi@deploy2002: Finished scap: Backport for Set WRITE_BOTH for CA wikis on OATHAuth multiple devices (T242031) (duration: 11m 02s)
  • 13:57 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P52720 and previous config saved to /var/cache/conftool/dbconfig/20230928-135612-arnaudb.json
  • 13:52 bking@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 13:52 bking@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 13:52 moritzm: installing flac security updates
  • 13:50 taavi@deploy2002: taavi: Continuing with sync
  • 13:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host db1229.eqiad.wmnet with OS bullseye
  • 13:47 taavi@deploy2002: taavi: Backport for Set WRITE_BOTH for CA wikis on OATHAuth multiple devices (T242031) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:47 bking@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 13:47 bking@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 13:45 taavi@deploy2002: Started scap: Backport for Set WRITE_BOTH for CA wikis on OATHAuth multiple devices (T242031)
  • 13:43 urbanecm@deploy2002: Finished scap: Backport for Enable WikiLove on arwikisource (T346391) (duration: 11m 10s)
  • 13:41 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P52719 and previous config saved to /var/cache/conftool/dbconfig/20230928-134105-arnaudb.json
  • 13:37 urbanecm@deploy2002: zoranzoki21 and urbanecm: Continuing with sync
  • 13:33 urbanecm@deploy2002: zoranzoki21 and urbanecm: Backport for Enable WikiLove on arwikisource (T346391) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:31 urbanecm@deploy2002: Started scap: Backport for Enable WikiLove on arwikisource (T346391)
  • 13:31 jmm@cumin2002: END (PASS) - Cookbook sre.maps.roll-restart-reboot-master (exit_code=0) rolling reboot on A:maps-master-eqiad
  • 13:31 urbanecm@deploy2002: Finished scap: Backport for wikifunctionswiki: Disable NearbyPages (T345459) (duration: 11m 07s)
  • 13:28 urbanecm: mwscript extensions/WikimediaMaintenance/createExtensionTables.php --wiki=arwikisource wikilove # T346391
  • 13:26 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158 (T343198)', diff saved to https://phabricator.wikimedia.org/P52718 and previous config saved to /var/cache/conftool/dbconfig/20230928-132559-arnaudb.json
  • 13:25 bking@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 13:25 urbanecm@deploy2002: ammarpad and urbanecm: Continuing with sync
  • 13:25 bking@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 13:24 jmm@cumin2002: START - Cookbook sre.maps.roll-restart-reboot-master rolling reboot on A:maps-master-eqiad
  • 13:21 urbanecm@deploy2002: ammarpad and urbanecm: Backport for wikifunctionswiki: Disable NearbyPages (T345459) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:20 urbanecm@deploy2002: Started scap: Backport for wikifunctionswiki: Disable NearbyPages (T345459)
  • 13:19 urbanecm@deploy2002: Finished scap: Backport for Enable Campaigns email on test wiki (T347065) (duration: 12m 31s)
  • 13:13 urbanecm@deploy2002: urbanecm and mhorsey: Continuing with sync
  • 13:08 urbanecm@deploy2002: urbanecm and mhorsey: Backport for Enable Campaigns email on test wiki (T347065) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:07 fabfur@cumin1001: START - Cookbook sre.cdn.roll-restart-varnish rolling restart of Varnish on 7 hosts matching query A:cp-upload_ulsfo and not P{cp4052*}
  • 13:07 fabfur@cumin1001: START - Cookbook sre.cdn.roll-restart-varnish rolling restart of Varnish on 8 hosts matching query A:cp-text_ulsfo
  • 13:07 urbanecm@deploy2002: Started scap: Backport for Enable Campaigns email on test wiki (T347065)
  • 13:04 elukey@deploy2002: helmfile [ml-serve-eqiad] 'sync' command on namespace 'ores-legacy' for release 'main' .
  • 13:03 elukey@deploy2002: helmfile [ml-serve-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
  • 13:03 elukey@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
  • 12:47 elukey: restart thanos-query on titan1002
  • 12:44 elukey: restart thanos-query on titan1001
  • 12:41 jmm@cumin2002: END (PASS) - Cookbook sre.maps.roll-restart-reboot-master (exit_code=0) rolling reboot on A:maps-master-codfw
  • 12:31 jmm@cumin2002: START - Cookbook sre.maps.roll-restart-reboot-master rolling reboot on A:maps-master-codfw
  • 11:56 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2106 (T343198)', diff saved to https://phabricator.wikimedia.org/P52717 and previous config saved to /var/cache/conftool/dbconfig/20230928-115619-arnaudb.json
  • 11:56 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2106.codfw.wmnet with reason: Maintenance
  • 11:56 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2106.codfw.wmnet with reason: Maintenance
  • 11:30 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply
  • 11:26 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply
  • 11:09 fabfur: cp4037 back in pool (T347192)
  • 11:09 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply
  • 11:04 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply
  • 10:56 jmm@cumin2002: END (FAIL) - Cookbook sre.maps.roll-restart-reboot-master (exit_code=1) rolling reboot on A:maps-master-codfw
  • 10:54 jmm@cumin2002: START - Cookbook sre.maps.roll-restart-reboot-master rolling reboot on A:maps-master-codfw
  • 10:51 btullis@cumin1001: END (FAIL) - Cookbook sre.hadoop.roll-restart-masters (exit_code=99) restart masters for Hadoop analytics cluster: Restart of jvm daemons.
  • 10:40 jiji@deploy2002: helmfile [staging] DONE helmfile.d/services/push-notifications: apply
  • 10:40 jiji@deploy2002: helmfile [staging] START helmfile.d/services/push-notifications: apply
  • 10:40 jiji@deploy2002: helmfile [staging] DONE helmfile.d/services/push-notifications: apply
  • 10:40 jiji@deploy2002: helmfile [staging] START helmfile.d/services/push-notifications: apply
  • 10:40 jiji@deploy2002: helmfile [staging] DONE helmfile.d/services/push-notifications: apply
  • 10:40 jiji@deploy2002: helmfile [staging] START helmfile.d/services/push-notifications: apply
  • 10:27 btullis@cumin1001: START - Cookbook sre.hadoop.roll-restart-masters restart masters for Hadoop analytics cluster: Restart of jvm daemons.
  • 10:08 aborrero@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudgw1001.eqiad.wmnet
  • 09:58 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
  • 09:54 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudgw1002.eqiad.wmnet
  • 09:54 jayme@deploy1002: helmfile [staging] START helmfile.d/services/machinetranslation: apply
  • 09:52 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
  • 09:52 jayme@deploy1002: helmfile [staging] START helmfile.d/services/machinetranslation: apply
  • 09:51 _joe_: running puppet on cp-text to move mw on k8s to 10%
  • 09:48 aborrero@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudgw1002.eqiad.wmnet
  • 09:45 fabfur: depool cp4037 to restart varnish and apply https://gerrit.wikimedia.org/r/c/operations/puppet/+/960112 (T347192)
  • 09:21 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2158 (T343198)', diff saved to https://phabricator.wikimedia.org/P52715 and previous config saved to /var/cache/conftool/dbconfig/20230928-092109-arnaudb.json
  • 09:21 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 09:20 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 09:20 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2158.codfw.wmnet with reason: Maintenance
  • 09:20 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2158.codfw.wmnet with reason: Maintenance
  • 09:20 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2151 (T343198)', diff saved to https://phabricator.wikimedia.org/P52714 and previous config saved to /var/cache/conftool/dbconfig/20230928-092032-arnaudb.json
  • 09:16 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host backup1010.eqiad.wmnet
  • 09:13 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudgw2003-dev.codfw.wmnet
  • 09:11 arnaudb@cumin1001: START - Cookbook sre.hosts.reboot-single for host backup1010.eqiad.wmnet
  • 09:10 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/proton: apply
  • 09:09 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/proton: apply
  • 09:06 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/proton: apply
  • 09:05 isaranto@deploy2002: helmfile [ml-serve-eqiad] 'sync' command on namespace 'ores-legacy' for release 'main' .
  • 09:05 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host cloudgw2003-dev.codfw.wmnet
  • 09:05 isaranto@deploy2002: helmfile [ml-serve-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
  • 09:05 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2151', diff saved to https://phabricator.wikimedia.org/P52713 and previous config saved to /var/cache/conftool/dbconfig/20230928-090526-arnaudb.json
  • 09:04 isaranto@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
  • 09:04 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/proton: apply
  • 09:04 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on backup1010.eqiad.wmnet with reason: rebooting backup1010
  • 09:04 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on backup1010.eqiad.wmnet with reason: rebooting backup1010
  • 09:02 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudgw2002-dev.codfw.wmnet
  • 08:59 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/proton: apply
  • 08:59 jayme@deploy1002: helmfile [staging] START helmfile.d/services/proton: apply
  • 08:57 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host cloudgw2002-dev.codfw.wmnet
  • 08:50 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2151', diff saved to https://phabricator.wikimedia.org/P52712 and previous config saved to /var/cache/conftool/dbconfig/20230928-085019-arnaudb.json
  • 08:35 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2151 (T343198)', diff saved to https://phabricator.wikimedia.org/P52711 and previous config saved to /var/cache/conftool/dbconfig/20230928-083513-arnaudb.json
  • 08:14 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifeeds: apply
  • 08:14 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/wikifeeds: apply
  • 08:08 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on debmonitor2003.codfw.wmnet with reason: WIP
  • 08:08 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on debmonitor2003.codfw.wmnet with reason: WIP
  • 07:55 taavi@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudcontrol1006.wikimedia.org
  • 07:55 taavi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 07:55 taavi@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudcontrol1006.wikimedia.org decommissioned, removing all IPs except the asset tag one - taavi@cumin1001"
  • 07:53 taavi@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudcontrol1006.wikimedia.org decommissioned, removing all IPs except the asset tag one - taavi@cumin1001"
  • 07:51 taavi@cumin1001: START - Cookbook sre.dns.netbox
  • 07:48 isaranto@deploy2002: helmfile [ml-serve-eqiad] 'sync' command on namespace 'ores-legacy' for release 'main'.
  • 07:47 isaranto@deploy2002: helmfile [ml-serve-codfw] 'sync' command on namespace 'ores-legacy' for release 'main'.
  • 07:46 isaranto@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'ores-legacy' for release 'main'.
  • 07:44 taavi@cumin1001: START - Cookbook sre.hosts.decommission for hosts cloudcontrol1006.wikimedia.org
  • 07:28 taavi: test
  • 07:27 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2099.codfw.wmnet with reason: Maintenance
  • 07:27 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2099.codfw.wmnet with reason: Maintenance
  • 07:25 _joe_: restarting trafficserver on cp1081 T347493
  • 04:42 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2151 (T343198)', diff saved to https://phabricator.wikimedia.org/P52710 and previous config saved to /var/cache/conftool/dbconfig/20230928-044238-arnaudb.json
  • 04:42 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2151.codfw.wmnet with reason: Maintenance
  • 04:42 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2151.codfw.wmnet with reason: Maintenance
  • 04:42 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2129 (T343198)', diff saved to https://phabricator.wikimedia.org/P52709 and previous config saved to /var/cache/conftool/dbconfig/20230928-044216-arnaudb.json
  • 04:27 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2129', diff saved to https://phabricator.wikimedia.org/P52708 and previous config saved to /var/cache/conftool/dbconfig/20230928-042710-arnaudb.json
  • 04:12 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2129', diff saved to https://phabricator.wikimedia.org/P52707 and previous config saved to /var/cache/conftool/dbconfig/20230928-041204-arnaudb.json
  • 03:56 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2129 (T343198)', diff saved to https://phabricator.wikimedia.org/P52706 and previous config saved to /var/cache/conftool/dbconfig/20230928-035657-arnaudb.json
  • 02:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1245.eqiad.wmnet with OS bullseye
  • 02:42 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 02:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1249.eqiad.wmnet with OS bullseye
  • 02:42 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 02:41 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 02:41 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 02:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1247.eqiad.wmnet with OS bullseye
  • 02:40 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 02:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1248.eqiad.wmnet with OS bullseye
  • 02:39 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 02:39 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 02:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1246.eqiad.wmnet with OS bullseye
  • 02:39 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 02:37 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 02:35 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 02:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1244.eqiad.wmnet with OS bullseye
  • 02:35 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 02:33 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 02:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1242.eqiad.wmnet with OS bullseye
  • 02:33 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 02:32 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1243.eqiad.wmnet with OS bullseye
  • 02:31 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 02:31 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 02:27 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 02:25 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db1245.eqiad.wmnet with reason: host reimage
  • 02:25 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1249.eqiad.wmnet with reason: host reimage
  • 02:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1247.eqiad.wmnet with reason: host reimage
  • 02:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1246.eqiad.wmnet with reason: host reimage
  • 02:20 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1249.eqiad.wmnet with reason: host reimage
  • 02:19 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db1248.eqiad.wmnet with reason: host reimage
  • 02:19 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1248.eqiad.wmnet with reason: host reimage
  • 02:18 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1247.eqiad.wmnet with reason: host reimage
  • 02:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1244.eqiad.wmnet with reason: host reimage
  • 02:17 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1246.eqiad.wmnet with reason: host reimage
  • 02:15 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1245.eqiad.wmnet with reason: host reimage
  • 02:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1242.eqiad.wmnet with reason: host reimage
  • 02:14 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db1243.eqiad.wmnet with reason: host reimage
  • 02:14 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1244.eqiad.wmnet with reason: host reimage
  • 02:13 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1243.eqiad.wmnet with reason: host reimage
  • 02:12 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1242.eqiad.wmnet with reason: host reimage
  • 02:07 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host db1249.eqiad.wmnet with OS bullseye
  • 02:06 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host db1248.eqiad.wmnet with OS bullseye
  • 02:05 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host db1247.eqiad.wmnet with OS bullseye
  • 02:04 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host db1246.eqiad.wmnet with OS bullseye
  • 02:03 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host db1245.eqiad.wmnet with OS bullseye
  • 02:01 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host db1244.eqiad.wmnet with OS bullseye
  • 02:00 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host db1243.eqiad.wmnet with OS bullseye
  • 01:59 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host db1242.eqiad.wmnet with OS bullseye
  • 00:50 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudcontrol1008-dev']
  • 00:50 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcontrol1008-dev']
  • 00:50 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudcontrol1008-dev']
  • 00:50 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcontrol1008-dev']
  • 00:49 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudcontrol1008-dev']
  • 00:38 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcontrol1008-dev']
  • 00:38 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudcontrol1008-dev']
  • 00:38 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcontrol1008-dev']
  • 00:38 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudcontrol1008-dev']
  • 00:38 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcontrol1008-dev']
  • 00:37 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcontrol1008-dev.mgmt.eqiad.wmnet with reboot policy FORCED
  • 00:16 jclark@cumin1001: START - Cookbook sre.hosts.provision for host cloudcontrol1008-dev.mgmt.eqiad.wmnet with reboot policy FORCED
  • 00:16 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcontrol1008-dev
  • 00:05 eileen: civicrm upgraded from 41a4c2cf to 7406cdf3

2023-09-27

  • 23:56 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host moss-be1003.eqiad.wmnet with OS bullseye
  • 23:55 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 23:54 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 23:39 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on moss-be1003.eqiad.wmnet with reason: host reimage
  • 23:36 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on moss-be1003.eqiad.wmnet with reason: host reimage
  • 23:23 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host moss-be1003.eqiad.wmnet with OS bullseye
  • 23:01 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2129 (T343198)', diff saved to https://phabricator.wikimedia.org/P52705 and previous config saved to /var/cache/conftool/dbconfig/20230927-230117-arnaudb.json
  • 23:01 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2129.codfw.wmnet with reason: Maintenance
  • 23:01 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2129.codfw.wmnet with reason: Maintenance
  • 23:00 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124 (T343198)', diff saved to https://phabricator.wikimedia.org/P52704 and previous config saved to /var/cache/conftool/dbconfig/20230927-230055-arnaudb.json
  • 22:45 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124', diff saved to https://phabricator.wikimedia.org/P52703 and previous config saved to /var/cache/conftool/dbconfig/20230927-224548-arnaudb.json
  • 22:30 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124', diff saved to https://phabricator.wikimedia.org/P52702 and previous config saved to /var/cache/conftool/dbconfig/20230927-223042-arnaudb.json
  • 22:29 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host moss-be1003.eqiad.wmnet with OS bullseye
  • 22:25 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182 (T343198)', diff saved to https://phabricator.wikimedia.org/P52701 and previous config saved to /var/cache/conftool/dbconfig/20230927-222505-arnaudb.json
  • 22:22 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ncredir5002.eqsin.wmnet with OS bookworm
  • 22:15 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124 (T343198)', diff saved to https://phabricator.wikimedia.org/P52700 and previous config saved to /var/cache/conftool/dbconfig/20230927-221536-arnaudb.json
  • 22:09 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182', diff saved to https://phabricator.wikimedia.org/P52699 and previous config saved to /var/cache/conftool/dbconfig/20230927-220959-arnaudb.json
  • 22:03 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host moss-be1003.eqiad.wmnet with OS bullseye
  • 22:03 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host moss-be1003.eqiad.wmnet with OS bullseye
  • 21:54 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182', diff saved to https://phabricator.wikimedia.org/P52698 and previous config saved to /var/cache/conftool/dbconfig/20230927-215452-arnaudb.json
  • 21:48 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ncredir5002.eqsin.wmnet with reason: host reimage
  • 21:45 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ncredir5002.eqsin.wmnet with reason: host reimage
  • 21:39 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182 (T343198)', diff saved to https://phabricator.wikimedia.org/P52697 and previous config saved to /var/cache/conftool/dbconfig/20230927-213946-arnaudb.json
  • 21:27 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host moss-be1003.eqiad.wmnet with OS bullseye
  • 21:01 brett@cumin2002: START - Cookbook sre.hosts.reimage for host ncredir5002.eqsin.wmnet with OS bookworm
  • 20:59 cjming: end of UTC late backport window
  • 20:57 cjming@deploy2002: Finished scap: Backport for New projects default to Vector 2022 (T347444), Populate the legacy-vector dblist (T347444) (duration: 11m 05s)
  • 20:54 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ncredir4001.ulsfo.wmnet with OS bookworm
  • 20:50 cjming@deploy2002: jdlrobson and cjming: Continuing with sync
  • 20:47 cjming@deploy2002: jdlrobson and cjming: Backport for New projects default to Vector 2022 (T347444), Populate the legacy-vector dblist (T347444) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:46 cjming@deploy2002: Started scap: Backport for New projects default to Vector 2022 (T347444), Populate the legacy-vector dblist (T347444)
  • 20:45 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host moss-be1003.eqiad.wmnet with OS bullseye
  • 20:44 cjming@deploy2002: Sync cancelled.
  • 20:35 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ncredir4001.ulsfo.wmnet with reason: host reimage
  • 20:35 cjming@deploy2002: cjming and jdlrobson: Backport for New projects default to Vector 2022 (T347444) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:34 bking@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 20:34 bking@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 20:33 cjming@deploy2002: Started scap: Backport for New projects default to Vector 2022 (T347444)
  • 20:32 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ncredir4001.ulsfo.wmnet with reason: host reimage
  • 20:31 cjming@deploy2002: Finished scap: Backport for Special wiki wordmarks and taglines (T341250), Add wordmark for li wikinews (T341258) (duration: 09m 52s)
  • 20:27 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on restbase2027.codfw.wmnet with reason: Repairing/rebuilding Cassandra instances
  • 20:27 eevans@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on restbase2027.codfw.wmnet with reason: Repairing/rebuilding Cassandra instances
  • 20:25 cjming@deploy2002: jdlrobson and cjming: Continuing with sync
  • 20:23 brett: update haproxy 2.6 and 2.8 into bookworm archives with reprepro - T342154
  • 20:22 cjming@deploy2002: jdlrobson and cjming: Backport for Special wiki wordmarks and taglines (T341250), Add wordmark for li wikinews (T341258) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:21 cjming@deploy2002: Started scap: Backport for Special wiki wordmarks and taglines (T341250), Add wordmark for li wikinews (T341258)
  • 20:16 brett@cumin2002: START - Cookbook sre.hosts.reimage for host ncredir4001.ulsfo.wmnet with OS bookworm
  • 20:14 cjming@deploy2002: Finished scap: Backport for commonswiki: Add $wgExternalLinksDomainGaps for another domain (T341000) (duration: 10m 23s)
  • 20:10 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ncredir4002.ulsfo.wmnet with OS bookworm
  • 20:08 cjming@deploy2002: lucaswerkmeister and cjming: Continuing with sync
  • 20:05 cjming@deploy2002: lucaswerkmeister and cjming: Backport for commonswiki: Add $wgExternalLinksDomainGaps for another domain (T341000) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:03 cjming@deploy2002: Started scap: Backport for commonswiki: Add $wgExternalLinksDomainGaps for another domain (T341000)
  • 19:59 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host moss-be1003.eqiad.wmnet with OS bullseye
  • 19:54 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase2027.codfw.wmnet with OS bullseye
  • 19:52 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@c6454a9]: update rdf tools jar to .131 (duration: 00m 28s)
  • 19:52 ebernhardson@deploy2002: Started deploy [airflow-dags/search@c6454a9]: update rdf tools jar to .131
  • 19:50 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ncredir4002.ulsfo.wmnet with reason: host reimage
  • 19:47 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ncredir4002.ulsfo.wmnet with reason: host reimage
  • 19:40 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['moss-be1003']
  • 19:39 bking@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 19:39 bking@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 19:38 bking@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 19:38 bking@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 19:35 inflatador: bking@deploy2002 deleting flink-operator leader pod to force failover T347521
  • 19:34 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['moss-be1003']
  • 19:26 brett@cumin2002: START - Cookbook sre.hosts.reimage for host ncredir4002.ulsfo.wmnet with OS bookworm
  • 19:24 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on restbase2027.codfw.wmnet with reason: host reimage
  • 19:22 eevans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on restbase2027.codfw.wmnet with reason: host reimage
  • 19:18 robh@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lists1004.eqiad.wmnet with OS bullseye
  • 19:18 robh@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - robh@cumin1001"
  • 19:14 robh@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - robh@cumin1001"
  • 19:09 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ncredir6001.drmrs.wmnet with OS bookworm
  • 19:08 bking@deploy2002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 19:07 bking@deploy2002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 19:07 bking@deploy2002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 19:06 bking@deploy2002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 19:06 eevans@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts restbase1020.eqiad.wmnet
  • 19:05 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts restbase1020.eqiad.wmnet
  • 19:03 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2027.codfw.wmnet with OS bullseye
  • 19:01 eevans@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts restbase2027.codfw.wmnet
  • 19:01 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2027.codfw.wmnet
  • 18:59 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lists1004.eqiad.wmnet with reason: host reimage
  • 18:56 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host moss-be1003.eqiad.wmnet with OS bullseye
  • 18:55 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on lists1004.eqiad.wmnet with reason: host reimage
  • 18:50 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2027.codfw.wmnet
  • 18:50 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts restbase2027.codfw.wmnet
  • 18:48 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase2026.codfw.wmnet with OS bullseye
  • 18:46 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ncredir6001.drmrs.wmnet with reason: host reimage
  • 18:45 sukhe: re-enable puppet on O:apt_repo
  • 18:43 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ncredir6001.drmrs.wmnet with reason: host reimage
  • 18:42 robh@cumin1001: START - Cookbook sre.hosts.reimage for host lists1004.eqiad.wmnet with OS bullseye
  • 18:41 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host lists1004.eqiad.wmnet with OS bullseye
  • 18:41 robh@cumin1001: START - Cookbook sre.hosts.reimage for host lists1004.eqiad.wmnet with OS bullseye
  • 18:39 sukhe: disable puppet on O:apt_repo
  • 18:27 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on restbase2026.codfw.wmnet with reason: host reimage
  • 18:24 eevans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on restbase2026.codfw.wmnet with reason: host reimage
  • 18:24 dduvall@deploy2002: Synchronized php: group1 wikis to 1.41.0-wmf.28 refs T345889 (duration: 06m 46s)
  • 18:20 brett@cumin2002: START - Cookbook sre.hosts.reimage for host ncredir6001.drmrs.wmnet with OS bookworm
  • 18:19 brett: re-enabling puppet on apt1001 from a quick test of CR 957766's effectiveness
  • 18:17 dduvall@deploy2002: rebuilt and synchronized wikiversions files: group1 wikis to 1.41.0-wmf.28 refs T345889
  • 18:15 brett: disabling puppet on apt1001 for a quick test of CR 957766's effectiveness
  • 18:14 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host lists1004.eqiad.wmnet with OS bullseye
  • 18:11 eevans@cumin1001: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts restbase2027.codfw.wmnet
  • 18:08 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2026.codfw.wmnet with OS bullseye
  • 18:07 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts restbase2027.codfw.wmnet
  • 18:07 eevans@cumin1001: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts restbase2027.codfw.wmnet
  • 18:05 eevans@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts restbase2026.codfw.wmnet
  • 18:05 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2026.codfw.wmnet
  • 18:01 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host stat1011.eqiad.wmnet with OS bullseye
  • 18:01 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts restbase2027.codfw.wmnet
  • 17:58 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['stat1011']
  • 17:53 bking@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 17:53 bking@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 17:53 bking@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 17:53 bking@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 17:53 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2026.codfw.wmnet
  • 17:52 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['stat1011']
  • 17:52 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['stat1011']
  • 17:52 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['stat1011']
  • 17:52 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts restbase2026.codfw.wmnet
  • 17:51 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host stat1011.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:51 jclark@cumin1001: START - Cookbook sre.hosts.provision for host stat1011.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:49 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ncredir6002.drmrs.wmnet with OS bookworm
  • 17:39 jgreen@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:39 jgreen@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove host frauth2001.frack.codfw.wmnet from DNS for decommissioning - jgreen@cumin1001"
  • 17:39 jgreen@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove host frauth2001.frack.codfw.wmnet from DNS for decommissioning - jgreen@cumin1001"
  • 17:38 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for restbase1019.eqiad.wmnet
  • 17:38 eevans@cumin1001: START - Cookbook sre.hosts.remove-downtime for restbase1019.eqiad.wmnet
  • 17:36 jgreen@cumin1001: START - Cookbook sre.dns.netbox
  • 17:36 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host moss-be1003.eqiad.wmnet with OS bullseye
  • 17:25 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ncredir6002.drmrs.wmnet with reason: host reimage
  • 17:23 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase1019.eqiad.wmnet with OS bullseye
  • 17:23 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ncredir6002.drmrs.wmnet with reason: host reimage
  • 17:10 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2124 (T343198)', diff saved to https://phabricator.wikimedia.org/P52696 and previous config saved to /var/cache/conftool/dbconfig/20230927-171014-arnaudb.json
  • 17:10 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2124.codfw.wmnet with reason: Maintenance
  • 17:10 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2124.codfw.wmnet with reason: Maintenance
  • 17:09 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117 (T343198)', diff saved to https://phabricator.wikimedia.org/P52695 and previous config saved to /var/cache/conftool/dbconfig/20230927-170953-arnaudb.json
  • 17:02 brett@cumin2002: START - Cookbook sre.hosts.reimage for host ncredir6002.drmrs.wmnet with OS bookworm
  • 16:58 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on restbase1019.eqiad.wmnet with reason: host reimage
  • 16:55 eevans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on restbase1019.eqiad.wmnet with reason: host reimage
  • 16:54 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117', diff saved to https://phabricator.wikimedia.org/P52694 and previous config saved to /var/cache/conftool/dbconfig/20230927-165446-arnaudb.json
  • 16:52 dduvall@deploy2002: Finished scap: (no justification provided) (duration: 28m 15s)
  • 16:42 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host restbase1019.eqiad.wmnet with OS bullseye
  • 16:39 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117', diff saved to https://phabricator.wikimedia.org/P52693 and previous config saved to /var/cache/conftool/dbconfig/20230927-163940-arnaudb.json
  • 16:39 eevans@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['restbase1019.eqiad.wmnet']
  • 16:32 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['restbase1019.eqiad.wmnet']
  • 16:31 eevans@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host restbase1019.eqiad.wmnet with OS bullseye
  • 16:29 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for restbase2023.codfw.wmnet
  • 16:29 eevans@cumin1001: START - Cookbook sre.hosts.remove-downtime for restbase2023.codfw.wmnet
  • 16:28 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase2023.codfw.wmnet with OS bullseye
  • 16:24 dduvall@deploy2002: Started scap: (no justification provided)
  • 16:24 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117 (T343198)', diff saved to https://phabricator.wikimedia.org/P52692 and previous config saved to /var/cache/conftool/dbconfig/20230927-162433-arnaudb.json
  • 16:16 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host restbase1019.eqiad.wmnet with OS bullseye
  • 16:09 kamila_: Pooled back eqiad for traffic after the DC switchover (T345263)
  • 16:03 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on restbase2023.codfw.wmnet with reason: host reimage
  • 16:02 reedy@deploy2002: Finished scap: (no justification provided) (duration: 07m 22s)
  • 16:00 eevans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on restbase2023.codfw.wmnet with reason: host reimage
  • 15:55 reedy@deploy2002: Started scap: (no justification provided)
  • 15:54 eevans@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts restbase1031.eqiad.wmnet
  • 15:54 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts restbase1031.eqiad.wmnet
  • 15:53 eevans@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts restbase1031.eqiad.wmnet
  • 15:53 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts restbase1031.eqiad.wmnet
  • 15:53 eevans@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts restbase1031.eqiad.wmnet
  • 15:53 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts restbase1031.eqiad.wmnet
  • 15:51 eevans@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts restbase1025.eqiad.wmnet
  • 15:51 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts restbase1025.eqiad.wmnet
  • 15:51 eevans@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts restbase1022.eqiad.wmnet
  • 15:51 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts restbase1022.eqiad.wmnet
  • 15:50 eevans@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts restbase1019.eqiad.wmnet
  • 15:50 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts restbase1019.eqiad.wmnet
  • 15:49 eevans@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts restbase1018.eqiad.wmnet
  • 15:49 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts restbase1018.eqiad.wmnet
  • 15:49 eevans@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts restbase1017.eqiad.wmnet
  • 15:49 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts restbase1017.eqiad.wmnet
  • 15:49 eevans@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts restbase1016.eqiad.wmnet
  • 15:48 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts restbase1016.eqiad.wmnet
  • 15:43 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2023.codfw.wmnet with OS bullseye
  • 15:41 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for restbase2018.codfw.wmnet
  • 15:41 eevans@cumin1001: START - Cookbook sre.hosts.remove-downtime for restbase2018.codfw.wmnet
  • 15:30 dancy@deploy2002: Installation of scap version "4.63.0" completed for 598 hosts
  • 15:29 dancy@deploy2002: Installing scap version "4.63.0" for 598 hosts
  • 15:28 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase2018.codfw.wmnet with OS bullseye
  • 15:24 kevinbazira@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
  • 15:23 fnegri@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcumin1001.eqiad.wmnet with OS bullseye
  • 15:09 fnegri@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcumin1001.eqiad.wmnet with reason: host reimage
  • 15:09 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ntp.anycast.wmnet on all recursors
  • 15:09 sukhe@cumin2002: START - Cookbook sre.dns.wipe-cache ntp.anycast.wmnet on all recursors
  • 15:09 gmodena@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 15:09 gmodena@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 15:07 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:07 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add ntp.anycast.wmnet - sukhe@cumin2002"
  • 15:07 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add ntp.anycast.wmnet - sukhe@cumin2002"
  • 15:06 fnegri@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcumin1001.eqiad.wmnet with reason: host reimage
  • 15:04 sukhe@cumin2002: START - Cookbook sre.dns.netbox
  • 15:02 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on restbase2018.codfw.wmnet with reason: host reimage
  • 15:02 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 15:01 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 15:01 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 15:00 jayme@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 14:59 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 14:59 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 14:59 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
  • 14:58 eevans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on restbase2018.codfw.wmnet with reason: host reimage
  • 14:58 fnegri@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcumin1001.eqiad.wmnet with OS bullseye
  • 14:57 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
  • 14:56 xcollazo@deploy2002: Finished deploy [airflow-dags/analytics@49e3804]: Deploy latest Airflow DAGs to analytics instance (duration: 00m 42s)
  • 14:55 xcollazo@deploy2002: Started deploy [airflow-dags/analytics@49e3804]: Deploy latest Airflow DAGs to analytics instance
  • 14:43 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host lists1004.eqiad.wmnet with OS bullseye
  • 14:40 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2018.codfw.wmnet with OS bullseye
  • 14:38 eevans@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['restbase2018.codfw.wmnet']
  • 14:31 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['restbase2018.codfw.wmnet']
  • 14:30 moritzm: Added Arnaud to pwstore and removed Jeff (frtech SREs no longer need/use it)
  • 14:29 kamila@cumin1001: END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0) pool all services in eqiad: Datacenter Switchover: eqiad repool - T345263
  • 14:22 claime: Repooling eqiad services in progress - T345263
  • 14:13 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 15 hosts
  • 14:13 btullis@cumin1001: START - Cookbook sre.hosts.remove-downtime for 15 hosts
  • 14:10 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for restbase2017.codfw.wmnet
  • 14:10 eevans@cumin1001: START - Cookbook sre.hosts.remove-downtime for restbase2017.codfw.wmnet
  • 14:08 fnegri@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcumin2001.codfw.wmnet with OS bullseye
  • 14:08 kamila@cumin1001: START - Cookbook sre.discovery.datacenter pool all services in eqiad: Datacenter Switchover: eqiad repool - T345263
  • 14:06 _joe_: updating conftool everywhere
  • 14:00 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase2017.codfw.wmnet with OS bullseye
  • 13:54 fnegri@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcumin2001.codfw.wmnet with reason: host reimage
  • 13:51 fnegri@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcumin2001.codfw.wmnet with reason: host reimage
  • 13:50 Lucas_WMDE: UTC afternoon backport+config window done
  • 13:49 lucaswerkmeister-wmde@deploy2002: Finished scap: Backport for Add label for Wikifunctions in “other projects” sidebar section (T342857) (duration: 29m 56s)
  • 13:46 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host lists1004.eqiad.wmnet with OS bullseye
  • 13:44 aqu: Deployed refinery using scap, then deployed onto hdfs
  • 13:43 aqu@deploy2002: Finished deploy [analytics/refinery@223be0f] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@223be0fb] (duration: 08m 33s)
  • 13:43 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host lists1004.mgmt.eqiad.wmnet with reboot policy FORCED
  • 13:40 jclark@cumin1001: START - Cookbook sre.hosts.provision for host lists1004.mgmt.eqiad.wmnet with reboot policy FORCED
  • 13:38 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Continuing with sync
  • 13:37 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Backport for Add label for Wikifunctions in “other projects” sidebar section (T342857) synced to the testservers mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 13:36 fnegri@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcumin2001.codfw.wmnet with OS bullseye
  • 13:36 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on restbase2017.codfw.wmnet with reason: host reimage
  • 13:35 aqu@deploy2002: Started deploy [analytics/refinery@223be0f] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@223be0fb]
  • 13:33 eevans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on restbase2017.codfw.wmnet with reason: host reimage
  • 13:26 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host lists1004.mgmt.eqiad.wmnet with reboot policy FORCED
  • 13:26 aqu@deploy2002: deploy aborted: Regular analytics weekly train TEST [analytics/refinery@223be0fb] (duration: 00m 16s)
  • 13:26 aqu@deploy2002: Started deploy [analytics/refinery@223be0f] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@223be0fb]
  • 13:26 aqu@deploy2002: Finished deploy [analytics/refinery@223be0f] (thin): Regular analytics weekly train THIN [analytics/refinery@223be0fb] (duration: 00m 10s)
  • 13:26 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: sync
  • 13:26 aqu@deploy2002: Started deploy [analytics/refinery@223be0f] (thin): Regular analytics weekly train THIN [analytics/refinery@223be0fb]
  • 13:25 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/wikifunctions: sync
  • 13:24 aqu@deploy2002: Finished deploy [analytics/refinery@223be0f]: Regular analytics weekly train [analytics/refinery@223be0fb] (duration: 06m 58s)
  • 13:21 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: sync
  • 13:21 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifunctions: sync
  • 13:21 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: sync
  • 13:21 jayme@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: sync
  • 13:20 jclark@cumin1001: START - Cookbook sre.hosts.provision for host lists1004.mgmt.eqiad.wmnet with reboot policy FORCED
  • 13:20 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host lists1004.mgmt.eqiad.wmnet with reboot policy FORCED
  • 13:19 lucaswerkmeister-wmde@deploy2002: Started scap: Backport for Add label for Wikifunctions in “other projects” sidebar section (T342857)
  • 13:17 jclark@cumin1001: START - Cookbook sre.hosts.provision for host lists1004.mgmt.eqiad.wmnet with reboot policy FORCED
  • 13:17 aqu@deploy2002: Started deploy [analytics/refinery@223be0f]: Regular analytics weekly train [analytics/refinery@223be0fb]
  • 13:17 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2017.codfw.wmnet with OS bullseye
  • 13:12 aqu: Deployment weekly train of analytics-refinery (+new source version)
  • 12:18 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on 6 hosts with reason: Still running on 9 mirrormaker processes from main-eqiad to jumbo
  • 12:18 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on 6 hosts with reason: Still running on 9 mirrormaker processes from main-eqiad to jumbo
  • 11:26 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2117 (T343198)', diff saved to https://phabricator.wikimedia.org/P52688 and previous config saved to /var/cache/conftool/dbconfig/20230927-112640-arnaudb.json
  • 11:26 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2117.codfw.wmnet with reason: Maintenance
  • 11:26 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2117.codfw.wmnet with reason: Maintenance
  • 11:23 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2182 (T343198)', diff saved to https://phabricator.wikimedia.org/P52687 and previous config saved to /var/cache/conftool/dbconfig/20230927-112342-arnaudb.json
  • 11:23 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2182.codfw.wmnet with reason: Maintenance
  • 11:23 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2182.codfw.wmnet with reason: Maintenance
  • 11:23 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317 (T343198)', diff saved to https://phabricator.wikimedia.org/P52686 and previous config saved to /var/cache/conftool/dbconfig/20230927-112320-arnaudb.json
  • 11:08 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317', diff saved to https://phabricator.wikimedia.org/P52685 and previous config saved to /var/cache/conftool/dbconfig/20230927-110813-arnaudb.json
  • 10:53 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317', diff saved to https://phabricator.wikimedia.org/P52684 and previous config saved to /var/cache/conftool/dbconfig/20230927-105306-arnaudb.json
  • 10:46 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
  • 10:46 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/mw-web: apply
  • 10:45 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
  • 10:45 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
  • 10:40 isaranto@deploy2002: helmfile [ml-serve-eqiad] 'sync' command on namespace 'ores-legacy' for release 'main' .
  • 10:39 isaranto@deploy2002: helmfile [ml-serve-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
  • 10:39 isaranto@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
  • 10:38 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317 (T343198)', diff saved to https://phabricator.wikimedia.org/P52683 and previous config saved to /var/cache/conftool/dbconfig/20230927-103800-arnaudb.json
  • 10:27 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
  • 10:27 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
  • 10:27 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
  • 10:27 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/mw-web: apply
  • 09:48 jayme@cumin1001: conftool action : set/pooled=no; selector: name=kubernetes1013.*
  • 09:43 claime: Bumping mw-on-k8s traffic to 8% - T346422
  • 09:36 jayme: cordoning kubernetes1013 for debug purposes
  • 09:33 taavi: update CR firewall policy, gerrit 961336
  • 09:15 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db2109.codfw.wmnet with reason: Host crashed
  • 09:14 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on db2109.codfw.wmnet with reason: Host crashed
  • 09:10 gmodena@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 09:10 gmodena@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 09:08 gmodena@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 09:08 gmodena@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 09:05 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on 15 hosts with reason: Kafka mirror issues on jumbo
  • 09:05 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on 15 hosts with reason: Kafka mirror issues on jumbo
  • 08:55 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db2109.codfw.wmnet with reason: Host crashed
  • 08:54 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on db2109.codfw.wmnet with reason: Host crashed
  • 08:44 jiji@deploy2002: helmfile [codfw] DONE helmfile.d/services/mathoid: apply
  • 08:44 jiji@deploy2002: helmfile [codfw] START helmfile.d/services/mathoid: apply
  • 08:44 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mathoid: apply
  • 08:44 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/mathoid: apply
  • 08:28 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 15 hosts with reason: Kafka mirror issues on jumbo
  • 08:28 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 15 hosts with reason: Kafka mirror issues on jumbo
  • 08:21 vgutierrez: update HAProxy to version 2.7.10 in cp4051 - T317799
  • 08:10 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on 15 hosts with reason: Kafka mirror issues on jumbo
  • 08:10 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on 15 hosts with reason: Kafka mirror issues on jumbo
  • 07:39 Emperor: repool ms-fe2009
  • 06:57 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on ldap-rw[1001,2001].wikimedia.org with reason: WIP
  • 06:57 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on ldap-rw[1001,2001].wikimedia.org with reason: WIP
  • 06:50 gmodena@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 06:50 gmodena@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 06:50 gmodena@deploy2002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 06:41 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2097.codfw.wmnet with reason: Maintenance
  • 06:40 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2097.codfw.wmnet with reason: Maintenance
  • 05:54 isaranto@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
  • 05:53 isaranto@deploy2002: helmfile [ml-serve-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
  • 05:53 isaranto@deploy2002: helmfile [ml-serve-eqiad] 'sync' command on namespace 'ores-legacy' for release 'main' .
  • 04:13 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
  • 04:13 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
  • 02:51 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1241.eqiad.wmnet with OS bullseye
  • 02:51 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 02:50 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1240.eqiad.wmnet with OS bullseye
  • 02:50 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 02:50 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 02:49 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 02:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1239.eqiad.wmnet with OS bullseye
  • 02:47 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 02:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1236.eqiad.wmnet with OS bullseye
  • 02:46 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 02:46 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 02:45 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 02:44 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1238.eqiad.wmnet with OS bullseye
  • 02:44 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 02:43 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 02:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1237.eqiad.wmnet with OS bullseye
  • 02:42 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 02:41 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 02:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1235.eqiad.wmnet with OS bullseye
  • 02:40 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 02:39 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 02:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1234.eqiad.wmnet with OS bullseye
  • 02:38 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 02:37 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 02:33 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db1240.eqiad.wmnet with reason: host reimage
  • 02:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1241.eqiad.wmnet with reason: host reimage
  • 02:32 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db1239.eqiad.wmnet with reason: host reimage
  • 02:30 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db1236.eqiad.wmnet with reason: host reimage
  • 02:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1238.eqiad.wmnet with reason: host reimage
  • 02:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1237.eqiad.wmnet with reason: host reimage
  • 02:24 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1241.eqiad.wmnet with reason: host reimage
  • 02:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1235.eqiad.wmnet with reason: host reimage
  • 02:23 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1240.eqiad.wmnet with reason: host reimage
  • 02:22 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1239.eqiad.wmnet with reason: host reimage
  • 02:21 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1238.eqiad.wmnet with reason: host reimage
  • 02:21 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1234.eqiad.wmnet with reason: host reimage
  • 02:21 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1237.eqiad.wmnet with reason: host reimage
  • 02:20 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1236.eqiad.wmnet with reason: host reimage
  • 02:19 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1235.eqiad.wmnet with reason: host reimage
  • 02:18 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1234.eqiad.wmnet with reason: host reimage
  • 02:11 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host db1241.eqiad.wmnet with OS bullseye
  • 02:11 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for restbase2025.codfw.wmnet
  • 02:11 eevans@cumin1001: START - Cookbook sre.hosts.remove-downtime for restbase2025.codfw.wmnet
  • 02:10 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host db1240.eqiad.wmnet with OS bullseye
  • 02:10 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase2025.codfw.wmnet with OS bullseye
  • 02:09 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host db1239.eqiad.wmnet with OS bullseye
  • 02:09 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host db1238.eqiad.wmnet with OS bullseye
  • 02:08 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host db1237.eqiad.wmnet with OS bullseye
  • 02:07 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host db1236.eqiad.wmnet with OS bullseye
  • 02:06 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host db1235.eqiad.wmnet with OS bullseye
  • 02:05 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host db1234.eqiad.wmnet with OS bullseye
  • 02:00 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1225.eqiad.wmnet with reason: Maintenance
  • 02:00 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1225.eqiad.wmnet with reason: Maintenance
  • 02:00 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1224 (T343198)', diff saved to https://phabricator.wikimedia.org/P52682 and previous config saved to /var/cache/conftool/dbconfig/20230927-020034-arnaudb.json
  • 01:45 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on restbase2025.codfw.wmnet with reason: host reimage
  • 01:45 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1224', diff saved to https://phabricator.wikimedia.org/P52681 and previous config saved to /var/cache/conftool/dbconfig/20230927-014527-arnaudb.json
  • 01:43 eevans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on restbase2025.codfw.wmnet with reason: host reimage
  • 01:30 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1224', diff saved to https://phabricator.wikimedia.org/P52680 and previous config saved to /var/cache/conftool/dbconfig/20230927-013020-arnaudb.json
  • 01:26 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2025.codfw.wmnet with OS bullseye
  • 01:25 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for restbase2022.codfw.wmnet
  • 01:25 eevans@cumin1001: START - Cookbook sre.hosts.remove-downtime for restbase2022.codfw.wmnet
  • 01:25 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase2022.codfw.wmnet with OS bullseye
  • 01:15 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1224 (T343198)', diff saved to https://phabricator.wikimedia.org/P52679 and previous config saved to /var/cache/conftool/dbconfig/20230927-011514-arnaudb.json
  • 01:02 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on restbase2022.codfw.wmnet with reason: host reimage
  • 00:59 eevans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on restbase2022.codfw.wmnet with reason: host reimage
  • 00:42 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2022.codfw.wmnet with OS bullseye
  • 00:41 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1224 (T343198)', diff saved to https://phabricator.wikimedia.org/P52678 and previous config saved to /var/cache/conftool/dbconfig/20230927-004144-arnaudb.json
  • 00:41 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1224.eqiad.wmnet with reason: Maintenance
  • 00:41 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1224.eqiad.wmnet with reason: Maintenance
  • 00:41 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1213:3316 (T343198)', diff saved to https://phabricator.wikimedia.org/P52677 and previous config saved to /var/cache/conftool/dbconfig/20230927-004122-arnaudb.json
  • 00:30 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase2020.codfw.wmnet with OS bullseye
  • 00:26 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1213:3316', diff saved to https://phabricator.wikimedia.org/P52676 and previous config saved to /var/cache/conftool/dbconfig/20230927-002616-arnaudb.json
  • 00:11 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1213:3316', diff saved to https://phabricator.wikimedia.org/P52675 and previous config saved to /var/cache/conftool/dbconfig/20230927-001109-arnaudb.json

2023-09-26

  • 23:56 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1213:3316 (T343198)', diff saved to https://phabricator.wikimedia.org/P52674 and previous config saved to /var/cache/conftool/dbconfig/20230926-235602-arnaudb.json
  • 23:55 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on restbase2020.codfw.wmnet with reason: host reimage
  • 23:52 eevans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on restbase2020.codfw.wmnet with reason: host reimage
  • 23:50 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2169:3317 (T343198)', diff saved to https://phabricator.wikimedia.org/P52673 and previous config saved to /var/cache/conftool/dbconfig/20230926-235026-arnaudb.json
  • 23:50 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2169.codfw.wmnet with reason: Maintenance
  • 23:50 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2169.codfw.wmnet with reason: Maintenance
  • 23:50 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317 (T343198)', diff saved to https://phabricator.wikimedia.org/P52672 and previous config saved to /var/cache/conftool/dbconfig/20230926-235005-arnaudb.json
  • 23:46 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host lists1004.eqiad.wmnet with OS bullseye
  • 23:41 eevans@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts restbase2022.codfw.wmnet
  • 23:41 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts restbase2022.codfw.wmnet
  • 23:41 eevans@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts restbase2022.codfw.wmnet
  • 23:41 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts restbase2022.codfw.wmnet
  • 23:36 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2020.codfw.wmnet with OS bullseye
  • 23:34 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317', diff saved to https://phabricator.wikimedia.org/P52671 and previous config saved to /var/cache/conftool/dbconfig/20230926-233458-arnaudb.json
  • 23:19 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317', diff saved to https://phabricator.wikimedia.org/P52670 and previous config saved to /var/cache/conftool/dbconfig/20230926-231951-arnaudb.json
  • 23:04 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317 (T343198)', diff saved to https://phabricator.wikimedia.org/P52669 and previous config saved to /var/cache/conftool/dbconfig/20230926-230445-arnaudb.json
  • 22:49 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host lists1004.eqiad.wmnet with OS bullseye
  • 22:47 eevans@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['restbase2020.codfw.wmnet']
  • 22:47 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['restbase2020.codfw.wmnet']
  • 22:45 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host lists1004.mgmt.eqiad.wmnet with reboot policy FORCED
  • 22:44 jclark@cumin1001: START - Cookbook sre.hosts.provision for host lists1004.mgmt.eqiad.wmnet with reboot policy FORCED
  • 22:41 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host lists1004.mgmt.eqiad.wmnet with reboot policy FORCED
  • 22:36 jclark@cumin1001: START - Cookbook sre.hosts.provision for host lists1004.mgmt.eqiad.wmnet with reboot policy FORCED
  • 22:35 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase2016.codfw.wmnet with OS bullseye
  • 22:33 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host lists1004.mgmt.eqiad.wmnet with reboot policy FORCED
  • 22:30 jclark@cumin1001: START - Cookbook sre.hosts.provision for host lists1004.mgmt.eqiad.wmnet with reboot policy FORCED
  • 22:27 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['lists1004']
  • 22:26 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['lists1004']
  • 22:26 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['lists1004']
  • 22:25 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['lists1004']
  • 22:25 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['lists1004']
  • 22:24 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['lists1004']
  • 22:22 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host lists1004.eqiad.wmnet with OS bullseye
  • 22:08 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1213:3316 (T343198)', diff saved to https://phabricator.wikimedia.org/P52668 and previous config saved to /var/cache/conftool/dbconfig/20230926-220812-arnaudb.json
  • 22:08 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1213.eqiad.wmnet with reason: Maintenance
  • 22:08 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1213.eqiad.wmnet with reason: Maintenance
  • 22:08 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1201 (T343198)', diff saved to https://phabricator.wikimedia.org/P52667 and previous config saved to /var/cache/conftool/dbconfig/20230926-220801-arnaudb.json
  • 21:59 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on restbase2016.codfw.wmnet with reason: host reimage
  • 21:56 eevans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on restbase2016.codfw.wmnet with reason: host reimage
  • 21:52 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1201', diff saved to https://phabricator.wikimedia.org/P52666 and previous config saved to /var/cache/conftool/dbconfig/20230926-215254-arnaudb.json
  • 21:37 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2016.codfw.wmnet with OS bullseye
  • 21:37 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1201', diff saved to https://phabricator.wikimedia.org/P52665 and previous config saved to /var/cache/conftool/dbconfig/20230926-213747-arnaudb.json
  • 21:37 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host lists1004.eqiad.wmnet with OS bullseye
  • 21:23 eevans@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['restbase2016.codfw.wmnet']
  • 21:23 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['restbase2016.codfw.wmnet']
  • 21:23 eevans@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['restbase2016.codfw.wmnet']
  • 21:23 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['restbase2016.codfw.wmnet']
  • 21:23 eevans@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['restbase2016.codfw.wmnet']
  • 21:22 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1201 (T343198)', diff saved to https://phabricator.wikimedia.org/P52664 and previous config saved to /var/cache/conftool/dbconfig/20230926-212240-arnaudb.json
  • 21:22 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['restbase2016.codfw.wmnet']
  • 21:13 eevans@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['restbase2016.codfw.wmnet']
  • 21:13 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['restbase2016.codfw.wmnet']
  • 21:13 eevans@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['restbase2016.codfw.wmnet']
  • 21:09 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['restbase2016.codfw.wmnet']
  • 21:08 eevans@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['restbase2016.codfw.wmnet']
  • 21:00 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['restbase2016.codfw.wmnet']
  • 20:59 eevans@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['restbase2016.codfw.wmnet']
  • 20:59 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['restbase2016.codfw.wmnet']
  • 20:50 eevans@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['restbase2016.codfw.wmnet']
  • 20:50 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['restbase2016.codfw.wmnet']
  • 20:50 eevans@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['restbase2016.codfw.wmnet']
  • 20:49 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['restbase2016.codfw.wmnet']
  • 20:48 bking@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudelastic1007.eqiad.wmnet with OS bullseye
  • 20:48 taavi@deploy2002: Finished scap: Backport for Set WRITE_NEW for Wikitech on OATHAuth multiple devices migration (T242031) (duration: 07m 38s)
  • 20:43 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1201 (T343198)', diff saved to https://phabricator.wikimedia.org/P52663 and previous config saved to /var/cache/conftool/dbconfig/20230926-204331-arnaudb.json
  • 20:43 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1201.eqiad.wmnet with reason: Maintenance
  • 20:43 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1201.eqiad.wmnet with reason: Maintenance
  • 20:43 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187 (T343198)', diff saved to https://phabricator.wikimedia.org/P52662 and previous config saved to /var/cache/conftool/dbconfig/20230926-204309-arnaudb.json
  • 20:42 taavi@deploy2002: taavi: Continuing with sync
  • 20:42 taavi@deploy2002: taavi: Backport for Set WRITE_NEW for Wikitech on OATHAuth multiple devices migration (T242031) synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 20:42 eevans@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host restbase2016.codfw.wmnet with OS bullseye
  • 20:40 taavi@deploy2002: Started scap: Backport for Set WRITE_NEW for Wikitech on OATHAuth multiple devices migration (T242031)
  • 20:39 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host lists1004.eqiad.wmnet with OS bullseye
  • 20:38 taavi@deploy2002: Finished scap: Backport for Do not set $wgPasswordResetRoutes['domain'] (T345226), Do not set $wgPasswordResetRoutes['domain'] (T345226) (duration: 08m 35s)
  • 20:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1016.eqiad.wmnet with OS bullseye
  • 20:31 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 20:31 taavi@deploy2002: taavi: Continuing with sync
  • 20:31 taavi@deploy2002: taavi: Backport for Do not set $wgPasswordResetRoutes['domain'] (T345226), Do not set $wgPasswordResetRoutes['domain'] (T345226) synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 20:29 taavi@deploy2002: Started scap: Backport for Do not set $wgPasswordResetRoutes['domain'] (T345226), Do not set $wgPasswordResetRoutes['domain'] (T345226)
  • 20:28 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P52661 and previous config saved to /var/cache/conftool/dbconfig/20230926-202803-arnaudb.json
  • 20:26 taavi@deploy2002: Finished scap: Backport for Add $wgExternalLinksDomainGaps (T341000) (duration: 09m 44s)
  • 20:25 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 20:19 taavi@deploy2002: taavi and lucaswerkmeister: Continuing with sync
  • 20:18 taavi@deploy2002: taavi and lucaswerkmeister: Backport for Add $wgExternalLinksDomainGaps (T341000) synced to the testservers mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 20:17 bking@cumin1001: START - Cookbook sre.hosts.reimage for host cloudelastic1007.eqiad.wmnet with OS bullseye
  • 20:16 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1015.eqiad.wmnet with OS bullseye
  • 20:16 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 20:16 taavi@deploy2002: Started scap: Backport for Add $wgExternalLinksDomainGaps (T341000)
  • 20:16 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudelastic1007.eqiad.wmnet with OS bullseye
  • 20:16 bking@cumin1001: START - Cookbook sre.hosts.reimage for host cloudelastic1007.eqiad.wmnet with OS bullseye
  • 20:15 taavi@deploy2002: Finished scap: Backport for Wordmarks for Wikinews projects (T341258), Update wikiquote wordmarks (T341260), Update README clarifying the use of local images. (duration: 10m 04s)
  • 20:15 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 20:15 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudelastic1007.eqiad.wmnet with OS bullseye
  • 20:15 bking@cumin1001: START - Cookbook sre.hosts.reimage for host cloudelastic1007.eqiad.wmnet with OS bullseye
  • 20:14 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2016.codfw.wmnet with OS bullseye
  • 20:14 eevans@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host restbase2016.codfw.wmnet with OS bullseye
  • 20:12 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P52660 and previous config saved to /var/cache/conftool/dbconfig/20230926-201256-arnaudb.json
  • 20:10 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1016.eqiad.wmnet with reason: host reimage
  • 20:09 taavi@deploy2002: taavi and jdlrobson: Continuing with sync
  • 20:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1016.eqiad.wmnet with reason: host reimage
  • 20:06 taavi@deploy2002: taavi and jdlrobson: Backport for Wordmarks for Wikinews projects (T341258), Update wikiquote wordmarks (T341260), Update README clarifying the use of local images. synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-exp
  • 20:06 joal@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventgate-logging-external: apply
  • 20:06 joal@deploy2002: helmfile [codfw] START helmfile.d/services/eventgate-logging-external: apply
  • 20:05 taavi@deploy2002: Started scap: Backport for Wordmarks for Wikinews projects (T341258), Update wikiquote wordmarks (T341260), Update README clarifying the use of local images.
  • 20:04 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 20:04 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 20:04 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 20:04 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 20:04 joal@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-logging-external: apply
  • 20:04 joal@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-logging-external: apply
  • 20:02 joal@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply
  • 20:02 joal@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply
  • 20:02 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1015.eqiad.wmnet with reason: host reimage
  • 20:02 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 20:01 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 20:01 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 20:00 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 20:00 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 20:00 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 19:59 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 19:59 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 19:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1015.eqiad.wmnet with reason: host reimage
  • 19:57 joal@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 19:57 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187 (T343198)', diff saved to https://phabricator.wikimedia.org/P52659 and previous config saved to /var/cache/conftool/dbconfig/20230926-195750-arnaudb.json
  • 19:57 joal@deploy2002: helmfile [codfw] START helmfile.d/services/eventgate-analytics-external: apply
  • 19:55 joal@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 19:54 joal@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: apply
  • 19:53 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host lists1004.eqiad.wmnet with OS bullseye
  • 19:53 joal@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 19:52 joal@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-analytics-external: apply
  • 19:48 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2016.codfw.wmnet with OS bullseye
  • 19:47 joal@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics: apply
  • 19:47 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for restbase2015.codfw.wmnet
  • 19:47 eevans@cumin1001: START - Cookbook sre.hosts.remove-downtime for restbase2015.codfw.wmnet
  • 19:46 joal@deploy2002: helmfile [codfw] START helmfile.d/services/eventgate-analytics: apply
  • 19:46 joal@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics: apply
  • 19:46 joal@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics: apply
  • 19:45 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase2015.codfw.wmnet with OS bullseye
  • 19:45 joal@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics: apply
  • 19:42 joal@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-analytics: apply
  • 19:40 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs1023.eqiad.wmnet with OS bullseye
  • 19:40 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 19:38 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 19:37 joal@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventgate-main: apply
  • 19:37 joal@deploy2002: helmfile [codfw] START helmfile.d/services/eventgate-main: apply
  • 19:33 joal@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: apply
  • 19:33 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host pc1015.eqiad.wmnet with OS bullseye
  • 19:33 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host pc1016.eqiad.wmnet with OS bullseye
  • 19:33 joal@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-main: apply
  • 19:32 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudelastic1007.eqiad.wmnet with OS bullseye
  • 19:32 bking@cumin1001: START - Cookbook sre.hosts.reimage for host cloudelastic1007.eqiad.wmnet with OS bullseye
  • 19:31 joal@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-main: apply
  • 19:30 joal@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-main: apply
  • 19:27 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 19:27 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 19:21 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1023.eqiad.wmnet with reason: host reimage
  • 19:19 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1187 (T343198)', diff saved to https://phabricator.wikimedia.org/P52657 and previous config saved to /var/cache/conftool/dbconfig/20230926-191904-arnaudb.json
  • 19:19 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1187.eqiad.wmnet with reason: Maintenance
  • 19:18 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1187.eqiad.wmnet with reason: Maintenance
  • 19:18 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T343198)', diff saved to https://phabricator.wikimedia.org/P52656 and previous config saved to /var/cache/conftool/dbconfig/20230926-191843-arnaudb.json
  • 19:18 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1023.eqiad.wmnet with reason: host reimage
  • 19:17 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 19:16 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 19:16 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 19:16 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 19:05 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on restbase2015.codfw.wmnet with reason: host reimage
  • 19:03 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P52655 and previous config saved to /var/cache/conftool/dbconfig/20230926-190336-arnaudb.json
  • 19:02 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudelastic1007.eqiad.wmnet with OS bullseye
  • 19:02 bking@cumin1001: START - Cookbook sre.hosts.reimage for host cloudelastic1007.eqiad.wmnet with OS bullseye
  • 19:02 eevans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on restbase2015.codfw.wmnet with reason: host reimage
  • 18:59 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs1023.eqiad.wmnet with OS bullseye
  • 18:58 bking@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudelastic1007.eqiad.wmnet with OS bullseye
  • 18:54 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs1023']
  • 18:50 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs1020.eqiad.wmnet with OS bullseye
  • 18:50 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 18:48 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs1023']
  • 18:48 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P52654 and previous config saved to /var/cache/conftool/dbconfig/20230926-184830-arnaudb.json
  • 18:48 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 18:47 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2015.codfw.wmnet with OS bullseye
  • 18:47 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for restbase2024.codfw.wmnet
  • 18:46 eevans@cumin1001: START - Cookbook sre.hosts.remove-downtime for restbase2024.codfw.wmnet
  • 18:46 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs1017.eqiad.wmnet with OS bullseye
  • 18:46 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 18:45 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 18:41 gmodena@deploy2002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 18:40 gmodena@deploy2002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 18:33 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T343198)', diff saved to https://phabricator.wikimedia.org/P52653 and previous config saved to /var/cache/conftool/dbconfig/20230926-183323-arnaudb.json
  • 18:32 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1020.eqiad.wmnet with reason: host reimage
  • 18:30 brennen@deploy2002: rebuilt and synchronized wikiversions files: group0 wikis to 1.41.0-wmf.28 refs T345889
  • 18:29 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1017.eqiad.wmnet with reason: host reimage
  • 18:28 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1020.eqiad.wmnet with reason: host reimage
  • 18:26 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1017.eqiad.wmnet with reason: host reimage
  • 18:24 bking@cumin1001: START - Cookbook sre.hosts.reimage for host cloudelastic1007.eqiad.wmnet with OS bullseye
  • 18:18 brennen: train 1.41.0-wmf.28 (T345889): no current blockers, rolling to group0
  • 18:09 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase2024.codfw.wmnet with OS bullseye
  • 18:08 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs1020.eqiad.wmnet with OS bullseye
  • 18:08 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs1020']
  • 18:04 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs1017.eqiad.wmnet with OS bullseye
  • 18:03 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs1017']
  • 18:01 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs1020']
  • 18:01 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs1017']
  • 18:01 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs1021.eqiad.wmnet with OS bullseye
  • 18:01 jclark@cumin1001: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 17:58 gmodena@deploy2002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 17:58 gmodena@deploy2002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 17:52 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@94ac23e]: tune parallelism of process_sparql_query_hourly (duration: 00m 27s)
  • 17:52 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1180 (T343198)', diff saved to https://phabricator.wikimedia.org/P52652 and previous config saved to /var/cache/conftool/dbconfig/20230926-175222-arnaudb.json
  • 17:52 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1180.eqiad.wmnet with reason: Maintenance
  • 17:52 ebernhardson@deploy2002: Started deploy [airflow-dags/search@94ac23e]: tune parallelism of process_sparql_query_hourly
  • 17:52 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1180.eqiad.wmnet with reason: Maintenance
  • 17:52 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T343198)', diff saved to https://phabricator.wikimedia.org/P52651 and previous config saved to /var/cache/conftool/dbconfig/20230926-175201-arnaudb.json
  • 17:45 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on restbase2024.codfw.wmnet with reason: host reimage
  • 17:43 eevans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on restbase2024.codfw.wmnet with reason: host reimage
  • 17:36 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P52650 and previous config saved to /var/cache/conftool/dbconfig/20230926-173653-arnaudb.json
  • 17:27 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2024.codfw.wmnet with OS bullseye
  • 17:21 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P52649 and previous config saved to /var/cache/conftool/dbconfig/20230926-172146-arnaudb.json
  • 17:15 herron@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:15 herron@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding pyrra.svc records - herron@cumin1001"
  • 17:14 herron@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding pyrra.svc records - herron@cumin1001"
  • 17:12 herron@cumin1001: START - Cookbook sre.dns.netbox
  • 17:06 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T343198)', diff saved to https://phabricator.wikimedia.org/P52648 and previous config saved to /var/cache/conftool/dbconfig/20230926-170639-arnaudb.json
  • 17:01 bblack: A:swift-fe-codfw: manually rolling systemctl restart of swift-proxy and nginx
  • 16:59 bblack@cumin1001: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling restart_daemons on A:swift-fe-codfw
  • 16:53 bblack@cumin1001: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on A:swift-fe-codfw
  • 16:52 bblack: ms-fe2009 - restart swift_dispersion_stats + swift_dispersion_stats_lowlatency services (failing in systemctl)
  • 16:51 bblack@cumin1001: END (FAIL) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=1) rolling restart_daemons on A:swift-fe-codfw
  • 16:45 bblack@cumin1001: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on A:swift-fe-codfw
  • 16:44 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs1020.eqiad.wmnet with OS bullseye
  • 16:28 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 16:27 aokoth@cumin1001: END (PASS) - Cookbook sre.gitlab.failover (exit_code=0) Failover of gitlab from gitlab2002.wikimedia.org to gitlab1003.wikimedia.org
  • 16:26 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1168 (T343198)', diff saved to https://phabricator.wikimedia.org/P52647 and previous config saved to /var/cache/conftool/dbconfig/20230926-162609-arnaudb.json
  • 16:26 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1168.eqiad.wmnet with reason: Maintenance
  • 16:25 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1168.eqiad.wmnet with reason: Maintenance
  • 16:25 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T343198)', diff saved to https://phabricator.wikimedia.org/P52646 and previous config saved to /var/cache/conftool/dbconfig/20230926-162547-arnaudb.json
  • 16:23 aokoth@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) https://gitlab-replica.wikimedia.org/ https://gitlab-replica-old.wikimedia.org/ on all recursors
  • 16:23 aokoth@cumin1001: START - Cookbook sre.dns.wipe-cache https://gitlab-replica.wikimedia.org/ https://gitlab-replica-old.wikimedia.org/ on all recursors
  • 16:17 aokoth@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) https://gitlab-replica.wikimedia.org/ https://gitlab-replica-old.wikimedia.org/ on all recursors
  • 16:17 aokoth@cumin1001: START - Cookbook sre.dns.wipe-cache https://gitlab-replica.wikimedia.org/ https://gitlab-replica-old.wikimedia.org/ on all recursors
  • 16:15 aokoth@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) https://gitlab-replica.wikimedia.org/ https://gitlab-replica-old.wikimedia.org/ on all recursors
  • 16:15 aokoth@cumin1001: START - Cookbook sre.dns.wipe-cache https://gitlab-replica.wikimedia.org/ https://gitlab-replica-old.wikimedia.org/ on all recursors
  • 16:12 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1021.eqiad.wmnet with reason: host reimage
  • 16:10 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P52645 and previous config saved to /var/cache/conftool/dbconfig/20230926-161041-arnaudb.json
  • 16:09 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1021.eqiad.wmnet with reason: host reimage
  • 16:08 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host wdqs1021.eqiad.wmnet with OS bullseye
  • 15:59 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs1017.eqiad.wmnet with OS bullseye
  • 15:55 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P52644 and previous config saved to /var/cache/conftool/dbconfig/20230926-155534-arnaudb.json
  • 15:47 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs1021.eqiad.wmnet with OS bullseye
  • 15:47 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs1021']
  • 15:47 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs1021']
  • 15:46 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['wdqs1021']
  • 15:46 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs1021']
  • 15:40 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wdqs1017.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:40 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T343198)', diff saved to https://phabricator.wikimedia.org/P52643 and previous config saved to /var/cache/conftool/dbconfig/20230926-154027-arnaudb.json
  • 15:39 jclark@cumin1001: START - Cookbook sre.hosts.provision for host wdqs1017.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:37 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wdqs1021.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:30 jclark@cumin1001: START - Cookbook sre.hosts.provision for host wdqs1021.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:28 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs1017.eqiad.wmnet with OS bullseye
  • 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Adding ganeti-test server to codfw - jhancock@cumin2002"
  • 15:27 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Adding ganeti-test server to codfw - jhancock@cumin2002"
  • 15:25 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wdqs1021.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:25 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 15:25 jclark@cumin1001: START - Cookbook sre.hosts.provision for host wdqs1021.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:24 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs1020.eqiad.wmnet with OS bullseye
  • 15:24 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wdqs1020.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:24 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for restbase2021.codfw.wmnet
  • 15:24 eevans@cumin1001: START - Cookbook sre.hosts.remove-downtime for restbase2021.codfw.wmnet
  • 15:11 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wdqs1020.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:11 jclark@cumin1001: START - Cookbook sre.hosts.provision for host wdqs1020.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:11 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:09 jclark@cumin1001: START - Cookbook sre.dns.netbox
  • 15:09 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wdqs1021.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:07 brennen@deploy2002: Finished deploy [phabricator/deployment@d895dde]: deploy to phab1004 for weekly updates (duration: 00m 44s)
  • 15:06 brennen@deploy2002: Started deploy [phabricator/deployment@d895dde]: deploy to phab1004 for weekly updates
  • 15:06 brennen@deploy2002: Finished deploy [phabricator/deployment@d895dde]: test deploy to phab2002 (duration: 00m 35s)
  • 15:05 brennen@deploy2002: Started deploy [phabricator/deployment@d895dde]: test deploy to phab2002
  • 15:04 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti-test2004.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:04 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ganeti-test2004.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:04 ejegg: re-enabled recurring donations charge job
  • 15:03 brennen: beginning routine phabricator update shortly
  • 15:03 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on phab1004.eqiad.wmnet with reason: Phabricator maintenance
  • 15:03 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on phab1004.eqiad.wmnet with reason: Phabricator maintenance
  • 15:02 jclark@cumin1001: START - Cookbook sre.hosts.provision for host wdqs1020.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:01 jclark@cumin1001: START - Cookbook sre.hosts.provision for host wdqs1021.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:01 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti-test2004.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:01 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ganeti-test2004.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:01 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:01 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt 20 - jclark@cumin1001"
  • 15:01 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1165 (T343198)', diff saved to https://phabricator.wikimedia.org/P52642 and previous config saved to /var/cache/conftool/dbconfig/20230926-150056-arnaudb.json
  • 15:01 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 15:01 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 15:00 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1165.eqiad.wmnet with reason: Maintenance
  • 15:00 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1165.eqiad.wmnet with reason: Maintenance
  • 15:00 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131 (T343198)', diff saved to https://phabricator.wikimedia.org/P52641 and previous config saved to /var/cache/conftool/dbconfig/20230926-150028-arnaudb.json
  • 15:00 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt 20 - jclark@cumin1001"
  • 14:58 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wdqs1021.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:58 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:58 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Adding ganeti-test server to codfw - jhancock@cumin2002"
  • 14:57 jclark@cumin1001: START - Cookbook sre.hosts.provision for host wdqs1021.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:57 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Adding ganeti-test server to codfw - jhancock@cumin2002"
  • 14:57 jclark@cumin1001: START - Cookbook sre.dns.netbox
  • 14:54 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 14:53 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:53 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt wdqs1017-20 - jclark@cumin1001"
  • 14:52 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt wdqs1017-20 - jclark@cumin1001"
  • 14:50 jclark@cumin1001: START - Cookbook sre.dns.netbox
  • 14:49 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wdqs1021.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:47 moritzm: installing lldpd security updates
  • 14:46 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase2021.codfw.wmnet with OS bullseye
  • 14:45 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131', diff saved to https://phabricator.wikimedia.org/P52640 and previous config saved to /var/cache/conftool/dbconfig/20230926-144521-arnaudb.json
  • 14:38 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: sync
  • 14:38 effie: Ramp up traffic to mw-on-k8s to 6.5% - T346422
  • 14:37 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/wikifunctions: sync
  • 14:36 ejegg: fundraising civicrm upgraded from 9efea665 to 41a4c2cf
  • 14:35 jclark@cumin1001: START - Cookbook sre.hosts.provision for host wdqs1021.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:34 jbond@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "puppetserver2002.codfw.wmnet - jbond@cumin2002"
  • 14:33 ejegg: disabled recurring donations charge job for civi deploy
  • 14:30 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131', diff saved to https://phabricator.wikimedia.org/P52639 and previous config saved to /var/cache/conftool/dbconfig/20230926-143015-arnaudb.json
  • 14:27 jbond@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "puppetserver2002.codfw.wmnet - jbond@cumin2002"
  • 14:25 jbond@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host puppetserver2002.codfw.wmnet with OS bookworm
  • 14:25 jbond@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jbond@cumin2002"
  • 14:24 jbond@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jbond@cumin2002"
  • 14:23 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host puppetserver1003.eqiad.wmnet with OS bookworm
  • 14:23 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jbond@cumin1001"
  • 14:21 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs1021.eqiad.wmnet with OS bullseye
  • 14:21 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs1021.eqiad.wmnet with OS bullseye
  • 14:17 moritzm: prune obsolete nginx packages from durum hosts after migration to new library scheme T329529
  • 14:16 jbond@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jbond@cumin1001"
  • 14:15 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131 (T343198)', diff saved to https://phabricator.wikimedia.org/P52638 and previous config saved to /var/cache/conftool/dbconfig/20230926-141508-arnaudb.json
  • 14:13 jbond@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on puppetserver2002.codfw.wmnet with reason: host reimage
  • 14:10 jbond@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on puppetserver2002.codfw.wmnet with reason: host reimage
  • 14:08 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs1017.eqiad.wmnet with OS bullseye
  • 14:08 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs1021.eqiad.wmnet with OS bullseye
  • 14:05 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on puppetserver1003.eqiad.wmnet with reason: host reimage
  • 14:02 Lucas_WMDE: UTC afternoon backport+config window done
  • 14:02 lucaswerkmeister-wmde@deploy2002: Finished scap: Backport for Enable Minerva site notice for wikifunctions wiki (T345463) (duration: 09m 51s)
  • 14:02 jbond@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on puppetserver1003.eqiad.wmnet with reason: host reimage
  • 14:01 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on restbase2021.codfw.wmnet with reason: host reimage
  • 13:57 eevans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on restbase2021.codfw.wmnet with reason: host reimage
  • 13:55 lucaswerkmeister-wmde@deploy2002: ammarpad and lucaswerkmeister-wmde: Continuing with sync
  • 13:54 lucaswerkmeister-wmde@deploy2002: ammarpad and lucaswerkmeister-wmde: Backport for Enable Minerva site notice for wikifunctions wiki (T345463) synced to the testservers mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 13:52 lucaswerkmeister-wmde@deploy2002: Started scap: Backport for Enable Minerva site notice for wikifunctions wiki (T345463)
  • 13:51 lucaswerkmeister-wmde@deploy2002: Finished scap: Backport for arwikisource: Increase autoconfirm edit count to 10 (T347264) (duration: 11m 27s)
  • 13:47 Lucas_WMDE: lucaswerkmeister-wmde@deploy2002 ammarpad and lucaswerkmeister-wmde: Continuing with sync [originally 13:44 UTC]
  • 13:43 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2021.codfw.wmnet with OS bullseye
  • 13:43 lucaswerkmeister-wmde@deploy2002: ammarpad and lucaswerkmeister-wmde: Backport for arwikisource: Increase autoconfirm edit count to 10 (T347264) synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 13:43 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for restbase2019.codfw.wmnet
  • 13:43 eevans@cumin1001: START - Cookbook sre.hosts.remove-downtime for restbase2019.codfw.wmnet
  • 13:39 lucaswerkmeister-wmde@deploy2002: Started scap: Backport for arwikisource: Increase autoconfirm edit count to 10 (T347264)
  • 13:37 lucaswerkmeister-wmde@deploy2002: Finished scap: Backport for add search update pipeline streams (update + fetch_error) (T317609) (duration: 11m 54s)
  • 13:36 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2023.codfw.wmnet
  • 13:36 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2023.codfw.wmnet
  • 13:36 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2023.codfw.wmnet
  • 13:35 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: sync
  • 13:35 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2023.codfw.wmnet
  • 13:34 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/wikifunctions: sync
  • 13:31 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase2019.codfw.wmnet with OS bullseye
  • 13:31 lucaswerkmeister-wmde@deploy2002: pfischer and lucaswerkmeister-wmde: Continuing with sync
  • 13:29 jbond@cumin1001: START - Cookbook sre.hosts.reimage for host puppetserver1003.eqiad.wmnet with OS bookworm
  • 13:27 lucaswerkmeister-wmde@deploy2002: pfischer and lucaswerkmeister-wmde: Backport for add search update pipeline streams (update + fetch_error) (T317609) synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 13:25 lucaswerkmeister-wmde@deploy2002: Started scap: Backport for add search update pipeline streams (update + fetch_error) (T317609)
  • 13:25 jbond@cumin2002: START - Cookbook sre.hosts.reimage for host puppetserver2002.codfw.wmnet with OS bookworm
  • 13:25 jbond@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) puppetserver2002.codfw.wmnet on all recursors
  • 13:25 jbond@cumin1001: START - Cookbook sre.dns.wipe-cache puppetserver2002.codfw.wmnet on all recursors
  • 13:25 jbond@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:25 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: rename puppetmaster[12]004 - jbond@cumin1001"
  • 13:24 jbond@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: rename puppetmaster[12]004 - jbond@cumin1001"
  • 13:24 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1131 (T343198)', diff saved to https://phabricator.wikimedia.org/P52637 and previous config saved to /var/cache/conftool/dbconfig/20230926-132357-arnaudb.json
  • 13:23 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1131.eqiad.wmnet with reason: Maintenance
  • 13:23 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1131.eqiad.wmnet with reason: Maintenance
  • 13:22 jbond@cumin1001: START - Cookbook sre.dns.netbox
  • 13:21 lucaswerkmeister-wmde@deploy2002: Finished scap: Backport for Make wikifunctionswiki a multilingual Wikidata client (T342857) (duration: 09m 44s)
  • 13:18 aokoth@cumin1001: START - Cookbook sre.gitlab.failover Failover of gitlab from gitlab2002.wikimedia.org to gitlab1003.wikimedia.org
  • 13:15 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: sync
  • 13:15 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Continuing with sync
  • 13:14 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifunctions: sync
  • 13:13 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Backport for Make wikifunctionswiki a multilingual Wikidata client (T342857) synced to the testservers mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 13:11 lucaswerkmeister-wmde@deploy2002: Started scap: Backport for Make wikifunctionswiki a multilingual Wikidata client (T342857)
  • 13:07 jbond@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host puppetserver2002
  • 13:07 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on restbase2019.codfw.wmnet with reason: host reimage
  • 13:06 jbond@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host puppetserver2002
  • 13:04 oblivian@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifeeds: apply
  • 13:04 eevans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on restbase2019.codfw.wmnet with reason: host reimage
  • 13:04 oblivian@deploy2002: helmfile [codfw] START helmfile.d/services/wikifeeds: apply
  • 13:02 oblivian@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifeeds: apply
  • 13:02 oblivian@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifeeds: apply
  • 13:01 oblivian@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifeeds: apply
  • 13:01 oblivian@deploy2002: helmfile [staging] START helmfile.d/services/wikifeeds: apply
  • 13:01 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: sync
  • 13:00 jayme@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: sync
  • 13:00 aokoth@cumin1001: END (FAIL) - Cookbook sre.gitlab.failover (exit_code=93) Failover of gitlab from gitlab2002.wikimedia.org to gitlab1003.wikimedia.org
  • 13:00 jbond@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host puppetserver2002
  • 12:57 jbond@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host puppetserver2002
  • 12:55 jbond@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host puppetserver1003
  • 12:54 aokoth@cumin1001: START - Cookbook sre.gitlab.failover Failover of gitlab from gitlab2002.wikimedia.org to gitlab1003.wikimedia.org
  • 12:53 jbond@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host puppetserver1003
  • 12:53 jbond@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host puppetserver2002
  • 12:53 jbond@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host puppetserver2002
  • 12:52 jbond@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:52 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: rename puppetmaster[12]004 - jbond@cumin1001"
  • 12:52 jbond@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host puppetserver2002
  • 12:52 jbond@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host puppetserver2002
  • 12:52 jbond@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: rename puppetmaster[12]004 - jbond@cumin1001"
  • 12:49 jbond@cumin1001: START - Cookbook sre.dns.netbox
  • 12:48 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2019.codfw.wmnet with OS bullseye
  • 12:36 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2023.codfw.wmnet
  • 12:35 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2023.codfw.wmnet
  • 12:35 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2023.codfw.wmnet
  • 12:34 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2023.codfw.wmnet
  • 12:34 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2023.codfw.wmnet
  • 12:33 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2023.codfw.wmnet
  • 12:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2023.codfw.wmnet
  • 12:28 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2023.codfw.wmnet
  • 12:16 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM puppetdb2002.codfw.wmnet
  • 12:16 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts puppetmaster1004.eqiad.wmnet
  • 12:16 jbond@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:16 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: puppetmaster1004.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jbond@cumin1001"
  • 12:15 jbond@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: puppetmaster1004.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jbond@cumin1001"
  • 12:12 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM puppetdb2002.codfw.wmnet
  • 12:12 jbond@cumin1001: START - Cookbook sre.dns.netbox
  • 12:10 taavi: deploy https://gerrit.wikimedia.org/r/961054 via homer
  • 12:10 jbond@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts puppetmaster2004.codfw.wmnet
  • 12:10 jbond@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 12:08 jbond@cumin2002: START - Cookbook sre.dns.netbox
  • 12:05 jbond@cumin1001: START - Cookbook sre.hosts.decommission for hosts puppetmaster1004.eqiad.wmnet
  • 12:04 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2168:3317 (T343198)', diff saved to https://phabricator.wikimedia.org/P52635 and previous config saved to /var/cache/conftool/dbconfig/20230926-120417-arnaudb.json
  • 12:04 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2168.codfw.wmnet with reason: Maintenance
  • 12:04 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2168.codfw.wmnet with reason: Maintenance
  • 12:03 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159 (T343198)', diff saved to https://phabricator.wikimedia.org/P52634 and previous config saved to /var/cache/conftool/dbconfig/20230926-120355-arnaudb.json
  • 12:00 jbond@cumin2002: START - Cookbook sre.hosts.decommission for hosts puppetmaster2004.codfw.wmnet
  • 11:48 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159', diff saved to https://phabricator.wikimedia.org/P52633 and previous config saved to /var/cache/conftool/dbconfig/20230926-114848-arnaudb.json
  • 11:33 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159', diff saved to https://phabricator.wikimedia.org/P52632 and previous config saved to /var/cache/conftool/dbconfig/20230926-113340-arnaudb.json
  • 11:29 taavi@deploy2002: Finished scap: Backport for wikitech: $wgPasswordResetRoutes takes an empty array, not false (duration: 07m 28s)
  • 11:23 taavi@deploy2002: taavi: Continuing with sync
  • 11:23 taavi@deploy2002: taavi: Backport for wikitech: $wgPasswordResetRoutes takes an empty array, not false synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 11:21 taavi@deploy2002: Started scap: Backport for wikitech: $wgPasswordResetRoutes takes an empty array, not false
  • 11:18 taavi@deploy2002: Finished scap: Backport for wikitech: Properly disable password resets (T345226) (duration: 08m 00s)
  • 11:18 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159 (T343198)', diff saved to https://phabricator.wikimedia.org/P52631 and previous config saved to /var/cache/conftool/dbconfig/20230926-111834-arnaudb.json
  • 11:12 taavi@deploy2002: taavi: Continuing with sync
  • 11:12 taavi@deploy2002: taavi: Backport for wikitech: Properly disable password resets (T345226) synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 11:10 taavi@deploy2002: Started scap: Backport for wikitech: Properly disable password resets (T345226)
  • 11:07 joal@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-main: sync
  • 11:07 joal@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-main: sync
  • 10:55 oblivian@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifeeds: apply
  • 10:55 oblivian@deploy2002: helmfile [codfw] START helmfile.d/services/wikifeeds: apply
  • 10:54 oblivian@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifeeds: apply
  • 10:53 oblivian@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifeeds: apply
  • 10:51 oblivian@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifeeds: apply
  • 10:51 oblivian@deploy2002: helmfile [staging] START helmfile.d/services/wikifeeds: apply
  • 10:46 oblivian@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifeeds: apply
  • 10:46 oblivian@deploy2002: helmfile [staging] START helmfile.d/services/wikifeeds: apply
  • 10:46 oblivian@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifeeds: apply
  • 10:46 oblivian@deploy2002: helmfile [staging] START helmfile.d/services/wikifeeds: apply
  • 10:41 jiji@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
  • 10:41 jiji@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
  • 10:40 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
  • 10:40 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
  • 10:39 jiji@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
  • 10:38 jiji@deploy2002: helmfile [codfw] START helmfile.d/services/mw-web: apply
  • 10:38 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
  • 10:38 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
  • 10:37 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
  • 10:37 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
  • 10:37 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on an-worker1086.eqiad.wmnet with reason: Downtiming host for RAID controller battery replacement
  • 10:37 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on an-worker1086.eqiad.wmnet with reason: Downtiming host for RAID controller battery replacement
  • 10:36 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
  • 10:35 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
  • 10:05 oblivian@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventstreams: apply
  • 10:05 oblivian@deploy2002: helmfile [codfw] START helmfile.d/services/eventstreams: apply
  • 10:04 oblivian@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventstreams: apply
  • 10:04 oblivian@deploy2002: helmfile [eqiad] START helmfile.d/services/eventstreams: apply
  • 10:04 oblivian@deploy2002: helmfile [staging] DONE helmfile.d/services/eventstreams: apply
  • 10:03 oblivian@deploy2002: helmfile [staging] START helmfile.d/services/eventstreams: apply
  • 10:03 taavi: update CR firewall policy to permit wiki replica account creation in the new cloud-private network setup, https://gerrit.wikimedia.org/r/961055 T347381
  • 10:03 oblivian@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventstreams-internal: apply
  • 10:02 oblivian@deploy2002: helmfile [codfw] START helmfile.d/services/eventstreams-internal: apply
  • 10:01 oblivian@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventstreams-internal: apply
  • 10:00 oblivian@deploy2002: helmfile [eqiad] START helmfile.d/services/eventstreams-internal: apply
  • 10:00 oblivian@deploy2002: helmfile [staging] DONE helmfile.d/services/eventstreams-internal: apply
  • 10:00 oblivian@deploy2002: helmfile [staging] START helmfile.d/services/eventstreams-internal: apply
  • 09:54 klausman@deploy2002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: apply
  • 09:53 klausman@deploy2002: helmfile [eqiad] START helmfile.d/services/api-gateway: apply
  • 09:52 klausman@deploy2002: helmfile [codfw] DONE helmfile.d/services/api-gateway: apply
  • 09:52 klausman@deploy2002: helmfile [codfw] START helmfile.d/services/api-gateway: apply
  • 09:48 godog: remove per-host restbase healthchecks, replaced by service-level swagger-exporter checks - T314118
  • 09:47 klausman@deploy2002: helmfile [staging] DONE helmfile.d/services/api-gateway: sync
  • 09:47 klausman@deploy2002: helmfile [staging] START helmfile.d/services/api-gateway: sync
  • 09:38 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
  • 09:38 oblivian@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventgate-main: apply
  • 09:37 oblivian@deploy2002: helmfile [codfw] START helmfile.d/services/eventgate-main: apply
  • 09:36 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/mw-web: apply
  • 09:36 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
  • 09:35 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
  • 09:35 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
  • 09:35 claime: Raised replicas to 20 for mw-api-ext and mw-web - T346422
  • 09:35 oblivian@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: apply
  • 09:34 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
  • 09:34 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
  • 09:34 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
  • 09:34 oblivian@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-main: apply
  • 09:33 oblivian@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-main: apply
  • 09:33 oblivian@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-main: apply
  • 09:30 oblivian@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics: apply
  • 09:29 oblivian@deploy2002: helmfile [codfw] START helmfile.d/services/eventgate-analytics: apply
  • 09:29 oblivian@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics: apply
  • 09:28 oblivian@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics: apply
  • 09:27 oblivian@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics: apply
  • 09:26 oblivian@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-analytics: apply
  • 09:25 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mathoid: apply
  • 09:23 isaranto@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
  • 09:23 oblivian@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventgate-logging-external: apply
  • 09:22 isaranto@deploy2002: helmfile [ml-serve-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
  • 09:22 oblivian@deploy2002: helmfile [codfw] START helmfile.d/services/eventgate-logging-external: apply
  • 09:22 isaranto@deploy2002: helmfile [ml-serve-eqiad] 'sync' command on namespace 'ores-legacy' for release 'main' .
  • 09:21 oblivian@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-logging-external: apply
  • 09:20 oblivian@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-logging-external: apply
  • 09:20 oblivian@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply
  • 09:19 oblivian@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply
  • 09:19 oblivian@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 09:18 oblivian@deploy2002: helmfile [codfw] START helmfile.d/services/eventgate-analytics-external: apply
  • 09:17 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/mathoid: apply
  • 09:17 oblivian@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 09:16 jiji@deploy2002: helmfile [codfw] DONE helmfile.d/services/mathoid: apply
  • 09:16 jiji@deploy2002: helmfile [codfw] START helmfile.d/services/mathoid: apply
  • 09:15 oblivian@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: apply
  • 09:15 jiji@deploy2002: helmfile [staging] DONE helmfile.d/services/mathoid: apply
  • 09:15 jiji@deploy2002: helmfile [staging] START helmfile.d/services/mathoid: apply
  • 09:15 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mathoid: apply
  • 09:15 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/mathoid: apply
  • 09:14 jiji@deploy2002: helmfile [staging] DONE helmfile.d/services/mathoid: apply
  • 09:14 jiji@deploy2002: helmfile [staging] START helmfile.d/services/mathoid: apply
  • 09:13 gmodena@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-main: sync
  • 09:13 oblivian@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 09:13 gmodena@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-main: sync
  • 09:12 oblivian@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-analytics-external: apply
  • 09:09 gmodena@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-main: apply
  • 09:08 gmodena@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-main: apply
  • 08:52 taavi@deploy2002: taavi: Continuing with sync
  • 08:52 taavi@deploy2002: taavi: Backport for wikitech: Disable password resets (T345226), wikitech: Block account creation by sysops too (T345226) synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 08:51 taavi@deploy2002: Started scap: Backport for wikitech: Disable password resets (T345226), wikitech: Block account creation by sysops too (T345226)
  • 08:03 taavi@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcontrol1007.eqiad.wmnet
  • 07:56 taavi@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudcontrol1007.eqiad.wmnet
  • 07:55 taavi@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcontrol1007.eqiad.wmnet with OS bullseye
  • 07:55 taavi@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - taavi@cumin1001"
  • 07:54 taavi@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - taavi@cumin1001"
  • 07:45 taavi@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "cloudcontrol1007 - taavi@cumin1001"
  • 07:44 taavi@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "cloudcontrol1007 - taavi@cumin1001"
  • 07:25 taavi@deploy2002: Finished scap: Backport for Enable wgMinervaEnableSiteNotice for knwiki (T346582), guwikisource: add audiobook namespace (T347189), add throttle rule for UIUC Wikipedia edit-a-thon October 13, 2023 (T346043) (duration: 11m 41s)
  • 07:18 taavi@deploy2002: anzx and taavi: Continuing with sync
  • 07:15 taavi@deploy2002: anzx and taavi: Backport for Enable wgMinervaEnableSiteNotice for knwiki (T346582), guwikisource: add audiobook namespace (T347189), add throttle rule for UIUC Wikipedia edit-a-thon October 13, 2023 (T346043) synced to the testservers mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug k
  • 07:13 taavi@deploy2002: Started scap: Backport for Enable wgMinervaEnableSiteNotice for knwiki (T346582), guwikisource: add audiobook namespace (T347189), add throttle rule for UIUC Wikipedia edit-a-thon October 13, 2023 (T346043)
  • 07:08 taavi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcontrol1007.eqiad.wmnet with reason: host reimage
  • 07:05 taavi@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcontrol1007.eqiad.wmnet with reason: host reimage
  • 06:58 isaranto@deploy2002: helmfile [ml-serve-eqiad] 'sync' command on namespace 'ores-legacy' for release 'main' .
  • 06:57 isaranto@deploy2002: helmfile [ml-serve-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
  • 06:56 isaranto@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
  • 06:42 taavi@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcontrol1007.eqiad.wmnet with OS bullseye
  • 04:09 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1229.eqiad.wmnet with OS bullseye
  • 03:54 mwpresync@deploy2002: Pruned MediaWiki: 1.41.0-wmf.26 (duration: 02m 13s)
  • 03:52 mwpresync@deploy2002: Finished scap: testwikis wikis to 1.41.0-wmf.28 refs T345889 (duration: 49m 31s)
  • 03:07 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host db1229.eqiad.wmnet with OS bullseye
  • 03:02 mwpresync@deploy2002: Started scap: testwikis wikis to 1.41.0-wmf.28 refs T345889
  • 02:44 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1232.eqiad.wmnet with OS bullseye
  • 02:43 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 02:42 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 02:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1231.eqiad.wmnet with OS bullseye
  • 02:42 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 02:41 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 02:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1233.eqiad.wmnet with OS bullseye
  • 02:39 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 02:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1230.eqiad.wmnet with OS bullseye
  • 02:39 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 02:37 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 02:37 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 02:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1226.eqiad.wmnet with OS bullseye
  • 02:30 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 02:29 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 02:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1232.eqiad.wmnet with reason: host reimage
  • 02:25 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1231.eqiad.wmnet with reason: host reimage
  • 02:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1233.eqiad.wmnet with reason: host reimage
  • 02:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1230.eqiad.wmnet with reason: host reimage
  • 02:20 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1233.eqiad.wmnet with reason: host reimage
  • 02:19 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1232.eqiad.wmnet with reason: host reimage
  • 02:18 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1231.eqiad.wmnet with reason: host reimage
  • 02:17 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1230.eqiad.wmnet with reason: host reimage
  • 02:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1226.eqiad.wmnet with reason: host reimage
  • 02:11 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1226.eqiad.wmnet with reason: host reimage
  • 02:07 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host db1233.eqiad.wmnet with OS bullseye
  • 02:06 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host db1232.eqiad.wmnet with OS bullseye
  • 02:05 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host db1231.eqiad.wmnet with OS bullseye
  • 02:04 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host db1230.eqiad.wmnet with OS bullseye
  • 02:03 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host db1229.eqiad.wmnet with OS bullseye
  • 02:02 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host db1228.eqiad.wmnet with OS bullseye
  • 02:01 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host db1227.eqiad.wmnet with OS bullseye
  • 01:59 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host db1226.eqiad.wmnet with OS bullseye
  • food: payments-wiki upgraded from 5596c7fd to 358e616e
  • 01:21 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs1021.eqiad.wmnet with OS bullseye
  • 01:20 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs1021.eqiad.wmnet with OS bullseye
  • 01:20 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs1017.eqiad.wmnet with OS bullseye
  • 01:19 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs1017.eqiad.wmnet with OS bullseye
  • 01:19 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs1021.eqiad.wmnet with OS bullseye
  • 01:19 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs1021.eqiad.wmnet with OS bullseye
  • 01:18 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs1017.eqiad.wmnet with OS bullseye
  • 01:18 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs1017.eqiad.wmnet with OS bullseye
  • 01:17 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2159 (T343198)', diff saved to https://phabricator.wikimedia.org/P52628 and previous config saved to /var/cache/conftool/dbconfig/20230926-011707-arnaudb.json
  • 01:17 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 01:16 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 01:16 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2159.codfw.wmnet with reason: Maintenance
  • 01:16 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2159.codfw.wmnet with reason: Maintenance
  • 01:16 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150 (T343198)', diff saved to https://phabricator.wikimedia.org/P52627 and previous config saved to /var/cache/conftool/dbconfig/20230926-011629-arnaudb.json
  • 01:01 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150', diff saved to https://phabricator.wikimedia.org/P52626 and previous config saved to /var/cache/conftool/dbconfig/20230926-010123-arnaudb.json
  • 00:46 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150', diff saved to https://phabricator.wikimedia.org/P52625 and previous config saved to /var/cache/conftool/dbconfig/20230926-004616-arnaudb.json
  • 00:40 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs1017.eqiad.wmnet with OS bullseye
  • 00:40 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs1017.eqiad.wmnet with OS bullseye
  • 00:35 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150 (T343198)', diff saved to https://phabricator.wikimedia.org/P52624 and previous config saved to /var/cache/conftool/dbconfig/20230926-003109-arnaudb.json
  • 00:35 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs1024.eqiad.wmnet with OS bullseye
  • 00:30 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 00:29 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 00:28 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs1022.eqiad.wmnet with OS bullseye
  • 00:28 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 00:27 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 00:26 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs1019.eqiad.wmnet with OS bullseye
  • 00:26 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 00:25 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 00:24 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs1018.eqiad.wmnet with OS bullseye
  • 00:24 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 00:23 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 00:09 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1019.eqiad.wmnet with reason: host reimage
  • 00:09 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1022.eqiad.wmnet with reason: host reimage
  • 00:07 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1018.eqiad.wmnet with reason: host reimage
  • 00:04 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1024.eqiad.wmnet with reason: host reimage
  • 00:04 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1019.eqiad.wmnet with reason: host reimage
  • 00:04 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1018.eqiad.wmnet with reason: host reimage

2023-09-25

  • 23:55 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wdqs1023.mgmt.eqiad.wmnet with reboot policy FORCED
  • 23:51 jclark@cumin1001: START - Cookbook sre.hosts.provision for host wdqs1023.mgmt.eqiad.wmnet with reboot policy FORCED
  • 23:50 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs1023']
  • 23:50 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs1023']
  • 23:49 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs1022.eqiad.wmnet with OS bullseye
  • 23:48 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs1022']
  • 23:45 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs1023']
  • 23:45 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs1023']
  • 23:45 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs1021.eqiad.wmnet with OS bullseye
  • 23:45 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs1024.eqiad.wmnet with OS bullseye
  • 23:44 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs1019.eqiad.wmnet with OS bullseye
  • 23:44 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs1018.eqiad.wmnet with OS bullseye
  • 23:44 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs1019']
  • 23:44 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs1018']
  • 23:43 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs1017.eqiad.wmnet with OS bullseye
  • 23:43 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs1023']
  • 23:43 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs1023']
  • 23:43 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs1022']
  • 23:43 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs1023']
  • 23:43 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs1024']
  • 23:42 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs1023']
  • 23:42 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs1023']
  • 23:42 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs1023']
  • 23:42 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs1021']
  • 23:40 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs1017']
  • 23:40 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs1023']
  • 23:40 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs1023']
  • 23:40 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs1023']
  • 23:40 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs1023']
  • 23:37 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs1023']
  • 23:37 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs1023']
  • 23:37 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs1023']
  • 23:37 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs1019']
  • 23:36 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs1018']
  • 23:36 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs1023']
  • 23:36 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs1023']
  • 23:36 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs1023']
  • 23:36 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs1023']
  • 23:35 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs1023']
  • 23:35 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['wdqs1019']
  • 23:35 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['wdqs1018']
  • 23:35 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs1024']
  • 23:35 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs1023']
  • 23:35 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs1023']
  • 23:35 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs1021']
  • 23:34 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs1019']
  • 23:34 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs1018']
  • 23:34 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs1024.eqiad.wmnet with OS bullseye
  • 23:34 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs1023.eqiad.wmnet with OS bullseye
  • 23:34 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs1022.eqiad.wmnet with OS bullseye
  • 23:34 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs1021.eqiad.wmnet with OS bullseye
  • 23:33 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs1017']
  • 23:33 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['wdqs1017']
  • 23:33 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs1017']
  • 23:32 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs1019.eqiad.wmnet with OS bullseye
  • 23:31 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs1018.eqiad.wmnet with OS bullseye
  • 23:29 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs1017.eqiad.wmnet with OS bullseye
  • 22:14 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs1024.eqiad.wmnet with OS bullseye
  • 22:14 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs1023.eqiad.wmnet with OS bullseye
  • 22:14 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs1022.eqiad.wmnet with OS bullseye
  • 22:13 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs1021.eqiad.wmnet with OS bullseye
  • 22:11 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs1019.eqiad.wmnet with OS bullseye
  • 22:11 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs1018.eqiad.wmnet with OS bullseye
  • 22:11 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs1019.eqiad.wmnet']
  • 22:09 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs1023.eqiad.wmnet']
  • 22:08 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs1017.eqiad.wmnet with OS bullseye
  • 22:07 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs1017.eqiad.wmnet']
  • 22:07 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs1018.eqiad.wmnet']
  • 22:07 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs1021.eqiad.wmnet']
  • 22:07 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs1024.eqiad.wmnet']
  • 22:07 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['wdqs1022']
  • 22:07 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs1022']
  • 22:05 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['wdqs1022']
  • 22:05 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs1022']
  • 22:04 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['wdqs1022']
  • 22:04 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs1022']
  • 22:03 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs1019.eqiad.wmnet']
  • 22:01 dancy@deploy2002: Finished scap: final test sync (duration: 15m 00s)
  • 21:59 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['wdqs1022.eqiad.wmnet']
  • 21:59 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['wdqs1022.eqiad.wmnet']
  • 21:59 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs1017.eqiad.wmnet']
  • 21:59 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs1018.eqiad.wmnet']
  • 21:59 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs1023.eqiad.wmnet']
  • 21:58 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs1022.eqiad.wmnet']
  • 21:58 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs1024.eqiad.wmnet']
  • 21:58 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs1022.eqiad.wmnet']
  • 21:58 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs1021.eqiad.wmnet']
  • 21:52 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wdqs1024.mgmt.eqiad.wmnet with reboot policy FORCED
  • 21:52 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wdqs1022.mgmt.eqiad.wmnet with reboot policy FORCED
  • 21:51 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wdqs1017.mgmt.eqiad.wmnet with reboot policy FORCED
  • 21:51 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wdqs1019.mgmt.eqiad.wmnet with reboot policy FORCED
  • 21:47 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wdqs1021.mgmt.eqiad.wmnet with reboot policy FORCED
  • 21:47 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wdqs1023.mgmt.eqiad.wmnet with reboot policy FORCED
  • 21:46 dancy@deploy2002: Started scap: final test sync
  • 21:46 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wdqs1018.mgmt.eqiad.wmnet with reboot policy FORCED
  • 21:45 dancy@deploy2002: Started scap: testing scap mods
  • 21:38 dancy@deploy2002: Started scap: testing scap mods
  • 21:37 dancy@deploy2002: Installation of scap version "4.62.0" completed for 598 hosts
  • 21:36 dancy@deploy2002: Installing scap version "4.62.0" for 598 hosts
  • 21:32 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wdqs1024.mgmt.eqiad.wmnet with reboot policy FORCED
  • 21:32 jclark@cumin1001: START - Cookbook sre.hosts.provision for host wdqs1024.mgmt.eqiad.wmnet with reboot policy FORCED
  • 21:31 jclark@cumin1001: START - Cookbook sre.hosts.provision for host wdqs1021.mgmt.eqiad.wmnet with reboot policy FORCED
  • 21:31 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wdqs1024.mgmt.eqiad.wmnet with reboot policy FORCED
  • 21:31 jclark@cumin1001: START - Cookbook sre.hosts.provision for host wdqs1024.mgmt.eqiad.wmnet with reboot policy FORCED
  • 21:30 jclark@cumin1001: START - Cookbook sre.hosts.provision for host wdqs1019.mgmt.eqiad.wmnet with reboot policy FORCED
  • 21:30 jclark@cumin1001: START - Cookbook sre.hosts.provision for host wdqs1018.mgmt.eqiad.wmnet with reboot policy FORCED
  • 21:30 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:30 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt wdqs1017-20 - jclark@cumin1001"
  • 21:30 jclark@cumin1001: START - Cookbook sre.hosts.provision for host wdqs1024.mgmt.eqiad.wmnet with reboot policy FORCED
  • 21:30 jclark@cumin1001: START - Cookbook sre.hosts.provision for host wdqs1023.mgmt.eqiad.wmnet with reboot policy FORCED
  • 21:29 jclark@cumin1001: START - Cookbook sre.hosts.provision for host wdqs1017.mgmt.eqiad.wmnet with reboot policy FORCED
  • 21:29 jclark@cumin1001: START - Cookbook sre.hosts.provision for host wdqs1022.mgmt.eqiad.wmnet with reboot policy FORCED
  • 21:29 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt wdqs1017-20 - jclark@cumin1001"
  • 21:27 jclark@cumin1001: START - Cookbook sre.dns.netbox
  • 21:22 dancy@deploy2002: Started scap: testing scap mods
  • 21:20 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:20 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt wdqs1017-20 - jclark@cumin1001"
  • 21:19 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt wdqs1017-20 - jclark@cumin1001"
  • 21:17 jclark@cumin1001: START - Cookbook sre.dns.netbox
  • 21:12 cjming: end of UTC late backport window
  • 21:02 cjming@deploy2002: Finished scap: Backport for Provide wordmarks/taglines for Wikibooks projects (T341251), Fix white background for Wikibooks wordmarks (T341251), Icons for special projects (T341242) (duration: 23m 50s)
  • 20:53 cjming@deploy2002: pikne and cjming and jdlrobson: Continuing with sync
  • 20:51 cjming@deploy2002: pikne and cjming and jdlrobson: Backport for Provide wordmarks/taglines for Wikibooks projects (T341251), Fix white background for Wikibooks wordmarks (T341251), Icons for special projects (T341242) synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes
  • 20:39 cjming@deploy2002: Started scap: Backport for Provide wordmarks/taglines for Wikibooks projects (T341251), Fix white background for Wikibooks wordmarks (T341251), Icons for special projects (T341242)
  • 20:25 cjming@deploy2002: Finished scap: Backport for Deploy Reader Demographics 2 pilot survey (T345951) (duration: 21m 18s)
  • 20:16 cjming@deploy2002: cjming and dani: Continuing with sync
  • 20:15 cjming@deploy2002: cjming and dani: Backport for Deploy Reader Demographics 2 pilot survey (T345951) synced to the testservers mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 20:03 cjming@deploy2002: Started scap: Backport for Deploy Reader Demographics 2 pilot survey (T345951)
  • 18:45 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db2109.codfw.wmnet with reason: Host crashed
  • 18:45 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on db2109.codfw.wmnet with reason: Host crashed
  • 18:36 ejegg: Standalone (payments listener) SmashPig upgraded from 0703ce60 to a78a91d9
  • 16:54 jayme@cumin1001: conftool action : set/pooled=yes; selector: name=kubernetes2010.codfw.wmnet
  • 16:51 jayme: uncordon kubernetes2010.codfw.wmnet - T347267
  • 16:11 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 16:11 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 16:09 sukhe@cumin2002: dbctl commit (dc=all): 'Depool db2109', diff saved to https://phabricator.wikimedia.org/P52622 and previous config saved to /var/cache/conftool/dbconfig/20230925-160904-sukhe.json
  • 16:01 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 15:57 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 15:55 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 15:54 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 15:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1034.eqiad.wmnet with reason: Maintenance
  • 15:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1034.eqiad.wmnet with reason: Maintenance
  • 15:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1033.eqiad.wmnet with reason: Maintenance
  • 15:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1033.eqiad.wmnet with reason: Maintenance
  • 15:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1032.eqiad.wmnet with reason: Maintenance
  • 15:37 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1032.eqiad.wmnet with reason: Maintenance
  • 15:30 ejegg: Standalone (payments listener) SmashPig upgraded from 2412df22 to 0703ce60
  • 15:23 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:23 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add new records for cloudcontrol1007 - cmooney@cumin1001"
  • 15:23 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add new records for cloudcontrol1007 - cmooney@cumin1001"
  • 15:22 taavi@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcontrol1007
  • 15:21 taavi@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cloudcontrol1007
  • 15:21 herron: alert[12]001 -- rm /etc/apache2/sites-available/50-dispatch-wikimedia-org.conf && apachectl graceful T344937
  • 15:20 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 15:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1025 (T344589)', diff saved to https://phabricator.wikimedia.org/P52621 and previous config saved to /var/cache/conftool/dbconfig/20230925-152043-ladsgroup.json
  • 15:19 herron: alert[12]001 -- apt remove docker.io T344937
  • 15:17 taavi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:17 taavi@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: assign new IPs to cloudcontrol1007 - taavi@cumin1001"
  • 15:16 taavi@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: assign new IPs to cloudcontrol1007 - taavi@cumin1001"
  • 15:14 taavi@cumin1001: START - Cookbook sre.dns.netbox
  • 15:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1025', diff saved to https://phabricator.wikimedia.org/P52620 and previous config saved to /var/cache/conftool/dbconfig/20230925-150536-ladsgroup.json
  • 15:00 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 14:57 moritzm: installing python3.7 security updates
  • 14:53 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb2007.codfw.wmnet
  • 14:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1025', diff saved to https://phabricator.wikimedia.org/P52619 and previous config saved to /var/cache/conftool/dbconfig/20230925-145029-ladsgroup.json
  • 14:46 filippo@deploy2002: helmfile [codfw] DONE helmfile.d/services/opentelemetry-collector: apply
  • 14:46 filippo@deploy2002: helmfile [codfw] START helmfile.d/services/opentelemetry-collector: apply
  • 14:45 filippo@deploy2002: helmfile [eqiad] DONE helmfile.d/services/opentelemetry-collector: apply
  • 14:45 filippo@deploy2002: helmfile [eqiad] START helmfile.d/services/opentelemetry-collector: apply
  • 14:44 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-single for host rdb2007.codfw.wmnet
  • 14:43 herron@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts dispatch-be2001.codfw.wmnet,dispatch-be1001.eqiad.wmnet
  • 14:43 herron@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:43 herron@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: dispatch-be2001.codfw.wmnet,dispatch-be1001.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - herron@cumin1001"
  • 14:39 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 14:39 jforrester@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 14:38 jforrester@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 14:38 jforrester@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 14:37 jforrester@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 14:36 jayme@deploy2002: Finished scap: (no justification provided) (duration: 03m 09s)
  • 14:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1025 (T344589)', diff saved to https://phabricator.wikimedia.org/P52618 and previous config saved to /var/cache/conftool/dbconfig/20230925-143523-ladsgroup.json
  • 14:35 jforrester@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 14:34 jforrester@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 14:34 jforrester@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 14:33 jayme@deploy2002: Started scap: (no justification provided)
  • 14:33 jforrester@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 14:32 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 14:32 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 14:31 herron@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: dispatch-be2001.codfw.wmnet,dispatch-be1001.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - herron@cumin1001"
  • 14:31 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 14:31 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 14:31 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 14:30 jayme@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 14:29 herron@cumin1001: START - Cookbook sre.dns.netbox
  • 14:29 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 14:28 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 14:26 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 14:25 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
  • 14:25 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
  • 14:24 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
  • 14:24 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 14:24 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
  • 14:24 herron@cumin1001: START - Cookbook sre.hosts.decommission for hosts dispatch-be2001.codfw.wmnet,dispatch-be1001.eqiad.wmnet
  • 14:22 jforrester@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 14:22 jforrester@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 14:22 jforrester@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 14:21 jforrester@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 14:20 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 14:20 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 14:19 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 14:19 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 14:18 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 14:18 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 14:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling es1025 (T344589)', diff saved to https://phabricator.wikimedia.org/P52615 and previous config saved to /var/cache/conftool/dbconfig/20230925-141313-ladsgroup.json
  • 14:13 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1025.eqiad.wmnet with reason: Maintenance
  • 14:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1025.eqiad.wmnet with reason: Maintenance
  • 14:12 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2150 (T343198)', diff saved to https://phabricator.wikimedia.org/P52614 and previous config saved to /var/cache/conftool/dbconfig/20230925-141252-arnaudb.json
  • 14:12 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2150.codfw.wmnet with reason: Maintenance
  • 14:12 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2150.codfw.wmnet with reason: Maintenance
  • 14:12 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122 (T343198)', diff saved to https://phabricator.wikimedia.org/P52613 and previous config saved to /var/cache/conftool/dbconfig/20230925-141230-arnaudb.json
  • 14:07 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb2008.codfw.wmnet
  • 14:04 urbanecm@deploy2002: Finished scap: Backport for listTaskCounts: Do not expect tasks key to be present (T347120), AddImageFeedbackHandler: Add missing parameters (T346277), Enable Parsoid support for Kartographer on enwikivoyage (T342871) (duration: 38m 35s)
  • 14:00 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-single for host rdb2008.codfw.wmnet
  • 13:59 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 59278
  • 13:58 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb2009.codfw.wmnet
  • 13:57 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122', diff saved to https://phabricator.wikimedia.org/P52612 and previous config saved to /var/cache/conftool/dbconfig/20230925-135724-arnaudb.json
  • 13:51 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-single for host rdb2009.codfw.wmnet
  • 13:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1024 (T344589)', diff saved to https://phabricator.wikimedia.org/P52611 and previous config saved to /var/cache/conftool/dbconfig/20230925-135004-ladsgroup.json
  • 13:43 urbanecm@deploy2002: urbanecm and ihurbain: Continuing with sync
  • 13:42 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122', diff saved to https://phabricator.wikimedia.org/P52610 and previous config saved to /var/cache/conftool/dbconfig/20230925-134217-arnaudb.json
  • 13:42 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb2010.codfw.wmnet
  • 13:38 urbanecm@deploy2002: urbanecm and ihurbain: Backport for listTaskCounts: Do not expect tasks key to be present (T347120), AddImageFeedbackHandler: Add missing parameters (T346277), Enable Parsoid support for Kartographer on enwikivoyage (T342871) synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.w
  • 13:36 jayme@cumin1001: conftool action : set/pooled=no; selector: name=kubernetes2010.codfw.wmnet
  • 13:36 jayme@cumin1001: conftool action : set/pooled=yes; selector: dc=codfw,cluster=kubernetes,name=kubernetes.*
  • 13:35 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-single for host rdb2010.codfw.wmnet
  • 13:35 jayme@cumin1001: conftool action : set/pooled=yes; selector: dc=eqiad,cluster=kubernetes,name=kubernetes.*
  • 13:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1024', diff saved to https://phabricator.wikimedia.org/P52607 and previous config saved to /var/cache/conftool/dbconfig/20230925-133457-ladsgroup.json
  • 13:28 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb1009.eqiad.wmnet
  • 13:27 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122 (T343198)', diff saved to https://phabricator.wikimedia.org/P52606 and previous config saved to /var/cache/conftool/dbconfig/20230925-132711-arnaudb.json
  • 13:26 urbanecm@deploy2002: Started scap: Backport for listTaskCounts: Do not expect tasks key to be present (T347120), AddImageFeedbackHandler: Add missing parameters (T346277), Enable Parsoid support for Kartographer on enwikivoyage (T342871)
  • 13:25 urbanecm@deploy2002: Finished scap: Backport for GrowthExperiments: enable AddLink backend 14th round of wikis (T308139) (duration: 23m 28s)
  • 13:22 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-single for host rdb1009.eqiad.wmnet
  • 13:22 jayme@cumin1001: conftool action : set/weight=10; selector: service=kubesvc,cluster=kubernetes,dc=codfw
  • 13:21 jayme@cumin1001: conftool action : set/weight=10; selector: service=kubesvc,cluster=kubernetes,dc=eqiad
  • 13:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1024', diff saved to https://phabricator.wikimedia.org/P52605 and previous config saved to /var/cache/conftool/dbconfig/20230925-131951-ladsgroup.json
  • 13:17 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb1010.eqiad.wmnet
  • 13:15 urbanecm@deploy2002: urbanecm and sgimeno: Continuing with sync
  • 13:14 jayme: ran homer "lsw1-*eqiad*" commit - T346714
  • 13:14 urbanecm@deploy2002: urbanecm and sgimeno: Backport for GrowthExperiments: enable AddLink backend 14th round of wikis (T308139) synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 13:13 jayme: uncordoned kubernetes10[27-56]
  • 13:11 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-single for host rdb1010.eqiad.wmnet
  • 13:09 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mwmaint1002.eqiad.wmnet
  • 13:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1024 (T344589)', diff saved to https://phabricator.wikimedia.org/P52604 and previous config saved to /var/cache/conftool/dbconfig/20230925-130444-ladsgroup.json
  • 13:04 moritzm: installing openjdk-11 security updates on buster
  • 13:03 jayme: cordoned kubernetes10[27-56]
  • 13:03 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 59278
  • 13:01 urbanecm@deploy2002: Started scap: Backport for GrowthExperiments: enable AddLink backend 14th round of wikis (T308139)
  • 13:01 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-single for host mwmaint1002.eqiad.wmnet
  • 12:56 kamila_: put codfw before eqiad in geoDNS defaults
  • 12:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling es1024 (T344589)', diff saved to https://phabricator.wikimedia.org/P52603 and previous config saved to /var/cache/conftool/dbconfig/20230925-125212-ladsgroup.json
  • 12:52 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1024.eqiad.wmnet with reason: Maintenance
  • 12:52 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1024.eqiad.wmnet with reason: Maintenance
  • 12:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on es[1024-1025].eqiad.wmnet with reason: Maintenance
  • 12:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on es[1024-1025].eqiad.wmnet with reason: Maintenance
  • 12:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1023.eqiad.wmnet with reason: Maintenance
  • 12:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1023.eqiad.wmnet with reason: Maintenance
  • 12:26 jayme@deploy2002: Finished scap: (no justification provided) (duration: 10m 08s)
  • 12:17 jayme: bumping k8s deployment mw-web and mw-api-ext to 16 replicas each in both DCs
  • 12:16 jayme@deploy2002: Started scap: (no justification provided)
  • 11:43 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb1011.eqiad.wmnet
  • 11:43 jayme: running puppet on lvs in eqiad - T346714 (TYPO from above, did not run in codfw)
  • 11:42 jayme: running puppet on lvs in codfw - T346714
  • 11:37 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-single for host rdb1011.eqiad.wmnet
  • 11:20 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1054.eqiad.wmnet with OS bullseye
  • 11:20 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-single for host rdb1013.eqiad.wmnet
  • 11:19 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb1014.eqiad.wmnet
  • 11:16 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1036.eqiad.wmnet with OS bullseye
  • 11:16 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1053.eqiad.wmnet with OS bullseye
  • 11:16 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1047.eqiad.wmnet with OS bullseye
  • 11:13 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-single for host rdb1014.eqiad.wmnet
  • 11:13 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1038.eqiad.wmnet with OS bullseye
  • 11:12 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1050.eqiad.wmnet with OS bullseye
  • 11:09 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1046.eqiad.wmnet with OS bullseye
  • 11:08 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1052.eqiad.wmnet with OS bullseye
  • 11:08 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1049.eqiad.wmnet with OS bullseye
  • 11:07 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1044.eqiad.wmnet with OS bullseye
  • 11:06 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1048.eqiad.wmnet with OS bullseye
  • 11:05 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host poolcounter1005.eqiad.wmnet
  • 11:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1022 (T344589)', diff saved to https://phabricator.wikimedia.org/P52602 and previous config saved to /var/cache/conftool/dbconfig/20230925-110343-ladsgroup.json
  • 11:03 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes1056.eqiad.wmnet with OS bullseye
  • 11:00 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-single for host poolcounter1005.eqiad.wmnet
  • 10:58 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1043.eqiad.wmnet with OS bullseye
  • 10:58 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host poolcounter1004.eqiad.wmnet
  • 10:58 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1045.eqiad.wmnet with OS bullseye
  • 10:57 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes1055.eqiad.wmnet with OS bullseye
  • 10:57 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1036.eqiad.wmnet with reason: host reimage
  • 10:57 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes1051.eqiad.wmnet with OS bullseye
  • 10:54 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-single for host poolcounter1004.eqiad.wmnet
  • 10:53 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on kubernetes1056.eqiad.wmnet with reason: host reimage
  • 10:53 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on kubernetes1054.eqiad.wmnet with reason: host reimage
  • 10:52 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1038.eqiad.wmnet with reason: host reimage
  • 10:52 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on kubernetes1047.eqiad.wmnet with reason: host reimage
  • 10:50 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on kubernetes1053.eqiad.wmnet with reason: host reimage
  • 10:50 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on kubernetes1048.eqiad.wmnet with reason: host reimage
  • 10:49 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host deploy1002.eqiad.wmnet
  • 10:49 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on kubernetes1046.eqiad.wmnet with reason: host reimage
  • 10:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1022', diff saved to https://phabricator.wikimedia.org/P52601 and previous config saved to /var/cache/conftool/dbconfig/20230925-104837-ladsgroup.json
  • 10:48 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1034.eqiad.wmnet with OS bullseye
  • 10:48 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1038.eqiad.wmnet with reason: host reimage
  • 10:48 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on kubernetes1044.eqiad.wmnet with reason: host reimage
  • 10:48 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1055.eqiad.wmnet with reason: host reimage
  • 10:47 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1036.eqiad.wmnet with reason: host reimage
  • 10:47 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on kubernetes1051.eqiad.wmnet with reason: host reimage
  • 10:47 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on kubernetes1050.eqiad.wmnet with reason: host reimage
  • 10:46 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1037.eqiad.wmnet with OS bullseye
  • 10:45 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1047.eqiad.wmnet with reason: host reimage
  • 10:45 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1052.eqiad.wmnet with reason: host reimage
  • 10:45 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on kubernetes1049.eqiad.wmnet with reason: host reimage
  • 10:43 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1056.eqiad.wmnet with reason: host reimage
  • 10:43 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1054.eqiad.wmnet with reason: host reimage
  • 10:43 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1055.eqiad.wmnet with reason: host reimage
  • 10:43 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1039.eqiad.wmnet with OS bullseye
  • 10:43 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1045.eqiad.wmnet with reason: host reimage
  • 10:42 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1053.eqiad.wmnet with reason: host reimage
  • 10:42 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1051.eqiad.wmnet with reason: host reimage
  • 10:41 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1033.eqiad.wmnet with OS bullseye
  • 10:40 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-single for host deploy1002.eqiad.wmnet
  • 10:40 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1035.eqiad.wmnet with OS bullseye
  • 10:40 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1043.eqiad.wmnet with reason: host reimage
  • 10:40 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1050.eqiad.wmnet with reason: host reimage
  • 10:40 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1049.eqiad.wmnet with reason: host reimage
  • 10:40 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1048.eqiad.wmnet with reason: host reimage
  • 10:40 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1052.eqiad.wmnet with reason: host reimage
  • 10:39 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1041.eqiad.wmnet with OS bullseye
  • 10:39 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1046.eqiad.wmnet with reason: host reimage
  • 10:38 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1045.eqiad.wmnet with reason: host reimage
  • 10:38 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1044.eqiad.wmnet with reason: host reimage
  • 10:37 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1043.eqiad.wmnet with reason: host reimage
  • 10:37 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1040.eqiad.wmnet with OS bullseye
  • 10:36 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes1042.eqiad.wmnet with OS bullseye
  • 10:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1128', diff saved to https://phabricator.wikimedia.org/P52600 and previous config saved to /var/cache/conftool/dbconfig/20230925-103454-root.json
  • 10:34 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1038.eqiad.wmnet with OS bullseye
  • 10:34 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1036.eqiad.wmnet with OS bullseye
  • 10:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1022', diff saved to https://phabricator.wikimedia.org/P52599 and previous config saved to /var/cache/conftool/dbconfig/20230925-103330-ladsgroup.json
  • 10:31 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1047.eqiad.wmnet with OS bullseye
  • 10:27 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on kubernetes1034.eqiad.wmnet with reason: host reimage
  • 10:27 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1042.eqiad.wmnet with reason: host reimage
  • 10:27 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1055.eqiad.wmnet with OS bullseye
  • 10:27 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on kubernetes1037.eqiad.wmnet with reason: host reimage
  • 10:27 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1056.eqiad.wmnet with OS bullseye
  • 10:27 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1054.eqiad.wmnet with OS bullseye
  • 10:27 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1053.eqiad.wmnet with OS bullseye
  • 10:26 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1051.eqiad.wmnet with OS bullseye
  • 10:26 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1050.eqiad.wmnet with OS bullseye
  • 10:25 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1049.eqiad.wmnet with OS bullseye
  • 10:25 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes1047.eqiad.wmnet with OS bullseye
  • 10:25 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1048.eqiad.wmnet with OS bullseye
  • 10:25 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1047.eqiad.wmnet with OS bullseye
  • 10:25 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1046.eqiad.wmnet with OS bullseye
  • 10:25 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1045.eqiad.wmnet with OS bullseye
  • 10:25 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1039.eqiad.wmnet with reason: host reimage
  • 10:24 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1044.eqiad.wmnet with OS bullseye
  • 10:23 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1052.eqiad.wmnet with OS bullseye
  • 10:23 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1043.eqiad.wmnet with OS bullseye
  • 10:23 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1041.eqiad.wmnet with reason: host reimage
  • 10:22 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on kubernetes1035.eqiad.wmnet with reason: host reimage
  • 10:20 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1033.eqiad.wmnet with reason: host reimage
  • 10:20 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on kubernetes1040.eqiad.wmnet with reason: host reimage
  • 10:19 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1042.eqiad.wmnet with reason: host reimage
  • 10:19 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1041.eqiad.wmnet with reason: host reimage
  • 10:18 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1039.eqiad.wmnet with reason: host reimage
  • 10:18 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1040.eqiad.wmnet with reason: host reimage
  • 10:18 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1037.eqiad.wmnet with reason: host reimage
  • 10:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1022 (T344589)', diff saved to https://phabricator.wikimedia.org/P52597 and previous config saved to /var/cache/conftool/dbconfig/20230925-101824-ladsgroup.json
  • 10:17 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1035.eqiad.wmnet with reason: host reimage
  • 10:17 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1034.eqiad.wmnet with reason: host reimage
  • 10:17 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1033.eqiad.wmnet with reason: host reimage
  • 10:09 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes1038.eqiad.wmnet with OS bullseye
  • 10:09 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1038.eqiad.wmnet with OS bullseye
  • 10:08 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes1036.eqiad.wmnet with OS bullseye
  • 10:08 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1036.eqiad.wmnet with OS bullseye
  • 10:05 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1042.eqiad.wmnet with OS bullseye
  • 10:05 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1041.eqiad.wmnet with OS bullseye
  • 10:04 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes1038.eqiad.wmnet with OS bullseye
  • 10:04 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1040.eqiad.wmnet with OS bullseye
  • 10:04 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1039.eqiad.wmnet with OS bullseye
  • 10:04 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes1036.eqiad.wmnet with OS bullseye
  • 10:04 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1038.eqiad.wmnet with OS bullseye
  • 10:04 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1037.eqiad.wmnet with OS bullseye
  • 10:04 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1036.eqiad.wmnet with OS bullseye
  • 10:03 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1035.eqiad.wmnet with OS bullseye
  • 10:03 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1034.eqiad.wmnet with OS bullseye
  • 10:03 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1033.eqiad.wmnet with OS bullseye
  • 09:59 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1031.eqiad.wmnet with OS bullseye
  • 09:56 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1032.eqiad.wmnet with OS bullseye
  • 09:53 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1030.eqiad.wmnet with OS bullseye
  • 09:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling es1022 (T344589)', diff saved to https://phabricator.wikimedia.org/P52596 and previous config saved to /var/cache/conftool/dbconfig/20230925-095235-ladsgroup.json
  • 09:52 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1022.eqiad.wmnet with reason: Maintenance
  • 09:52 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1022.eqiad.wmnet with reason: Maintenance
  • 09:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on es[1021-1022].eqiad.wmnet with reason: Maintenance
  • 09:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on es[1021-1022].eqiad.wmnet with reason: Maintenance
  • 09:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1020.eqiad.wmnet with reason: Maintenance
  • 09:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1020.eqiad.wmnet with reason: Maintenance
  • 09:43 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on puppetdb2002.codfw.wmnet with reason: Disable puppetdb/postgres/nginx on old nodes to ensure nothing hits them anyway
  • 09:43 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 5 days, 0:00:00 on puppetdb2002.codfw.wmnet with reason: Disable puppetdb/postgres/nginx on old nodes to ensure nothing hits them anyway
  • 09:43 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on puppetdb1002.eqiad.wmnet with reason: Disable puppetdb/postgres/nginx on old nodes to ensure nothing hits them anyway
  • 09:43 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 5 days, 0:00:00 on puppetdb1002.eqiad.wmnet with reason: Disable puppetdb/postgres/nginx on old nodes to ensure nothing hits them anyway
  • 09:41 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1031.eqiad.wmnet with reason: host reimage
  • 09:38 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1032.eqiad.wmnet with reason: host reimage
  • 09:38 jelto: switch people.wikimedia.org to codfw - T345618
  • 09:36 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1030.eqiad.wmnet with reason: host reimage
  • 09:34 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1031.eqiad.wmnet with reason: host reimage
  • 09:34 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1032.eqiad.wmnet with reason: host reimage
  • 09:33 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1030.eqiad.wmnet with reason: host reimage
  • 09:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db[1137,1216,1220,1225].eqiad.wmnet,dbstore1005.eqiad.wmnet with reason: Maintenance
  • 09:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db[1137,1216,1220,1225].eqiad.wmnet,dbstore1005.eqiad.wmnet with reason: Maintenance
  • 09:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1179.eqiad.wmnet with reason: Maintenance
  • 09:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1179.eqiad.wmnet with reason: Maintenance
  • 09:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on 17 hosts with reason: Maintenance
  • 09:24 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on 17 hosts with reason: Maintenance
  • 09:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1126.eqiad.wmnet with reason: Maintenance
  • 09:24 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1126.eqiad.wmnet with reason: Maintenance
  • 09:20 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1032.eqiad.wmnet with OS bullseye
  • 09:19 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1031.eqiad.wmnet with OS bullseye
  • 09:19 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1030.eqiad.wmnet with OS bullseye
  • 09:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on 14 hosts with reason: Maintenance
  • 09:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on 14 hosts with reason: Maintenance
  • 09:18 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1181.eqiad.wmnet with reason: Maintenance
  • 09:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on 14 hosts with reason: Maintenance
  • 09:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on 14 hosts with reason: Maintenance
  • 09:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1183.eqiad.wmnet with reason: Maintenance
  • 09:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1183.eqiad.wmnet with reason: Maintenance
  • 09:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on 13 hosts with reason: Maintenance
  • 09:06 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on 13 hosts with reason: Maintenance
  • 09:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1157.eqiad.wmnet with reason: Maintenance
  • 09:06 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1157.eqiad.wmnet with reason: Maintenance
  • 08:59 Amir1: by the power vested in me by Chris Albon and ML team, I now pronounce ORES dead.
  • 08:58 elukey: migrate ores.wikimedia.org's ATS backend to ores-legacy.discovery.wmnet (k8s app) - This will drain traffic to ORES bare metal nodes - T341696
  • 08:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on 15 hosts with reason: Maintenance
  • 08:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on 15 hosts with reason: Maintenance
  • 08:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1162.eqiad.wmnet with reason: Maintenance
  • 08:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1162.eqiad.wmnet with reason: Maintenance
  • 08:56 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on 16 hosts with reason: Schema change
  • 08:56 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on 16 hosts with reason: Schema change
  • 08:43 jayme: jayme@cumin1001 conftool action : set/pooled=no; selector: name=kubernetes2010.* - T347267
  • 08:43 jayme@cumin1001: conftool action : set/pooled=no; selector: name=kubernetes2010.*
  • 08:39 jayme@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on kubernetes2010.codfw.wmnet with reason: host is down
  • 08:39 jayme@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on kubernetes2010.codfw.wmnet with reason: host is down
  • 08:27 jayme: draining kubernetes2010.codfw.wmnet - T347267
  • 08:01 jayme: cordoning kubernetes2010
  • 07:49 taavi: drop cloudmetrics exceptions from cr firewall ACLs https://gerrit.wikimedia.org/r/c/operations/homer/public/+/960027 T326266
  • 07:47 taavi@deploy2002: Finished scap: Backport for Make sure different key values are handled while submitting (T345496) (duration: 30m 55s)
  • 07:38 taavi@deploy2002: taavi and soda: Continuing with sync
  • 07:37 XioNoX: update eqsin-ulsfo transport link ospf metrics to match the new latency of 175ms
  • 07:29 taavi@deploy2002: taavi and soda: Backport for Make sure different key values are handled while submitting (T345496) synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 07:22 kevinbazira@deploy2002: helmfile [ml-serve-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
  • 07:20 kevinbazira@deploy2002: helmfile [ml-serve-eqiad] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
  • 07:16 taavi@deploy2002: Started scap: Backport for Make sure different key values are handled while submitting (T345496)
  • 07:06 XioNoX: roll out "Block inbound RAs on the routers" - T334916
  • 06:14 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 35008
  • 06:14 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 35008
  • 05:27 kart_: Updated cxserver to 2023-09-13-074325-production (T346045)
  • 05:22 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
  • 05:22 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/cxserver: apply
  • 05:13 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
  • 05:12 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
  • 05:08 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
  • 05:08 kartik@deploy2002: helmfile [staging] START helmfile.d/services/cxserver: apply

2023-09-24

  • 23:05 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2122 (T343198)', diff saved to https://phabricator.wikimedia.org/P52595 and previous config saved to /var/cache/conftool/dbconfig/20230924-230515-arnaudb.json
  • 23:05 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2122.codfw.wmnet with reason: Maintenance
  • 23:04 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2122.codfw.wmnet with reason: Maintenance
  • 23:04 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2121 (T343198)', diff saved to https://phabricator.wikimedia.org/P52594 and previous config saved to /var/cache/conftool/dbconfig/20230924-230443-arnaudb.json
  • 22:49 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2121', diff saved to https://phabricator.wikimedia.org/P52593 and previous config saved to /var/cache/conftool/dbconfig/20230924-224936-arnaudb.json
  • 22:34 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2121', diff saved to https://phabricator.wikimedia.org/P52592 and previous config saved to /var/cache/conftool/dbconfig/20230924-223430-arnaudb.json
  • 22:19 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2121 (T343198)', diff saved to https://phabricator.wikimedia.org/P52591 and previous config saved to /var/cache/conftool/dbconfig/20230924-221923-arnaudb.json
  • 10:28 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2121 (T343198)', diff saved to https://phabricator.wikimedia.org/P52590 and previous config saved to /var/cache/conftool/dbconfig/20230924-102809-arnaudb.json
  • 10:28 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2121.codfw.wmnet with reason: Maintenance
  • 10:27 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2121.codfw.wmnet with reason: Maintenance
  • 10:27 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120 (T343198)', diff saved to https://phabricator.wikimedia.org/P52589 and previous config saved to /var/cache/conftool/dbconfig/20230924-102747-arnaudb.json
  • 10:12 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120', diff saved to https://phabricator.wikimedia.org/P52588 and previous config saved to /var/cache/conftool/dbconfig/20230924-101241-arnaudb.json
  • 09:57 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120', diff saved to https://phabricator.wikimedia.org/P52587 and previous config saved to /var/cache/conftool/dbconfig/20230924-095734-arnaudb.json
  • 09:42 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120 (T343198)', diff saved to https://phabricator.wikimedia.org/P52586 and previous config saved to /var/cache/conftool/dbconfig/20230924-094227-arnaudb.json

2023-09-23

  • 22:27 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2120 (T343198)', diff saved to https://phabricator.wikimedia.org/P52585 and previous config saved to /var/cache/conftool/dbconfig/20230923-222721-arnaudb.json
  • 22:27 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2120.codfw.wmnet with reason: Maintenance
  • 22:27 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2120.codfw.wmnet with reason: Maintenance
  • 22:27 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108 (T343198)', diff saved to https://phabricator.wikimedia.org/P52584 and previous config saved to /var/cache/conftool/dbconfig/20230923-222659-arnaudb.json
  • 22:11 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108', diff saved to https://phabricator.wikimedia.org/P52583 and previous config saved to /var/cache/conftool/dbconfig/20230923-221152-arnaudb.json
  • 21:56 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108', diff saved to https://phabricator.wikimedia.org/P52582 and previous config saved to /var/cache/conftool/dbconfig/20230923-215646-arnaudb.json
  • 21:41 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108 (T343198)', diff saved to https://phabricator.wikimedia.org/P52581 and previous config saved to /var/cache/conftool/dbconfig/20230923-214139-arnaudb.json
  • 10:14 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2108 (T343198)', diff saved to https://phabricator.wikimedia.org/P52580 and previous config saved to /var/cache/conftool/dbconfig/20230923-101423-arnaudb.json
  • 10:14 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2108.codfw.wmnet with reason: Maintenance
  • 10:14 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2108.codfw.wmnet with reason: Maintenance

2023-09-22

  • 22:56 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2100.codfw.wmnet with reason: Maintenance
  • 22:56 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2100.codfw.wmnet with reason: Maintenance
  • 17:32 bearloga@deploy2002: Finished deploy [airflow-dags/analytics_product@a30e944]: (no justification provided) (duration: 00m 09s)
  • 17:32 bearloga@deploy2002: Started deploy [airflow-dags/analytics_product@a30e944]: (no justification provided)
  • 15:46 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ganeti-test2004.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:45 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 15:45 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 15:44 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:44 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Adding ganeti-test server to codfw - jhancock@cumin2002"
  • 15:43 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Adding ganeti-test server to codfw - jhancock@cumin2002"
  • 15:41 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 15:31 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 15:30 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 15:24 denisse: upgrading LibreNMS in eqiad
  • 15:21 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db1247']
  • 15:19 denisse: upgrading LibreNMS to 23.9.1
  • 15:13 denisse@deploy2002: Finished deploy [librenms/librenms@f049593]: Upgrade LibreNMS to 23.9.1 - T346737 (duration: 00m 09s)
  • 15:13 denisse@deploy2002: Started deploy [librenms/librenms@f049593]: Upgrade LibreNMS to 23.9.1 - T346737
  • 15:12 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1247']
  • 15:11 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1247.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:58 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['pc1015']
  • 14:58 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 14:58 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 14:57 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 14:57 jclark@cumin1001: START - Cookbook sre.hosts.provision for host db1247.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:57 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 14:56 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 14:56 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 14:54 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1247.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:49 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['pc1015']
  • 14:49 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pc1015.mgmt.eqiad.wmnet with reboot policy FORCED
  • 13:45 jclark@cumin1001: START - Cookbook sre.hosts.provision for host db1247.mgmt.eqiad.wmnet with reboot policy FORCED
  • 13:39 jclark@cumin1001: START - Cookbook sre.hosts.provision for host pc1015.mgmt.eqiad.wmnet with reboot policy FORCED
  • 12:23 brouberol@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-test-eqiad
  • 12:17 fnegri@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcumin2001.codfw.wmnet
  • 12:13 fnegri@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudcumin2001.codfw.wmnet
  • 11:58 brouberol@cumin1001: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-test-eqiad
  • 11:42 eoghan@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host apt-staging2001.codfw.wmnet with OS bookworm
  • 11:41 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2098.codfw.wmnet with reason: Maintenance
  • 11:41 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2098.codfw.wmnet with reason: Maintenance
  • 11:28 eoghan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on apt-staging2001.codfw.wmnet with reason: host reimage
  • 11:25 eoghan@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on apt-staging2001.codfw.wmnet with reason: host reimage
  • 11:09 eoghan@cumin1001: START - Cookbook sre.hosts.reimage for host apt-staging2001.codfw.wmnet with OS bookworm
  • 10:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host testreduce1001.eqiad.wmnet
  • 10:22 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host testreduce1001.eqiad.wmnet
  • 10:00 fabfur: repool cp1090 (T346874)
  • 09:53 fnegri@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcumin1001.eqiad.wmnet
  • 09:50 fnegri@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudcumin1001.eqiad.wmnet
  • 09:45 fnegri@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host cloudcumin1001.eqiad.wmnet
  • 09:45 fnegri@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudcumin1001.eqiad.wmnet
  • 09:43 fnegri@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host cloudcumin1001.eqiad.wmnet
  • 09:43 fnegri@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudcumin1001.eqiad.wmnet
  • 09:23 Amir1: dbmaint on s2@eqiad (T343198)
  • 09:22 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on 16 hosts with reason: Schema change
  • 09:21 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on 16 hosts with reason: Schema change
  • 09:13 moritzm: installing perf updates on bookworm hosts
  • 09:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on 15 hosts with reason: Schema change
  • 09:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on 15 hosts with reason: Schema change
  • 09:06 moritzm: installing perf updates on buster hosts
  • 08:51 Amir1: dbmaint on s4@eqiad (T343198)
  • 08:51 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on 20 hosts with reason: Schema change
  • 08:51 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on 20 hosts with reason: Schema change
  • 07:45 hashar: Upgrading CI Jenkins from 2.401.3 to 2.414.2
  • 07:36 hashar: Restarting Gerrit to apply https://gerrit.wikimedia.org/r/c/operations/puppet/+/953967 "Link account creation to IDM" # T345226
  • 07:06 moritzm: installing mutt security updates
  • 06:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1132', diff saved to https://phabricator.wikimedia.org/P52577 and previous config saved to /var/cache/conftool/dbconfig/20230922-063617-root.json
  • 06:32 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1132', diff saved to https://phabricator.wikimedia.org/P52576 and previous config saved to /var/cache/conftool/dbconfig/20230922-063212-root.json
  • 05:13 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 00:43 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 00:43 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 00:43 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1202 (T343198)', diff saved to https://phabricator.wikimedia.org/P52575 and previous config saved to /var/cache/conftool/dbconfig/20230922-004330-arnaudb.json
  • 00:28 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1202', diff saved to https://phabricator.wikimedia.org/P52574 and previous config saved to /var/cache/conftool/dbconfig/20230922-002823-arnaudb.json
  • 00:13 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1202', diff saved to https://phabricator.wikimedia.org/P52573 and previous config saved to /var/cache/conftool/dbconfig/20230922-001316-arnaudb.json

2023-09-21

  • 23:58 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1202 (T343198)', diff saved to https://phabricator.wikimedia.org/P52572 and previous config saved to /var/cache/conftool/dbconfig/20230921-235810-arnaudb.json
  • 22:02 ejegg: Standalone (listener) SmashPig upgraded from ca5b6218 to 2412df22
  • 20:28 brennen: end of UTC late backport & config window
  • 20:27 brennen@deploy2002: Finished scap: Backport for Update Reader Demographics 2 pilot survey (T345951) (duration: 21m 36s)
  • 20:18 brennen@deploy2002: dani and brennen: Continuing with sync
  • 20:17 brennen@deploy2002: dani and brennen: Backport for Update Reader Demographics 2 pilot survey (T345951) synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 20:06 brennen@deploy2002: Started scap: Backport for Update Reader Demographics 2 pilot survey (T345951)
  • 20:04 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1202 (T343198)', diff saved to https://phabricator.wikimedia.org/P52570 and previous config saved to /var/cache/conftool/dbconfig/20230921-200439-arnaudb.json
  • 20:04 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1202.eqiad.wmnet with reason: Maintenance
  • 20:04 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1202.eqiad.wmnet with reason: Maintenance
  • 20:04 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1194 (T343198)', diff saved to https://phabricator.wikimedia.org/P52569 and previous config saved to /var/cache/conftool/dbconfig/20230921-200417-arnaudb.json
  • 20:01 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:00 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add records for codfw test servers - cmooney@cumin1001"
  • 19:59 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add records for codfw test servers - cmooney@cumin1001"
  • 19:49 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1194', diff saved to https://phabricator.wikimedia.org/P52568 and previous config saved to /var/cache/conftool/dbconfig/20230921-194911-arnaudb.json
  • 19:47 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 19:34 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1194', diff saved to https://phabricator.wikimedia.org/P52567 and previous config saved to /var/cache/conftool/dbconfig/20230921-193404-arnaudb.json
  • 19:19 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1194 (T343198)', diff saved to https://phabricator.wikimedia.org/P52566 and previous config saved to /var/cache/conftool/dbconfig/20230921-191858-arnaudb.json
  • 19:17 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "add codfw new switches - cmooney@cumin1001"
  • 19:13 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "add codfw new switches - cmooney@cumin1001"
  • 18:54 ladsgroup@deploy2002: Finished scap: Backport for Enable Url shortener in sidebar in all wikis (T267921) (duration: 20m 47s)
  • 18:47 ejegg: payments-wiki upgraded from 9cd3e4cd to 5596c7fd
  • 18:45 ladsgroup@deploy2002: ladsgroup: Continuing with sync
  • 18:45 ladsgroup@deploy2002: ladsgroup: Backport for Enable Url shortener in sidebar in all wikis (T267921) synced to the testservers mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 18:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2149 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P52565 and previous config saved to /var/cache/conftool/dbconfig/20230921-184000-ladsgroup.json
  • 18:34 ladsgroup@deploy2002: Started scap: Backport for Enable Url shortener in sidebar in all wikis (T267921)
  • 18:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2149 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P52564 and previous config saved to /var/cache/conftool/dbconfig/20230921-182455-ladsgroup.json
  • 18:15 brennen@deploy2002: rebuilt and synchronized wikiversions files: group2 wikis to 1.41.0-wmf.27 refs T345888
  • 18:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2149 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P52562 and previous config saved to /var/cache/conftool/dbconfig/20230921-180949-ladsgroup.json
  • 18:05 brennen: train 1.41.0-wmf.27 (T345888): no current blockers, logs clean, rolling to group2 shortly.
  • 18:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repool db1166 (T346365)', diff saved to https://phabricator.wikimedia.org/P52561 and previous config saved to /var/cache/conftool/dbconfig/20230921-180003-ladsgroup.json
  • 17:59 xcollazo@deploy2002: Finished deploy [airflow-dags/analytics@ddcc518]: Deploy latest DAGs to analytics Airflow instance (duration: 00m 40s)
  • 17:58 xcollazo@deploy2002: Started deploy [airflow-dags/analytics@ddcc518]: Deploy latest DAGs to analytics Airflow instance
  • 17:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db1166 (T346365)', diff saved to https://phabricator.wikimedia.org/P52560 and previous config saved to /var/cache/conftool/dbconfig/20230921-175634-ladsgroup.json
  • 17:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2149 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P52559 and previous config saved to /var/cache/conftool/dbconfig/20230921-175444-ladsgroup.json
  • 17:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db2149 (T346365)', diff saved to https://phabricator.wikimedia.org/P52558 and previous config saved to /var/cache/conftool/dbconfig/20230921-174934-ladsgroup.json
  • 17:41 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for restbase2014.codfw.wmnet
  • 17:41 eevans@cumin1001: START - Cookbook sre.hosts.remove-downtime for restbase2014.codfw.wmnet
  • 17:35 ejegg: re-enabled contribution tracking queue consumer
  • 17:30 ejegg: civicrm upgraded from f0e9d3f6 to 9efea665
  • 17:29 ejegg: disabled contribution_tracking queue consumer for Civi update
  • 17:27 eoghan@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host apt-staging2001.codfw.wmnet
  • 17:27 eoghan@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host apt-staging2001.codfw.wmnet with OS bookworm
  • 17:21 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase2014.codfw.wmnet with OS bullseye
  • 16:45 vriley@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1015.mgmt.eqiad.wmnet with reboot policy FORCED
  • 16:45 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on restbase2014.codfw.wmnet with reason: host reimage
  • 16:42 eevans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on restbase2014.codfw.wmnet with reason: host reimage
  • 16:26 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2014.codfw.wmnet with OS bullseye
  • 16:11 eoghan@cumin1001: START - Cookbook sre.hosts.reimage for host apt-staging2001.codfw.wmnet with OS bookworm
  • 16:10 eoghan@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM apt-staging2001.codfw.wmnet - eoghan@cumin1001"
  • 16:10 eoghan@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM apt-staging2001.codfw.wmnet - eoghan@cumin1001"
  • 16:10 eoghan@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) apt-staging2001.codfw.wmnet on all recursors
  • 16:09 eoghan@cumin1001: START - Cookbook sre.dns.wipe-cache apt-staging2001.codfw.wmnet on all recursors
  • 16:09 eoghan@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:09 eoghan@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM apt-staging2001.codfw.wmnet - eoghan@cumin1001"
  • 16:08 eoghan@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM apt-staging2001.codfw.wmnet - eoghan@cumin1001"
  • 16:02 eoghan@cumin1001: START - Cookbook sre.dns.netbox
  • 16:02 eoghan@cumin1001: START - Cookbook sre.ganeti.makevm for new host apt-staging2001.codfw.wmnet
  • 15:34 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1194 (T343198)', diff saved to https://phabricator.wikimedia.org/P52557 and previous config saved to /var/cache/conftool/dbconfig/20230921-153428-arnaudb.json
  • 15:34 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1194.eqiad.wmnet with reason: Maintenance
  • 15:34 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1194.eqiad.wmnet with reason: Maintenance
  • 15:34 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1191 (T343198)', diff saved to https://phabricator.wikimedia.org/P52556 and previous config saved to /var/cache/conftool/dbconfig/20230921-153406-arnaudb.json
  • 15:33 elukey@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
  • 15:25 jayme@deploy2002: Finished scap: (no justification provided) (duration: 02m 29s)
  • 15:22 jayme@deploy2002: Started scap: (no justification provided)
  • 15:20 moritzm: installing php7.3 security updates (as packaged in Debian Buster)
  • 15:19 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1191', diff saved to https://phabricator.wikimedia.org/P52555 and previous config saved to /var/cache/conftool/dbconfig/20230921-151900-arnaudb.json
  • 15:14 lucaswerkmeister-wmde@deploy2002: Finished scap: Backport for SpecialUndelete: Do not clone RequestContext (T346995) (duration: 34m 13s)
  • 15:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on 22 hosts with reason: Schema change
  • 15:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on 22 hosts with reason: Schema change
  • 15:12 Amir1: dbmaint on s8@eqiad (T343198)
  • 15:10 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on 18 hosts with reason: Schema change
  • 15:10 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on 18 hosts with reason: Schema change
  • 15:06 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for restbase2013.codfw.wmnet
  • 15:06 eevans@cumin1001: START - Cookbook sre.hosts.remove-downtime for restbase2013.codfw.wmnet
  • 15:05 kevinbazira@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
  • 15:03 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1191', diff saved to https://phabricator.wikimedia.org/P52554 and previous config saved to /var/cache/conftool/dbconfig/20230921-150353-arnaudb.json
  • 15:01 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Continuing with sync
  • 15:00 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Backport for SpecialUndelete: Do not clone RequestContext (T346995) synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 14:48 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1191 (T343198)', diff saved to https://phabricator.wikimedia.org/P52553 and previous config saved to /var/cache/conftool/dbconfig/20230921-144847-arnaudb.json
  • 14:41 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase2013.codfw.wmnet with OS bullseye
  • 14:40 lucaswerkmeister-wmde@deploy2002: Started scap: Backport for SpecialUndelete: Do not clone RequestContext (T346995)
  • 14:31 moritzm: imported cas 6.6.12+wmf11u1 to apt.wikimedia.org
  • 14:31 kevinbazira@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
  • 14:19 tchanders@deploy2002: Finished scap: Backport for Enable partial action blocks on mediawikiwiki (T332733) (duration: 34m 01s)
  • 14:16 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on restbase2013.codfw.wmnet with reason: host reimage
  • 14:14 eevans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on restbase2013.codfw.wmnet with reason: host reimage
  • 14:07 kevinbazira@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
  • 14:04 tchanders@deploy2002: tchanders: Continuing with sync
  • 14:03 tchanders@deploy2002: tchanders: Backport for Enable partial action blocks on mediawikiwiki (T332733) synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 13:59 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2013.codfw.wmnet with OS bullseye
  • 13:53 vriley@cumin1001: START - Cookbook sre.hosts.provision for host pc1015.mgmt.eqiad.wmnet with reboot policy FORCED
  • 13:43 tchanders@deploy2002: Started scap: Backport for Enable partial action blocks on mediawikiwiki (T332733)
  • 13:39 tchanders@deploy2002: Finished scap: Backport for Enable partial action blocks on commonswiki (T339878) (duration: 35m 04s)
  • 13:37 vriley@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1015.mgmt.eqiad.wmnet with reboot policy FORCED
  • 13:36 vriley@cumin1001: START - Cookbook sre.hosts.provision for host pc1015.mgmt.eqiad.wmnet with reboot policy FORCED
  • 13:34 vriley@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host pc1015
  • 13:34 vriley@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host pc1015
  • 13:30 vriley@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1015.mgmt.eqiad.wmnet with reboot policy FORCED
  • 13:28 vriley@cumin1001: START - Cookbook sre.hosts.provision for host pc1015.mgmt.eqiad.wmnet with reboot policy FORCED
  • 13:27 vriley@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1015.mgmt.eqiad.wmnet with reboot policy FORCED
  • 13:26 tchanders@deploy2002: tchanders: Continuing with sync
  • 13:25 tchanders@deploy2002: tchanders: Backport for Enable partial action blocks on commonswiki (T339878) synced to the testservers mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 13:25 urbanecm: mwmaint2002: `mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki 'Private Incident Reporting System/Updates' 'Incident Reporting System/Updates' 'Martin Urbanec' --reason 'per request'` (T347019)
  • 13:08 fabfur: disabled puppet on cp1090 for T346874
  • 13:05 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes2028.codfw.wmnet with OS bullseye
  • 13:04 tchanders@deploy2002: Started scap: Backport for Enable partial action blocks on commonswiki (T339878)
  • 12:45 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2028.codfw.wmnet with reason: host reimage
  • 12:42 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2028.codfw.wmnet with reason: host reimage
  • 12:33 jmm@cumin2002: END (PASS) - Cookbook sre.aqs.roll-restart-reboot (exit_code=0) rolling restart_daemons on A:aqs-eqiad
  • 12:31 milimetric@deploy2002: Finished deploy [analytics/aqs/deploy@041016f] (aqs): Enable etags on all AQS 1.0 endpoints (duration: 10m 23s)
  • 12:25 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes2028.codfw.wmnet with OS bullseye
  • 12:22 jmm@cumin2002: START - Cookbook sre.aqs.roll-restart-reboot rolling restart_daemons on A:aqs-eqiad
  • 12:21 milimetric@deploy2002: Started deploy [analytics/aqs/deploy@041016f] (aqs): Enable etags on all AQS 1.0 endpoints
  • 12:20 fabfur: depooled cp1090.eqiad.wmnet to test new purged package version (T346874)
  • 12:10 jmm@cumin2002: END (PASS) - Cookbook sre.aqs.roll-restart-reboot (exit_code=0) rolling restart_daemons on A:aqs-codfw
  • 12:03 effie: cordon kubernetes2028 to reimage
  • 11:59 jmm@cumin2002: START - Cookbook sre.aqs.roll-restart-reboot rolling restart_daemons on A:aqs-codfw
  • 11:57 ladsgroup@deploy2002: Finished scap: Backport for Turn on write both for pagelinks in largest s3 wikis (T345732) (duration: 36m 44s)
  • 11:45 jmm@cumin2002: END (PASS) - Cookbook sre.maps.roll-restart-reboot (exit_code=0) rolling restart_daemons on A:maps-replica-eqiad
  • 11:43 ladsgroup@deploy2002: ladsgroup: Continuing with sync
  • 11:43 ladsgroup@deploy2002: ladsgroup: Backport for Turn on write both for pagelinks in largest s3 wikis (T345732) synced to the testservers mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 11:39 jmm@cumin2002: START - Cookbook sre.maps.roll-restart-reboot rolling restart_daemons on A:maps-replica-eqiad
  • 11:33 jmm@cumin2002: END (PASS) - Cookbook sre.maps.roll-restart-reboot (exit_code=0) rolling restart_daemons on A:maps-replica-codfw
  • 11:28 jmm@cumin2002: START - Cookbook sre.maps.roll-restart-reboot rolling restart_daemons on A:maps-replica-codfw
  • 11:21 ladsgroup@deploy2002: Started scap: Backport for Turn on write both for pagelinks in largest s3 wikis (T345732)
  • 11:20 jnuche@deploy2002: Finished deploy [releng/jenkins-deploy@c579111] (releasing): (no justification provided) (duration: 01m 05s)
  • 11:19 jnuche@deploy2002: Started deploy [releng/jenkins-deploy@c579111] (releasing): (no justification provided)
  • 11:08 arturo: merging homer CR firewall patch https://gerrit.wikimedia.org/r/c/operations/homer/public/+/959706 for T346948
  • 10:57 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1191 (T343198)', diff saved to https://phabricator.wikimedia.org/P52550 and previous config saved to /var/cache/conftool/dbconfig/20230921-105723-arnaudb.json
  • 10:57 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1191.eqiad.wmnet with reason: Maintenance
  • 10:57 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1191.eqiad.wmnet with reason: Maintenance
  • 10:54 moritzm: installing c-ares security updates
  • 10:49 ladsgroup@deploy2002: Finished scap: Backport for Enable pagelinks write both in testwiki (T345732) (duration: 36m 27s)
  • 10:48 moritzm: installing flac security updates
  • 10:42 elukey@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
  • 10:36 ladsgroup@deploy2002: ladsgroup: Continuing with sync
  • 10:34 ladsgroup@deploy2002: ladsgroup: Backport for Enable pagelinks write both in testwiki (T345732) synced to the testservers mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 10:27 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti-test2003.codfw.wmnet
  • 10:27 XioNoX: set max repeaters = 20 on asw2-a-eqiad - T346759
  • 10:26 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti-test2003.codfw.wmnet
  • 10:25 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti-test2002.codfw.wmnet
  • 10:25 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti-test2002.codfw.wmnet
  • 10:19 isaranto@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
  • 10:18 cmooney@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox
  • 10:17 cmooney@cumin1001: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
  • 10:17 cmooney@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary
  • 10:17 cmooney@cumin1001: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary
  • 10:17 cmooney@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary
  • 10:17 cmooney@cumin1001: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary
  • 10:12 ladsgroup@deploy2002: Started scap: Backport for Enable pagelinks write both in testwiki (T345732)
  • 10:10 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:10 ayounsi@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove eqsin-eqdfw tunnel - ayounsi@cumin1001"
  • 10:09 ayounsi@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove eqsin-eqdfw tunnel - ayounsi@cumin1001"
  • 10:07 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
  • 09:55 kevinbazira@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
  • 09:51 effie: disable puppet on kubernetes[2025-2053].codfw.wmnet
  • 09:42 kevinbazira@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
  • 09:41 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:40 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 09:40 cmooney@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 09:38 brouberol@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 09:38 brouberol@deploy2002: helmfile [codfw] START helmfile.d/services/eventgate-analytics-external: apply
  • 09:36 brouberol@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 09:36 brouberol@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: apply
  • 09:35 brouberol@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics: apply
  • 09:34 brouberol@deploy2002: helmfile [codfw] START helmfile.d/services/eventgate-analytics: apply
  • 09:33 brouberol@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics: apply
  • 09:32 brouberol@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics: apply
  • 09:32 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.resource-report (exit_code=0)
  • 09:32 jmm@cumin2002: START - Cookbook sre.ganeti.resource-report
  • 09:30 brouberol@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventstreams-internal: apply
  • 09:30 brouberol@deploy2002: helmfile [codfw] START helmfile.d/services/eventstreams-internal: apply
  • 09:28 brouberol@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventstreams-internal: apply
  • 09:28 XioNoX: remove GRE tunnel between eqsin and eqdfw - T344888
  • 09:27 brouberol@deploy2002: helmfile [eqiad] START helmfile.d/services/eventstreams-internal: apply
  • 09:08 kevinbazira@deploy1002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
  • 09:03 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ganeti2030.codfw.wmnet with reason: Fixup DRBD
  • 09:03 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on ganeti2030.codfw.wmnet with reason: Fixup DRBD
  • 09:00 taavi@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudcontrol1007.wikimedia.org
  • 09:00 taavi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:00 taavi@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudcontrol1007.wikimedia.org decommissioned, removing all IPs except the asset tag one - taavi@cumin1001"
  • 08:59 taavi@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudcontrol1007.wikimedia.org decommissioned, removing all IPs except the asset tag one - taavi@cumin1001"
  • 08:57 taavi@cumin1001: START - Cookbook sre.dns.netbox
  • 08:51 taavi@cumin1001: START - Cookbook sre.hosts.decommission for hosts cloudcontrol1007.wikimedia.org
  • 08:14 brouberol@deploy2002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 08:14 brouberol@deploy2002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 08:14 brouberol: redeploying mw-page-content-change-enrich in staging T336041
  • 08:13 brouberol@deploy2002: helmfile [staging] DONE helmfile.d/services/eventstreams-internal: apply
  • 08:13 brouberol@deploy2002: helmfile [staging] START helmfile.d/services/eventstreams-internal: apply
  • 08:13 brouberol: redeploying eventstreams-internal in staging T336041
  • 08:12 brouberol@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 08:12 brouberol@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-analytics-external: apply
  • 08:12 brouberol: redeploying eventgate-analytics-external in staging T336041
  • 08:10 brouberol@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics: apply
  • 08:10 brouberol@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-analytics: apply
  • 07:52 kartik@deploy2002: Finished scap: Backport for Enable MinT translation service on Meta-Wiki - rollout #5 (T341445) (duration: 42m 01s)
  • 07:38 kartik@deploy2002: kartik and abi: Continuing with sync
  • 07:32 kartik@deploy2002: kartik and abi: Backport for Enable MinT translation service on Meta-Wiki - rollout #5 (T341445) synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 07:10 kartik@deploy2002: Started scap: Backport for Enable MinT translation service on Meta-Wiki - rollout #5 (T341445)
  • 06:52 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 2915
  • 06:51 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 2915
  • 06:31 taavi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 06:31 taavi@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: fix cloudsw cloud-private records - taavi@cumin1001"
  • 06:30 taavi@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: fix cloudsw cloud-private records - taavi@cumin1001"
  • 06:28 taavi@cumin1001: START - Cookbook sre.dns.netbox
  • 05:52 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 05:49 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 05:47 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 05:47 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 05:44 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 05:44 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 05:40 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 05:38 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 05:24 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 05:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 02:18 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db1248']
  • 02:17 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db1249']
  • 02:17 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db1246']
  • 02:09 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db1245']
  • 02:09 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db1242']
  • 02:08 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1249']
  • 02:08 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1248']
  • 02:08 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1246']
  • 02:08 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db1244']
  • 02:07 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db1243']
  • 02:07 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db1241']
  • 02:07 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db1240']
  • 02:00 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1245']
  • 01:59 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db1239']
  • 01:59 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1244']
  • 01:59 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db1237']
  • 01:59 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1243']
  • 01:59 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1242']
  • 01:59 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db1238']
  • 01:59 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db1236']
  • 01:58 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1241']
  • 01:58 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db1235']
  • 01:58 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1240']
  • 01:58 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db1234']
  • 01:51 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1239']
  • 01:50 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1245.mgmt.eqiad.wmnet with reboot policy FORCED
  • 01:50 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1235']
  • 01:49 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1238']
  • 01:49 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1237']
  • 01:49 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1236']
  • 01:49 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1234']
  • 01:48 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1248.mgmt.eqiad.wmnet with reboot policy FORCED
  • 01:48 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1246.mgmt.eqiad.wmnet with reboot policy FORCED
  • 01:47 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1247.mgmt.eqiad.wmnet with reboot policy FORCED
  • 01:35 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1249.mgmt.eqiad.wmnet with reboot policy FORCED
  • 01:31 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1244.mgmt.eqiad.wmnet with reboot policy FORCED
  • 01:30 jclark@cumin1001: START - Cookbook sre.hosts.provision for host db1246.mgmt.eqiad.wmnet with reboot policy FORCED
  • 01:30 jclark@cumin1001: START - Cookbook sre.hosts.provision for host db1245.mgmt.eqiad.wmnet with reboot policy FORCED
  • 01:29 jclark@cumin1001: START - Cookbook sre.hosts.provision for host db1248.mgmt.eqiad.wmnet with reboot policy FORCED
  • 01:29 jclark@cumin1001: START - Cookbook sre.hosts.provision for host db1247.mgmt.eqiad.wmnet with reboot policy FORCED
  • 01:29 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1243.mgmt.eqiad.wmnet with reboot policy FORCED
  • 01:29 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1241.mgmt.eqiad.wmnet with reboot policy FORCED
  • 01:28 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1242.mgmt.eqiad.wmnet with reboot policy FORCED
  • 01:28 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1240.mgmt.eqiad.wmnet with reboot policy FORCED
  • 01:17 jclark@cumin1001: START - Cookbook sre.hosts.provision for host db1249.mgmt.eqiad.wmnet with reboot policy FORCED
  • 01:17 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1235.mgmt.eqiad.wmnet with reboot policy FORCED
  • 01:11 jclark@cumin1001: START - Cookbook sre.hosts.provision for host db1244.mgmt.eqiad.wmnet with reboot policy FORCED
  • 01:11 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1234.mgmt.eqiad.wmnet with reboot policy FORCED
  • 01:10 jclark@cumin1001: START - Cookbook sre.hosts.provision for host db1243.mgmt.eqiad.wmnet with reboot policy FORCED
  • 01:10 jclark@cumin1001: START - Cookbook sre.hosts.provision for host db1242.mgmt.eqiad.wmnet with reboot policy FORCED
  • 01:10 jclark@cumin1001: START - Cookbook sre.hosts.provision for host db1241.mgmt.eqiad.wmnet with reboot policy FORCED
  • 01:10 jclark@cumin1001: START - Cookbook sre.hosts.provision for host db1240.mgmt.eqiad.wmnet with reboot policy FORCED
  • 01:09 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1237.mgmt.eqiad.wmnet with reboot policy FORCED
  • 01:09 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1239.mgmt.eqiad.wmnet with reboot policy FORCED
  • 01:09 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1238.mgmt.eqiad.wmnet with reboot policy FORCED
  • 01:09 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1236.mgmt.eqiad.wmnet with reboot policy FORCED
  • 00:50 jclark@cumin1001: START - Cookbook sre.hosts.provision for host db1239.mgmt.eqiad.wmnet with reboot policy FORCED
  • 00:50 jclark@cumin1001: START - Cookbook sre.hosts.provision for host db1238.mgmt.eqiad.wmnet with reboot policy FORCED
  • 00:50 jclark@cumin1001: START - Cookbook sre.hosts.provision for host db1237.mgmt.eqiad.wmnet with reboot policy FORCED
  • 00:50 jclark@cumin1001: START - Cookbook sre.hosts.provision for host db1235.mgmt.eqiad.wmnet with reboot policy FORCED
  • 00:50 jclark@cumin1001: START - Cookbook sre.hosts.provision for host db1234.mgmt.eqiad.wmnet with reboot policy FORCED
  • 00:50 jclark@cumin1001: START - Cookbook sre.hosts.provision for host db1236.mgmt.eqiad.wmnet with reboot policy FORCED
  • 00:49 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1249
  • 00:48 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1248
  • 00:47 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host db1249
  • 00:47 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1246
  • 00:47 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1247
  • 00:46 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host db1248
  • 00:46 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1245
  • 00:46 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host db1247
  • 00:46 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1244
  • 00:46 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host db1246
  • 00:46 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1243
  • 00:46 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host db1245
  • 00:46 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host db1244
  • 00:46 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1242
  • 00:46 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1240
  • 00:45 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host db1243
  • 00:45 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1241
  • 00:44 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host db1242
  • 00:44 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host db1241
  • 00:44 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host db1240
  • 00:44 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1238
  • 00:44 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1236
  • 00:44 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1239
  • 00:43 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host db1239
  • 00:43 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host db1238
  • 00:43 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host db1236
  • 00:42 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1234
  • 00:42 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1235
  • 00:41 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host db1235
  • 00:41 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host db1234
  • 00:39 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 00:39 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt db12[34-49] - jclark@cumin1001"
  • 00:38 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt db12[34-49] - jclark@cumin1001"
  • 00:36 jclark@cumin1001: START - Cookbook sre.dns.netbox
  • 00:15 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['pc1016']
  • 00:09 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1015.mgmt.eqiad.wmnet with reboot policy FORCED
  • 00:08 jclark@cumin1001: START - Cookbook sre.hosts.provision for host pc1015.mgmt.eqiad.wmnet with reboot policy FORCED
  • 00:08 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['pc1015']
  • 00:07 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['pc1015']
  • 00:07 jclark@cumin1001: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts ['pc1016']
  • 00:07 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['pc1016']
  • 00:06 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['pc1016']
  • 00:05 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pc1016.mgmt.eqiad.wmnet with reboot policy FORCED

2023-09-20

  • 23:55 jclark@cumin1001: START - Cookbook sre.hosts.provision for host pc1016.mgmt.eqiad.wmnet with reboot policy FORCED
  • 23:54 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1016.mgmt.eqiad.wmnet with reboot policy FORCED
  • 23:51 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1015.mgmt.eqiad.wmnet with reboot policy FORCED
  • 23:51 jclark@cumin1001: START - Cookbook sre.hosts.provision for host pc1015.mgmt.eqiad.wmnet with reboot policy FORCED
  • 23:50 jclark@cumin1001: START - Cookbook sre.hosts.provision for host pc1016.mgmt.eqiad.wmnet with reboot policy FORCED
  • 23:50 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1015.mgmt.eqiad.wmnet with reboot policy FORCED
  • 23:50 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host pc1016
  • 23:50 jclark@cumin1001: START - Cookbook sre.hosts.provision for host pc1015.mgmt.eqiad.wmnet with reboot policy FORCED
  • 23:49 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host pc1015
  • 23:49 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host pc1015
  • 23:49 jclark@cumin1001: END (ERROR) - Cookbook sre.network.configure-switch-interfaces (exit_code=97) for host pc1016
  • 23:49 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host pc1016
  • 23:48 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host pc1016
  • 23:48 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 23:48 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt pc1016 - jclark@cumin1001"
  • 23:47 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt pc1016 - jclark@cumin1001"
  • 23:44 jclark@cumin1001: START - Cookbook sre.dns.netbox
  • 19:26 bearloga@deploy2002: Finished deploy [airflow-dags/analytics_product@80496b8]: (no justification provided) (duration: 00m 05s)
  • 19:26 bearloga@deploy2002: Started deploy [airflow-dags/analytics_product@80496b8]: (no justification provided)
  • 19:25 bearloga@deploy2002: Finished deploy [airflow-dags/analytics_product@80496b8]: (no justification provided) (duration: 00m 09s)
  • 19:24 bearloga@deploy2002: Started deploy [airflow-dags/analytics_product@80496b8]: (no justification provided)
  • 18:21 brennen@deploy2002: Synchronized php: group1 wikis to 1.41.0-wmf.27 refs T345888 (duration: 07m 17s)
  • 18:14 brennen@deploy2002: rebuilt and synchronized wikiversions files: group1 wikis to 1.41.0-wmf.27 refs T345888
  • 18:02 brennen: train 1.41.0-wmf.27 (T345888): no current blockers, logs clean, rolling to group1
  • 16:29 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 16:28 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 16:26 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 16:26 klausman: pushing revert of ORES TTL change
  • 16:24 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 15:30 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
  • 15:30 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/mw-web: apply
  • 15:29 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
  • 15:29 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
  • 15:29 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
  • 15:29 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
  • 15:09 moritzm: added Taavi and Effie (new key) to pwstore
  • 15:08 jgiannelos@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifeeds: apply
  • 15:08 jgiannelos@deploy2002: helmfile [codfw] START helmfile.d/services/wikifeeds: apply
  • 15:06 brouberol@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0) for Kafka A:kafka-jumbo-eqiad cluster: Roll restart of jvm daemons.
  • 15:05 jgiannelos@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifeeds: apply
  • 15:05 jgiannelos@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifeeds: apply
  • 15:04 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 15:03 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Unbanning all hosts in search_codfw
  • 15:03 bking@cumin1001: START - Cookbook sre.elasticsearch.ban Unbanning all hosts in search_codfw
  • 15:02 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifeeds: apply
  • 15:02 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/wikifeeds: apply
  • 14:59 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifeeds: apply
  • 14:58 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/wikifeeds: apply
  • 14:45 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:45 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add new cloud-private records - cmooney@cumin1001"
  • 14:44 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add new cloud-private records - cmooney@cumin1001"
  • 14:41 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 14:35 kamila_: update maintenance.eqiad.wmnet to point to mwmaint2002
  • 14:26 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Banning hosts: elastic2044.codfw.wmnet for high load - bking@cumin1001
  • 14:26 bking@cumin1001: START - Cookbook sre.elasticsearch.ban Banning hosts: elastic2044.codfw.wmnet for high load - bking@cumin1001
  • 14:25 bking@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.ban (exit_code=99) Banning hosts: elastic2044 for high load - bking@cumin1001
  • 14:25 bking@cumin1001: START - Cookbook sre.elasticsearch.ban Banning hosts: elastic2044 for high load - bking@cumin1001
  • 14:16 kamila@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.09-run-puppet-on-db-masters (exit_code=0)
  • 14:10 kamila@cumin1001: START - Cookbook sre.switchdc.mediawiki.09-run-puppet-on-db-masters
  • 14:09 kamila@deploy2002: Unlocked for deployment [ALL REPOSITORIES]: Datacenter Switchover: MediaWiki - T346474 (duration: 12m 54s)
  • 14:07 kamila_: Phase 9.5 Update DNS records for new database masters - T346474
  • 14:06 kamila@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.09-restore-ttl (exit_code=0)
  • 14:06 kamila@cumin1001: START - Cookbook sre.switchdc.mediawiki.09-restore-ttl
  • 14:06 kamila@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-start-maintenance (exit_code=0)
  • 14:04 marostegui: Testing
  • 14:03 kamila@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-start-maintenance
  • 14:03 kamila@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-restart-envoy-on-jobrunners (exit_code=0)
  • 14:03 kamila@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-restart-envoy-on-jobrunners
  • 14:03 kamila@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.07-set-readwrite (exit_code=0)
  • 14:03 kamila@cumin1001: MediaWiki read-only period ends at: 2023-09-20 14:02:59.798838
  • 14:03 kamila@cumin1001: START - Cookbook sre.switchdc.mediawiki.07-set-readwrite
  • 14:02 kamila@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.07-set-readwrite (exit_code=0)
  • 14:02 kamila@cumin1001: MediaWiki read-only period ends at: 2023-09-20 14:02:53.790615
  • 14:00 kamila@cumin1001: MediaWiki read-only period starts at: 2023-09-20 14:00:32.114116
  • 14:00 kamila@cumin1001: START - Cookbook sre.switchdc.mediawiki.02-set-readonly
  • 13:57 kamila@cumin1001: END (FAIL) - Cookbook sre.switchdc.mediawiki.01-stop-maintenance (exit_code=99)
  • 13:57 kamila@cumin1001: START - Cookbook sre.switchdc.mediawiki.01-stop-maintenance
  • 13:57 kamila@cumin1001: END (FAIL) - Cookbook sre.switchdc.mediawiki.01-stop-maintenance (exit_code=99)
  • 13:57 kamila@cumin1001: START - Cookbook sre.switchdc.mediawiki.01-stop-maintenance
  • 13:56 kamila@deploy2002: Locking from deployment [ALL REPOSITORIES]: Datacenter Switchover: MediaWiki - T346474
  • 13:56 urbanecm@deploy2002: Finished scap: Backport for build: Update eslint-config-wikimedia to 0.25.1 (T346629), Change CSS selector for Minerva mobile menu icon (T346459) (duration: 34m 21s)
  • 13:56 kamila@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-reduce-ttl (exit_code=0)
  • 13:50 kamila@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-reduce-ttl
  • 13:50 kamila@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-downtime-db-readonly-checks (exit_code=0)
  • 13:50 kamila@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-downtime-db-readonly-checks
  • 13:49 kamila@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-disable-puppet (exit_code=0)
  • 13:49 kamila@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-disable-puppet
  • 13:49 slyngshede@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host idm-test1001.wikimedia.org with OS bookworm
  • 13:43 urbanecm@deploy2002: urbanecm and jdlrobson: Continuing with sync
  • 13:42 urbanecm@deploy2002: urbanecm and jdlrobson: Backport for build: Update eslint-config-wikimedia to 0.25.1 (T346629), Change CSS selector for Minerva mobile menu icon (T346459) synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 13:31 slyngshede@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on idm-test1001.wikimedia.org with reason: host reimage
  • 13:26 slyngshede@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on idm-test1001.wikimedia.org with reason: host reimage
  • 13:21 urbanecm@deploy2002: Started scap: Backport for build: Update eslint-config-wikimedia to 0.25.1 (T346629), Change CSS selector for Minerva mobile menu icon (T346459)
  • 13:12 slyngshede@cumin1001: START - Cookbook sre.hosts.reimage for host idm-test1001.wikimedia.org with OS bookworm
  • 13:02 akosiaris@deploy2002: Finished deploy [restbase/deploy@e8a6ae4]: (no justification provided) (duration: 00m 27s)
  • 13:02 akosiaris@deploy2002: Started deploy [restbase/deploy@e8a6ae4]: (no justification provided)
  • 12:54 akosiaris@deploy2002: Finished deploy [restbase/deploy@e8a6ae4]: (no justification provided) (duration: 02m 10s)
  • 12:52 akosiaris@deploy2002: Started deploy [restbase/deploy@e8a6ae4]: (no justification provided)
  • 12:52 akosiaris@deploy2002: Finished deploy [restbase/deploy@e8a6ae4]: (no justification provided) (duration: 04m 43s)
  • 12:47 akosiaris@deploy2002: Started deploy [restbase/deploy@e8a6ae4]: (no justification provided)
  • 12:45 akosiaris@deploy2002: Finished deploy [restbase/deploy@e8a6ae4]: (no justification provided) (duration: 04m 34s)
  • 12:41 akosiaris: T346354 deploy RESTBase after bug is fixed
  • 12:40 akosiaris@deploy2002: Started deploy [restbase/deploy@e8a6ae4]: (no justification provided)
  • 11:56 gmodena@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 11:56 gmodena@deploy2002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 11:49 gmodena@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 11:49 gmodena@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 11:40 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudweb1004.wikimedia.org
  • 11:33 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host cloudweb1004.wikimedia.org
  • 11:20 gmodena@deploy2002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 11:20 gmodena@deploy2002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 11:17 aborrero@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) openstack.eqiad1.wikimediacloud.org on all recursors
  • 11:17 aborrero@cumin1001: START - Cookbook sre.dns.wipe-cache openstack.eqiad1.wikimediacloud.org on all recursors
  • 11:14 aborrero@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:14 aborrero@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: openstack.eqiad1 - aborrero@cumin1001"
  • 11:13 aborrero@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: openstack.eqiad1 - aborrero@cumin1001"
  • 11:11 aborrero@cumin1001: START - Cookbook sre.dns.netbox
  • 10:55 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudweb1003.wikimedia.org
  • 10:48 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host cloudweb1003.wikimedia.org
  • 10:30 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudweb2002-dev.wikimedia.org
  • 10:23 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host cloudweb2002-dev.wikimedia.org
  • 10:18 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host gitlab-runner2004.codfw.wmnet with OS bullseye
  • 10:04 brouberol@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers for Kafka A:kafka-jumbo-eqiad cluster: Roll restart of jvm daemons.
  • 10:03 klausman: Running authdns-update to activate change 957689 (T341696)
  • 10:02 klausman: Merging change 957689 (T341696) to lower DNS TTL to 5m for ORES name.
  • 10:01 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on gitlab-runner2004.codfw.wmnet with reason: host reimage
  • 10:00 Emperor: ms-be10[61-75] swift package updates T346730
  • 09:56 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on gitlab-runner2004.codfw.wmnet with reason: host reimage
  • 09:55 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudservices1005.eqiad.wmnet with OS bullseye
  • 09:55 aborrero@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - aborrero@cumin1001"
  • 09:54 aborrero@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - aborrero@cumin1001"
  • 09:48 brouberol@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on kafka-jumbo1003.eqiad.wmnet with reason: investigation by brouberol and elukey about kafka ACL issues that might be fixed by a broker restart
  • 09:48 brouberol@cumin1001: START - Cookbook sre.hosts.downtime for 0:10:00 on kafka-jumbo1003.eqiad.wmnet with reason: investigation by brouberol and elukey about kafka ACL issues that might be fixed by a broker restart
  • 09:41 jelto@cumin1001: START - Cookbook sre.hosts.reimage for host gitlab-runner2004.codfw.wmnet with OS bullseye
  • 09:39 aborrero@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "cloudservices1005 - aborrero@cumin1001 - T346042"
  • 09:38 aborrero@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "cloudservices1005 - aborrero@cumin1001 - T346042"
  • 09:35 aborrero@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "cloudservices1005 - aborrero@cumin1001 - T346042"
  • 09:34 aborrero@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "cloudservices1005 - aborrero@cumin1001 - T346042"
  • 09:34 gmodena@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 09:34 gmodena@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 09:33 aborrero@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:32 aborrero@cumin1001: START - Cookbook sre.dns.netbox
  • 09:29 klausman: Draining ml-serve1008 for kubelet partition increase (T339231)
  • 09:24 klausman: Draining ml-serve1007 for kubelet partition increase (T339231)
  • 09:15 klausman: Draining ml-serve1006 for kubelet partition increase (T339231)
  • 09:13 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudservices1005.eqiad.wmnet with reason: host reimage
  • 09:09 aborrero@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudservices1005.eqiad.wmnet with reason: host reimage
  • 09:08 fabfur: applied patch https://gerrit.wikimedia.org/r/c/operations/puppet/+/957292 (T344175) to add new mobile redirect domains to Varnish. Changes will be applied automatically by puppet on all cp hosts
  • 09:06 klausman: Draining ml-serve1005 for kubelet partition increase (T339231)
  • 09:00 godog: restore benthos@webrequest_live running on both centrallog hosts - T346871
  • 08:57 klausman: Draining ml-serve1004 for kubelet partition increase (T339231)
  • 08:47 klausman: Draining ml-serve1003 for kubelet partition increase (T339231)
  • 08:47 godog: temp bump threads to 15 for benthos@webrequest_live on centrallog2002 - T346871
  • 08:40 aborrero@cumin1001: START - Cookbook sre.hosts.reimage for host cloudservices1005.eqiad.wmnet with OS bullseye
  • 08:40 aborrero@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudservices1005.eqiad.wmnet with OS bullseye
  • 08:40 klausman: Draining ml-serve1002 for kubelet partition increase (T339231)
  • 08:36 godog: stop benthos@webrequest_live.service on centrallog1002 to test redundancy/capacity - T346871
  • 08:33 aborrero@cumin1001: START - Cookbook sre.hosts.reimage for host cloudservices1005.eqiad.wmnet with OS bullseye
  • 08:32 aborrero@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:31 aborrero@cumin1001: START - Cookbook sre.dns.netbox
  • 08:31 aborrero@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudservices1005
  • 08:31 aborrero@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cloudservices1005
  • 08:30 aborrero@cumin1001: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host cloudservices1005
  • 08:30 aborrero@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cloudservices1005
  • 08:22 jmm@cumin2002: END (PASS) - Cookbook sre.misc-clusters.roll-restart-reboot-docker-registry (exit_code=0) rolling restart_daemons on A:docker-registry
  • 08:20 jmm@cumin2002: START - Cookbook sre.misc-clusters.roll-restart-reboot-docker-registry rolling restart_daemons on A:docker-registry
  • 08:17 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on puppetdb1002.eqiad.wmnet with reason: Disable puppetdb/postgres/nginx on old nodes to ensure nothing hits them anyway
  • 08:16 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 5 days, 0:00:00 on puppetdb1002.eqiad.wmnet with reason: Disable puppetdb/postgres/nginx on old nodes to ensure nothing hits them anyway
  • 08:16 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on puppetdb2002.codfw.wmnet with reason: Disable puppetdb/postgres/nginx on old nodes to ensure nothing hits them anyway
  • 08:15 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 5 days, 0:00:00 on puppetdb2002.codfw.wmnet with reason: Disable puppetdb/postgres/nginx on old nodes to ensure nothing hits them anyway
  • 08:10 moritzm: restarting FPM on mw* to pick up libwebp security updates
  • 08:02 moritzm: installing libwebp security updates on buster
  • 07:42 slyngshede@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host idm1001.wikimedia.org with OS bookworm
  • 07:41 taavi@deploy2002: Finished scap: Backport for Set READ_NEW for Wikitech on OATHAuth multiple devices migration (T242031), Set WRITE_NEW for OATHAuth multiple devices on fishbowls/privates (T242031) (duration: 36m 09s)
  • 07:39 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2014.codfw.wmnet
  • 07:34 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2014.codfw.wmnet
  • 07:33 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-test-ui1001.eqiad.wmnet
  • 07:29 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host an-test-ui1001.eqiad.wmnet
  • 07:28 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-web1001.eqiad.wmnet
  • 07:28 taavi@deploy2002: taavi: Continuing with sync
  • 07:26 taavi@deploy2002: taavi: Backport for Set READ_NEW for Wikitech on OATHAuth multiple devices migration (T242031), Set WRITE_NEW for OATHAuth multiple devices on fishbowls/privates (T242031) synced to the testservers mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental X
  • 07:24 slyngshede@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on idm1001.wikimedia.org with reason: host reimage
  • 07:22 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host an-web1001.eqiad.wmnet
  • 07:22 slyngshede@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on idm1001.wikimedia.org with reason: host reimage
  • 07:15 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-tool1005.eqiad.wmnet
  • 07:15 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host an-tool1005.eqiad.wmnet
  • 07:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-tool1007.eqiad.wmnet
  • 07:09 slyngshede@cumin1001: START - Cookbook sre.hosts.reimage for host idm1001.wikimedia.org with OS bookworm
  • 07:06 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host an-tool1007.eqiad.wmnet
  • 07:05 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-tool1008.eqiad.wmnet
  • 07:05 taavi@deploy2002: Started scap: Backport for Set READ_NEW for Wikitech on OATHAuth multiple devices migration (T242031), Set WRITE_NEW for OATHAuth multiple devices on fishbowls/privates (T242031)
  • 07:00 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host an-tool1008.eqiad.wmnet
  • 06:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-tool1009.eqiad.wmnet
  • 06:52 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host an-tool1009.eqiad.wmnet
  • 06:51 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-tool1010.eqiad.wmnet
  • 06:45 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host an-tool1010.eqiad.wmnet
  • 06:42 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-tool1011.eqiad.wmnet
  • 06:40 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host an-tool1011.eqiad.wmnet
  • 01:12 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pki1002.eqiad.wmnet with OS bullseye
  • 01:12 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 01:08 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 00:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pki1002.eqiad.wmnet with reason: host reimage
  • 00:51 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on pki1002.eqiad.wmnet with reason: host reimage
  • 00:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host pki1002.eqiad.wmnet with OS bullseye
  • 00:10 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db1232']
  • 00:10 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db1233']
  • 00:09 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db1231']
  • 00:02 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db1229']
  • 00:01 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1233']
  • 00:01 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db1230']
  • 00:01 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db1228']
  • 00:01 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1232']
  • 00:00 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1231']
  • 00:00 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db1227']
  • 00:00 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db1226']

2023-09-19

  • 23:52 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1230']
  • 23:52 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1229']
  • 23:52 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1228']
  • 23:51 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1227']
  • 23:51 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1226']
  • 23:51 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1229.mgmt.eqiad.wmnet with reboot policy FORCED
  • 23:31 jclark@cumin1001: START - Cookbook sre.hosts.provision for host db1229.mgmt.eqiad.wmnet with reboot policy FORCED
  • 23:30 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 23:29 jclark@cumin1001: START - Cookbook sre.dns.netbox
  • 23:26 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1232.mgmt.eqiad.wmnet with reboot policy FORCED
  • 23:18 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1233.mgmt.eqiad.wmnet with reboot policy FORCED
  • 23:18 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1231.mgmt.eqiad.wmnet with reboot policy FORCED
  • 22:58 jclark@cumin1001: START - Cookbook sre.hosts.provision for host db1233.mgmt.eqiad.wmnet with reboot policy FORCED
  • 22:58 jclark@cumin1001: START - Cookbook sre.hosts.provision for host db1232.mgmt.eqiad.wmnet with reboot policy FORCED
  • 22:58 jclark@cumin1001: START - Cookbook sre.hosts.provision for host db1231.mgmt.eqiad.wmnet with reboot policy FORCED
  • 22:57 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1228.mgmt.eqiad.wmnet with reboot policy FORCED
  • 22:57 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1230.mgmt.eqiad.wmnet with reboot policy FORCED
  • 22:56 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1227.mgmt.eqiad.wmnet with reboot policy FORCED
  • 22:56 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1226.mgmt.eqiad.wmnet with reboot policy FORCED
  • 22:54 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1229.mgmt.eqiad.wmnet with reboot policy FORCED
  • 22:51 bearloga@deploy2002: Finished deploy [airflow-dags/analytics_product@b603e64]: (no justification provided) (duration: 00m 05s)
  • 22:51 bearloga@deploy2002: Started deploy [airflow-dags/analytics_product@b603e64]: (no justification provided)
  • 22:46 jclark@cumin1001: START - Cookbook sre.hosts.provision for host db1229.mgmt.eqiad.wmnet with reboot policy FORCED
  • 22:45 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1229.mgmt.eqiad.wmnet with reboot policy FORCED
  • 21:50 brennen@deploy2002: rebuilt and synchronized wikiversions files: group0 wikis to 1.41.0-wmf.27 refs T345888
  • 21:49 jclark@cumin1001: START - Cookbook sre.hosts.provision for host db1227.mgmt.eqiad.wmnet with reboot policy FORCED
  • 21:49 jclark@cumin1001: START - Cookbook sre.hosts.provision for host db1228.mgmt.eqiad.wmnet with reboot policy FORCED
  • 21:49 jclark@cumin1001: START - Cookbook sre.hosts.provision for host db1230.mgmt.eqiad.wmnet with reboot policy FORCED
  • 21:49 jclark@cumin1001: START - Cookbook sre.hosts.provision for host db1229.mgmt.eqiad.wmnet with reboot policy FORCED
  • 21:49 jclark@cumin1001: START - Cookbook sre.hosts.provision for host db1226.mgmt.eqiad.wmnet with reboot policy FORCED
  • 21:48 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1232
  • 21:47 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host db1232
  • 21:46 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:45 jclark@cumin1001: START - Cookbook sre.dns.netbox
  • 21:41 brennen: train 1.41.0-wmf.27 (T345888): blockers resolved; rolling to group0
  • 21:37 brennen@deploy2002: Finished scap: Backport for Disable client preferences by default (T345363) (duration: 40m 45s)
  • 21:37 jclark@cumin1001: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host db1232
  • 21:37 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host db1232
  • 21:36 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1233
  • 21:36 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1231
  • 21:35 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host db1233
  • 21:34 jclark@cumin1001: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host db1232
  • 21:34 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host db1232
  • 21:34 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host db1231
  • 21:33 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:32 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1230
  • 21:32 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1229
  • 21:32 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 21:32 cmooney@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 21:31 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host db1230
  • 21:31 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1226
  • 21:31 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host db1229
  • 21:31 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1227
  • 21:30 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host db1227
  • 21:30 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host db1226
  • 21:29 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:29 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt db12[26-33] - jclark@cumin1001"
  • 21:29 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt db12[26-33] - jclark@cumin1001"
  • 21:26 jclark@cumin1001: START - Cookbook sre.dns.netbox
  • 21:25 brennen@deploy2002: jdlrobson and brennen: Continuing with sync
  • 21:20 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudelastic1010']
  • 21:17 brennen@deploy2002: jdlrobson and brennen: Backport for Disable client preferences by default (T345363) synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 21:16 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudelastic1008']
  • 21:11 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudelastic1010']
  • 21:11 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudelastic1009']
  • 21:07 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudelastic1008']
  • 21:07 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudelastic1007']
  • 21:01 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudelastic1009']
  • 21:01 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudelastic1010.mgmt.eqiad.wmnet with reboot policy FORCED
  • 20:59 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudelastic1007']
  • 20:57 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['pki1002']
  • 20:57 brennen@deploy2002: Started scap: Backport for Disable client preferences by default (T345363)
  • 20:55 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['pki1002']
  • 20:55 brennen@deploy2002: Finished scap: Backport for Fixes cannot read properties of undefined (T342277) (duration: 37m 39s)
  • 20:51 bearloga@deploy2002: Finished deploy [airflow-dags/analytics_product@b603e64]: (no justification provided) (duration: 00m 05s)
  • 20:51 bearloga@deploy2002: Started deploy [airflow-dags/analytics_product@b603e64]: (no justification provided)
  • 20:50 bearloga@deploy2002: Finished deploy [airflow-dags/analytics_product@b603e64]: (no justification provided) (duration: 00m 09s)
  • 20:50 bearloga@deploy2002: Started deploy [airflow-dags/analytics_product@b603e64]: (no justification provided)
  • 20:42 brennen@deploy2002: jdlrobson and brennen: Continuing with sync
  • 20:38 brennen@deploy2002: jdlrobson and brennen: Backport for Fixes cannot read properties of undefined (T342277) synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 20:37 jclark@cumin1001: START - Cookbook sre.hosts.provision for host cloudelastic1010.mgmt.eqiad.wmnet with reboot policy FORCED
  • 20:36 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudelastic1010
  • 20:36 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cloudelastic1010
  • 20:35 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:35 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt cloudelastic1007-10 - jclark@cumin1001"
  • 20:34 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt cloudelastic1007-10 - jclark@cumin1001"
  • 20:32 jclark@cumin1001: START - Cookbook sre.dns.netbox
  • 20:18 brennen@deploy2002: Started scap: Backport for Fixes cannot read properties of undefined (T342277)
  • 19:48 brennen@deploy2002: Finished scap: Backport for Revert "ResourceLoader: Set 'virtualFilePath' for startup.js" (T346800) (duration: 40m 46s)
  • 19:31 brennen@deploy2002: jforrester and brennen: Continuing with sync
  • 19:29 brennen@deploy2002: jforrester and brennen: Backport for Revert "ResourceLoader: Set 'virtualFilePath' for startup.js" (T346800) synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 19:24 vriley@cumin1001: START - Cookbook sre.hosts.provision for host pc1015.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:21 vriley@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host pc1015
  • 19:20 vriley@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host pc1015
  • 19:07 brennen@deploy2002: Started scap: Backport for Revert "ResourceLoader: Set 'virtualFilePath' for startup.js" (T346800)
  • 16:28 claime: Deployed https://gerrit.wikimedia.org/r/953344 - T345204
  • 16:04 kamila_: DC Switchover: traffic - T346330
  • 15:59 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifeeds: apply
  • 15:59 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/wikifeeds: apply
  • 15:58 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifeeds: apply
  • 15:58 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifeeds: apply
  • 15:57 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mathoid: apply
  • 15:57 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/mathoid: apply
  • 15:57 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
  • 15:57 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
  • 15:57 cgoubert@deploy2002: Finished scap: (no justification provided) (duration: 03m 12s)
  • 15:56 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/similar-users: apply
  • 15:56 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/similar-users: apply
  • 15:56 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/similar-users: apply
  • 15:55 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/similar-users: apply
  • 15:55 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/mathoid: apply
  • 15:55 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/mathoid: apply
  • 15:54 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
  • 15:54 akosiaris: scaling down mobileapps, wikifeeds, mathoid, similar-users
  • 15:54 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
  • 15:53 cgoubert@deploy2002: Started scap: (no justification provided)
  • 15:53 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-misc: apply
  • 15:52 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-misc: apply
  • 15:52 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-misc: apply
  • 15:52 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/mw-misc: apply
  • 15:52 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
  • 15:52 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
  • 15:52 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
  • 15:52 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
  • 15:52 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
  • 15:52 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
  • 15:52 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
  • 15:52 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
  • 15:52 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
  • 15:52 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
  • 15:52 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
  • 15:51 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/mw-web: apply
  • 15:51 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 15:51 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 15:46 cgoubert@deploy2002: Finished scap: (no justification provided) (duration: 40m 44s)
  • 15:45 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 15:45 cmooney@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 15:28 vriley@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudelastic1007.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:26 claime: running puppet on 'A:cp-text and P{P:trafficserver::backend}' - T346330
  • 15:25 claime: reduce mw-on-k8s traffic to 3% waiting on new nodes - T346330
  • 15:18 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudelastic1008.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:18 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudelastic1009.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:06 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudelastic1010.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:06 cgoubert@deploy2002: Started scap: (no justification provided)
  • 15:05 kamila@deploy1002: Unlocked for deployment [ALL REPOSITORIES]: Datacenter Switchover: Services & Traffic - T346330 (duration: 34m 46s)
  • 15:02 akosiaris: increase thumbor's pods in codfw to 48 to harmonize with eqiad
  • 15:02 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
  • 15:02 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/thumbor: apply
  • 14:56 vriley@cumin1001: START - Cookbook sre.hosts.provision for host cloudelastic1007.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:56 jclark@cumin1001: START - Cookbook sre.hosts.provision for host cloudelastic1010.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:56 jclark@cumin1001: START - Cookbook sre.hosts.provision for host cloudelastic1009.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:56 jclark@cumin1001: START - Cookbook sre.hosts.provision for host cloudelastic1008.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:52 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudelastic1010
  • 14:51 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cloudelastic1010
  • 14:51 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudelastic1009
  • 14:51 vriley@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudelastic1008
  • 14:50 moritzm: installing python-werkzeug security updates
  • 14:50 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cloudelastic1009
  • 14:49 vriley@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cloudelastic1008
  • 14:49 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudelastic1007
  • 14:48 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cloudelastic1007
  • 14:46 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:46 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt dbproxy1026-56} - jclark@cumin1001"
  • 14:45 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt dbproxy1026-56} - jclark@cumin1001"
  • 14:42 jclark@cumin1001: START - Cookbook sre.dns.netbox
  • 14:36 oblivian@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=swift-rw,name=codfw
  • 14:36 oblivian@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=swift-rw,name=eqiad
  • 14:33 cgoubert@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=swift-ro
  • 14:33 cgoubert@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=swift
  • 14:32 kamila_: Switch deployment server - T346330
  • 14:30 kamila@deploy1002: Locking from deployment [ALL REPOSITORIES]: Datacenter Switchover: Services & Traffic - T346330
  • 14:28 kamila@cumin1001: END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0) depool all services in eqiad: Datacenter Switchover: Services - T346330
  • 14:28 cgoubert@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=thumbor
  • 14:25 oblivian@deploy1002: Finished scap: (no justification provided) (duration: 05m 44s)
  • 14:20 oblivian@deploy1002: Started scap: (no justification provided)
  • 14:20 kamila@deploy1002: Unlocked for deployment [ALL REPOSITORIES]: Datacenter Switchover: Services & Traffic - T346330 (duration: 19m 27s)
  • 14:01 kamila@cumin1001: START - Cookbook sre.discovery.datacenter depool all services in eqiad: Datacenter Switchover: Services - T346330
  • 14:00 kamila@deploy1002: Locking from deployment [ALL REPOSITORIES]: Datacenter Switchover: Services & Traffic - T346330
  • 13:58 TheresNoTime: `[samtar@mwmaint1002 ~]$ mwscript namespaceDupes.php --wiki shwiki --fix` T346588
  • 13:57 samtar@deploy1002: Finished scap: Backport for Add namespace aliases to shwiki (T346588) (duration: 51m 50s)
  • 13:53 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:53 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts an-test-client1001.eqiad.wmnet
  • 13:52 elukey: clean old puppet certs kafka_logging-{eqiad,codfw}_broker from the Puppet CA and from Puppet private - T300130
  • 13:52 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 13:52 jhancock@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 13:52 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Updating DNS record of kuberbetes2026 - jhancock@cumin2002"
  • 13:51 stevemunene@cumin1001: START - Cookbook sre.dns.netbox
  • 13:50 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Updating DNS record of kuberbetes2026 - jhancock@cumin2002"
  • 13:47 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 13:47 jebe@deploy1002: Finished deploy [airflow-dags/analytics@6b9855a]: (no justification provided) (duration: 00m 43s)
  • 13:46 stevemunene@cumin1001: START - Cookbook sre.hosts.decommission for hosts an-test-client1001.eqiad.wmnet
  • 13:46 jebe@deploy1002: Started deploy [airflow-dags/analytics@6b9855a]: (no justification provided)
  • 13:34 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host flerovium.eqiad.wmnet
  • 13:33 samtar@deploy1002: samtar and aleksandar: Continuing with sync
  • 13:28 samtar@deploy1002: samtar and aleksandar: Backport for Add namespace aliases to shwiki (T346588) synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 13:27 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host flerovium.eqiad.wmnet
  • 13:17 jebe@deploy1002: Finished deploy [analytics/refinery@2d9d6d0] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@2d9d6d0] (duration: 02m 06s)
  • 13:15 Emperor: ms-be10[44-60] swift package updates T346730
  • 13:15 jebe@deploy1002: Started deploy [analytics/refinery@2d9d6d0] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@2d9d6d0]
  • 13:14 jebe@deploy1002: Finished deploy [analytics/refinery@2d9d6d0] (thin): Regular analytics weekly train THIN [analytics/refinery@2d9d6d0] (duration: 00m 04s)
  • 13:14 jebe@deploy1002: Started deploy [analytics/refinery@2d9d6d0] (thin): Regular analytics weekly train THIN [analytics/refinery@2d9d6d0]
  • 13:14 jebe@deploy1002: Finished deploy [analytics/refinery@2d9d6d0]: Regular analytics weekly train [analytics/refinery@2d9d6d0] (duration: 05m 52s)
  • 13:08 jebe@deploy1002: Started deploy [analytics/refinery@2d9d6d0]: Regular analytics weekly train [analytics/refinery@2d9d6d0]
  • 13:05 samtar@deploy1002: Started scap: Backport for Add namespace aliases to shwiki (T346588)
  • 13:05 btullis@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0) for Kafka A:kafka-jumbo-eqiad cluster: Roll restart of jvm daemons.
  • 12:44 Emperor: ms-be20[60-73] swift package updates T346730
  • 12:22 Emperor: ms-be20[49-59] swift package updates T346730
  • 12:19 jebe@deploy1002: Finished deploy [analytics/refinery@91bb4a0] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@91bb4a0] (duration: 02m 03s)
  • 12:18 Emperor: ms-be2048 swift package updates T346730
  • 12:17 jebe@deploy1002: Started deploy [analytics/refinery@91bb4a0] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@91bb4a0]
  • 12:17 jebe@deploy1002: Finished deploy [analytics/refinery@91bb4a0] (thin): Regular analytics weekly train THIN [analytics/refinery@91bb4a0] (duration: 00m 05s)
  • 12:17 jebe@deploy1002: Started deploy [analytics/refinery@91bb4a0] (thin): Regular analytics weekly train THIN [analytics/refinery@91bb4a0]
  • 12:14 Emperor: ms-be2047 swift package updates T346730
  • 12:12 Emperor: ms-be204{5,6} swift package updates T346730
  • 12:10 jebe@deploy1002: Finished deploy [analytics/refinery@91bb4a0]: Regular analytics weekly train [analytics/refinery@91bb4a0] (duration: 06m 53s)
  • 12:08 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 12:03 jebe@deploy1002: Started deploy [analytics/refinery@91bb4a0]: Regular analytics weekly train [analytics/refinery@91bb4a0]
  • 11:51 ayounsi@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox
  • 11:51 ayounsi@cumin1001: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
  • 11:50 ayounsi@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary
  • 11:48 ayounsi@cumin1001: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary
  • 11:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 100%: Repooling after recloning db1128', diff saved to https://phabricator.wikimedia.org/P52530 and previous config saved to /var/cache/conftool/dbconfig/20230919-112156-root.json
  • 11:09 Emperor: eqiad swift front-end swift package updates T346730
  • 11:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 75%: Repooling after recloning db1128', diff saved to https://phabricator.wikimedia.org/P52529 and previous config saved to /var/cache/conftool/dbconfig/20230919-110651-root.json
  • 10:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 50%: Repooling after recloning db1128', diff saved to https://phabricator.wikimedia.org/P52528 and previous config saved to /var/cache/conftool/dbconfig/20230919-105147-root.json
  • 10:38 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1148.eqiad.wmnet with OS bullseye
  • 10:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 25%: Repooling after recloning db1128', diff saved to https://phabricator.wikimedia.org/P52527 and previous config saved to /var/cache/conftool/dbconfig/20230919-103642-root.json
  • 10:34 Emperor: codfw swift front-end swift package updates T346730
  • 10:25 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1147.eqiad.wmnet with OS bullseye
  • 10:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 10%: Repooling after recloning db1128', diff saved to https://phabricator.wikimedia.org/P52526 and previous config saved to /var/cache/conftool/dbconfig/20230919-102137-root.json
  • 10:15 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1148.eqiad.wmnet with reason: host reimage
  • 10:11 stevemunene@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1148.eqiad.wmnet with reason: host reimage
  • 10:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 5%: Repooling after recloning db1128', diff saved to https://phabricator.wikimedia.org/P52525 and previous config saved to /var/cache/conftool/dbconfig/20230919-100632-root.json
  • 10:01 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1147.eqiad.wmnet with reason: host reimage
  • 09:58 stevemunene@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1148.eqiad.wmnet with OS bullseye
  • 09:56 stevemunene@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1147.eqiad.wmnet with reason: host reimage
  • 09:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 3%: Repooling after recloning db1128', diff saved to https://phabricator.wikimedia.org/P52524 and previous config saved to /var/cache/conftool/dbconfig/20230919-095127-root.json
  • 09:48 slyngshede@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host idm2001.wikimedia.org with OS bookworm
  • 09:42 stevemunene@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1147.eqiad.wmnet with OS bullseye
  • 09:40 btullis@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers for Kafka A:kafka-jumbo-eqiad cluster: Roll restart of jvm daemons.
  • 09:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 1%: Repooling after recloning db1128', diff saved to https://phabricator.wikimedia.org/P52523 and previous config saved to /var/cache/conftool/dbconfig/20230919-093622-root.json
  • 09:12 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-staging-etcd2001.codfw.wmnet
  • 09:08 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM ml-staging-etcd2001.codfw.wmnet
  • 09:03 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-staging-etcd2002.codfw.wmnet
  • 08:59 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM ml-staging-etcd2002.codfw.wmnet
  • 08:47 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-staging-etcd2003.codfw.wmnet
  • 08:44 godog: bounce benthos@webrequest_live to clear out old metrics
  • 08:43 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM ml-staging-etcd2003.codfw.wmnet
  • 08:41 godog: remove MediaWiki.*.growthexperiments.taskcount.link_recommendation.* from graphite - T346371
  • 08:39 jmm@cumin2002: END (PASS) - Cookbook sre.maps.roll-restart-reboot (exit_code=0) rolling restart_daemons on A:maps-replica-eqiad
  • 08:36 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1146.eqiad.wmnet with OS bullseye
  • 08:34 jmm@cumin2002: START - Cookbook sre.maps.roll-restart-reboot rolling restart_daemons on A:maps-replica-eqiad
  • 08:30 jmm@cumin2002: END (PASS) - Cookbook sre.maps.roll-restart-reboot (exit_code=0) rolling restart_daemons on A:maps-replica-codfw
  • 08:29 slyngshede@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on idm2001.wikimedia.org with reason: host reimage
  • 08:26 brouberol@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 08:26 brouberol@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 08:26 slyngshede@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on idm2001.wikimedia.org with reason: host reimage
  • 08:26 brouberol: redeploying mw-page-content-change-enrich in codfw T336041
  • 08:26 brouberol@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 08:25 brouberol@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 08:25 jmm@cumin2002: START - Cookbook sre.maps.roll-restart-reboot rolling restart_daemons on A:maps-replica-codfw
  • 08:25 brouberol: redeploying mw-page-content-change-enrich in eqiad T336041
  • 08:24 brouberol@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventstreams-internal: apply
  • 08:24 brouberol@deploy1002: helmfile [eqiad] START helmfile.d/services/eventstreams-internal: apply
  • 08:24 brouberol: redeploying eventstreams-internal in eqiad T336041
  • 08:23 brouberol@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventstreams-internal: apply
  • 08:23 brouberol@deploy1002: helmfile [codfw] START helmfile.d/services/eventstreams-internal: apply
  • 08:23 brouberol: redeploying eventstreams-internal in codfw T336041
  • 08:22 brouberol@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 08:21 brouberol@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-analytics-external: apply
  • 08:21 brouberol: redeploying eventstream-analytics-external in codfw T336041
  • 08:21 brouberol@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 08:20 brouberol@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: apply
  • 08:20 brouberol: redeploying eventstream-analytics-external in eqiad T336041
  • 08:19 brouberol@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics: apply
  • 08:18 brouberol@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-analytics: apply
  • 08:18 brouberol: redeploying eventstream-analytics in codfw T336041
  • 08:18 brouberol@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics: apply
  • 08:17 brouberol@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics: apply
  • 08:13 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1146.eqiad.wmnet with reason: host reimage
  • 08:11 slyngshede@cumin1001: START - Cookbook sre.hosts.reimage for host idm2001.wikimedia.org with OS bookworm
  • 08:10 stevemunene@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1146.eqiad.wmnet with reason: host reimage
  • 08:05 brouberol@deploy1002: helmfile [staging] DONE helmfile.d/services/eventstreams-internal: apply
  • 08:05 brouberol@deploy1002: helmfile [staging] START helmfile.d/services/eventstreams-internal: apply
  • 08:05 moritzm: restarting FPM on mw canaries to pick up libwebp updates
  • 08:04 marostegui@cumin1001: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db1134.eqiad.wmnet onto db1128.eqiad.wmnet
  • 08:02 brouberol@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 08:02 brouberol@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-analytics-external: apply
  • 08:00 brouberol@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics: apply
  • 07:59 brouberol@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-analytics: apply
  • 07:56 stevemunene@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1146.eqiad.wmnet with OS bullseye
  • 07:51 moritzm: installing libwebp security updates on buster
  • 07:43 kartik@deploy1002: Finished scap: Backport for Disable Special:Contribute on bnwiki (T345772) (duration: 38m 49s)
  • 07:27 kartik@deploy1002: kartik: Continuing with sync
  • 07:26 kartik@deploy1002: kartik: Backport for Disable Special:Contribute on bnwiki (T345772) synced to the testservers mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 07:11 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 07:04 kartik@deploy1002: Started scap: Backport for Disable Special:Contribute on bnwiki (T345772)
  • 06:35 denisse: updating PCC facts
  • 06:09 XioNoX: push new pfw policy - T346705
  • 05:48 fnegri@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudservices2004-dev.codfw.wmnet with OS bookworm
  • 05:46 marostegui@cumin1001: START - Cookbook sre.mysql.clone of db1134.eqiad.wmnet onto db1128.eqiad.wmnet
  • 05:45 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1134', diff saved to https://phabricator.wikimedia.org/P52522 and previous config saved to /var/cache/conftool/dbconfig/20230919-054539-root.json
  • 04:35 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 04:06 mwpresync@deploy1002: Pruned MediaWiki: 1.41.0-wmf.25 (duration: 02m 10s)
  • 04:03 mwpresync@deploy1002: Finished scap: testwikis wikis to 1.41.0-wmf.27 refs T345888 (duration: 61m 05s)
  • 03:02 mwpresync@deploy1002: Started scap: testwikis wikis to 1.41.0-wmf.27 refs T345888
  • 00:56 eileen: civicrm upgraded from 0a36997d to f0e9d3f6

2023-09-18

  • 22:09 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts wdqs1004.eqiad.wmnet
  • 22:09 ryankemper@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 22:08 ryankemper@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: wdqs1004.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - ryankemper@cumin1001"
  • 22:07 ryankemper@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: wdqs1004.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - ryankemper@cumin1001"
  • 21:59 ryankemper@cumin1001: START - Cookbook sre.dns.netbox
  • 21:51 ryankemper@cumin1001: START - Cookbook sre.hosts.decommission for hosts wdqs1004.eqiad.wmnet
  • 21:49 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts wdqs1003.eqiad.wmnet
  • 21:49 ryankemper@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:49 ryankemper@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: wdqs1003.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - ryankemper@cumin1001"
  • 21:45 ryankemper@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: wdqs1003.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - ryankemper@cumin1001"
  • 21:40 ryankemper@cumin1001: START - Cookbook sre.dns.netbox
  • 21:19 maryum: Deployed patch for T344359
  • 21:13 ryankemper@cumin1001: START - Cookbook sre.hosts.decommission for hosts wdqs1003.eqiad.wmnet
  • 20:49 cjming: end of UTC late backport window
  • 20:36 cjming@deploy1002: Finished scap: Backport for Link recommendations: prevent too large offsets in cirrus queries (T345713) (duration: 11m 40s)
  • 20:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dbstore1008.eqiad.wmnet with OS bullseye
  • 20:30 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 20:29 cjming@deploy1002: urbanecm and cjming: Continuing with sync
  • 20:29 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 20:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dbstore1009.eqiad.wmnet with OS bullseye
  • 20:27 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 20:26 cjming@deploy1002: urbanecm and cjming: Backport for Link recommendations: prevent too large offsets in cirrus queries (T345713) synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 20:24 cjming@deploy1002: Started scap: Backport for Link recommendations: prevent too large offsets in cirrus queries (T345713)
  • 20:24 cjming@deploy1002: Finished scap: Backport for clienthints: Enable purging of data on all wikis (T257893) (duration: 09m 24s)
  • 20:24 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 20:16 cjming@deploy1002: cjming and dreamyjazz: Continuing with sync
  • 20:16 cjming@deploy1002: cjming and dreamyjazz: Backport for clienthints: Enable purging of data on all wikis (T257893) synced to the testservers mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 20:16 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbstore1008.eqiad.wmnet with reason: host reimage
  • 20:15 cjming@deploy1002: Started scap: Backport for clienthints: Enable purging of data on all wikis (T257893)
  • 20:13 cjming@deploy1002: Finished scap: Backport for clienthints: Pin wgCheckUserDisplayClientHints to false (T337942) (duration: 08m 18s)
  • 20:12 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dbstore1008.eqiad.wmnet with reason: host reimage
  • 20:09 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbstore1009.eqiad.wmnet with reason: host reimage
  • 20:06 cjming@deploy1002: cjming and dreamyjazz: Continuing with sync
  • 20:06 cjming@deploy1002: cjming and dreamyjazz: Backport for clienthints: Pin wgCheckUserDisplayClientHints to false (T337942) synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 20:06 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dbstore1009.eqiad.wmnet with reason: host reimage
  • 20:05 cjming@deploy1002: Started scap: Backport for clienthints: Pin wgCheckUserDisplayClientHints to false (T337942)
  • 19:46 isaranto@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 19:43 isaranto@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 19:22 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host dbstore1009.eqiad.wmnet with OS bullseye
  • 19:22 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host dbstore1008.eqiad.wmnet with OS bullseye
  • 18:02 ejegg: re-enabled donor thank you mail send jobs
  • 17:50 ejegg: civicrm upgraded from 0c2853aa to 0a36997d
  • 17:48 ejegg: disabled donor thank you mail send jobs for Civi update
  • 16:41 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1145.eqiad.wmnet with OS bullseye
  • 16:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['dbstore1009']
  • 16:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['dbstore1008']
  • 16:25 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1144.eqiad.wmnet with OS bullseye
  • 16:24 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['dbstore1009']
  • 16:23 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['dbstore1009']
  • 16:23 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['dbstore1009']
  • 16:23 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['dbstore1008']
  • 16:17 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1145.eqiad.wmnet with reason: host reimage
  • 16:15 stevemunene@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1145.eqiad.wmnet with reason: host reimage
  • 16:14 jnuche@deploy1002: Installation of scap version "4.61.1" completed for 601 hosts
  • 16:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1036.eqiad.wmnet with OS bullseye
  • 16:13 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 16:12 jnuche@deploy1002: Installing scap version "4.61.1" for 601 hosts
  • 16:11 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 16:03 stevemunene@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1145.eqiad.wmnet with OS bullseye
  • 16:01 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1144.eqiad.wmnet with reason: host reimage
  • 15:57 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1047.eqiad.wmnet with OS bullseye
  • 15:57 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 15:57 stevemunene@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1144.eqiad.wmnet with reason: host reimage
  • 15:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1036.eqiad.wmnet with reason: host reimage
  • 15:53 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 15:53 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1036.eqiad.wmnet with reason: host reimage
  • 15:53 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1038.eqiad.wmnet with OS bullseye
  • 15:53 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 15:53 jdrewniak@deploy1002: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 08m 31s)
  • 15:51 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes1036.eqiad.wmnet with OS bullseye
  • 15:51 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 15:44 jdrewniak@deploy1002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 08m 45s)
  • 15:43 stevemunene@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1144.eqiad.wmnet with OS bullseye
  • 15:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1047.eqiad.wmnet with reason: host reimage
  • 15:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1038.eqiad.wmnet with reason: host reimage
  • 15:30 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1047.eqiad.wmnet with reason: host reimage
  • 15:30 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1038.eqiad.wmnet with reason: host reimage
  • 15:29 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kubernetes1036
  • 15:29 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes1047.eqiad.wmnet with OS bullseye
  • 15:29 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes1038.eqiad.wmnet with OS bullseye
  • 15:28 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host kubernetes1036
  • 15:27 Emperor: install new swift packages on ms-be2044
  • 15:26 Emperor: repool ms-fe2009 with new swift packages
  • 15:25 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1143.eqiad.wmnet with OS bullseye
  • 15:18 Emperor: depool ms-fe2009 to install new swift packages
  • 15:13 Emperor: upload swift_2.26.0-10+deb11u1+wmf1_amd64.changes to apt1001
  • 15:11 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1142.eqiad.wmnet with OS bullseye
  • 15:04 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1143.eqiad.wmnet with reason: host reimage
  • 15:01 stevemunene@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1143.eqiad.wmnet with reason: host reimage
  • 14:54 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes1038.eqiad.wmnet with OS bullseye
  • 14:54 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1038.eqiad.wmnet with OS bullseye
  • 14:47 stevemunene@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1143.eqiad.wmnet with OS bullseye
  • 14:45 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1142.eqiad.wmnet with reason: host reimage
  • 14:45 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes1036.eqiad.wmnet with OS bullseye
  • 14:45 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1036.eqiad.wmnet with OS bullseye
  • 14:42 stevemunene@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1142.eqiad.wmnet with reason: host reimage
  • 14:41 fnegri@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudservices2004-dev.codfw.wmnet with reason: host reimage
  • 14:38 fnegri@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudservices2004-dev.codfw.wmnet with reason: host reimage
  • 14:32 jelto: use certmanager instead of certgen in miscweb namespace - T300033
  • 14:29 jelto@deploy1002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 14:29 stevemunene@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1142.eqiad.wmnet with OS bullseye
  • 14:26 jelto@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 14:24 jelto@deploy1002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 14:21 jelto@deploy1002: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 14:20 fnegri@cumin1001: START - Cookbook sre.hosts.reimage for host cloudservices2004-dev.codfw.wmnet with OS bookworm
  • 14:18 fnegri@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudbackup1001-dev.eqiad.wmnet with OS bookworm
  • 14:15 jelto@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 14:05 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-etcd2003.codfw.wmnet
  • 14:04 bblack: lvs1020, lvs1018: restarting pybal to re-enable healthchecks for wikireplicas ( T337446 -> https://gerrit.wikimedia.org/r/924508 )
  • 14:01 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM ml-etcd2003.codfw.wmnet
  • 14:01 fnegri@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudbackup1001-dev.eqiad.wmnet with reason: host reimage
  • 14:01 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-etcd2002.codfw.wmnet
  • 13:57 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM ml-etcd2002.codfw.wmnet
  • 13:56 fnegri@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudbackup1001-dev.eqiad.wmnet with reason: host reimage
  • 13:51 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-etcd2001.codfw.wmnet
  • 13:47 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM ml-etcd2001.codfw.wmnet
  • 13:46 fnegri@cumin1001: START - Cookbook sre.hosts.reimage for host cloudbackup1001-dev.eqiad.wmnet with OS bookworm
  • 13:38 godog: force-set max-repeaters to 20 for cr2-eqsin and cr3-eqsin - T346606
  • 13:26 jmm@cumin2002: END (PASS) - Cookbook sre.wdqs.restart-nginx (exit_code=0) rolling restart_daemons on A:wcqs-public
  • 13:24 jmm@cumin2002: START - Cookbook sre.wdqs.restart-nginx rolling restart_daemons on A:wcqs-public
  • 13:16 taavi@deploy1002: Finished scap: Backport for Disable UploadWizard CTA for MachineVision (T345187) (duration: 11m 16s)
  • 13:11 vgutierrez: depool cp4052 for bookworm testing - T342154
  • 13:09 taavi@deploy1002: taavi and cparle: Continuing with sync
  • 13:06 taavi@deploy1002: taavi and cparle: Backport for Disable UploadWizard CTA for MachineVision (T345187) synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 13:04 taavi@deploy1002: Started scap: Backport for Disable UploadWizard CTA for MachineVision (T345187)
  • 13:04 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 13:04 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 13:04 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 13:03 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 13:02 godog: set max-repeaters to 30 for cr3-eqsin in librenms - T346606
  • 13:01 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 13:01 jayme@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 12:48 jmm@cumin2002: END (PASS) - Cookbook sre.wdqs.restart-nginx (exit_code=0) rolling restart_daemons on A:wdqs-all
  • 12:47 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1141.eqiad.wmnet with OS bullseye
  • 12:37 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest1001.eqiad.wmnet with OS bullseye
  • 12:32 jmm@cumin2002: START - Cookbook sre.wdqs.restart-nginx rolling restart_daemons on A:wdqs-all
  • 12:27 jmm@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: furud.codfw.wmnet
  • 12:27 jmm@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: furud.codfw.wmnet
  • 12:24 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1141.eqiad.wmnet with reason: host reimage
  • 12:24 stevemunene@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1140.eqiad.wmnet with OS bullseye
  • 12:23 moritzm: installing libwebp security updates on bullseye
  • 12:21 stevemunene@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1141.eqiad.wmnet with reason: host reimage
  • 12:21 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1001.eqiad.wmnet with reason: host reimage
  • 12:18 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1001.eqiad.wmnet with reason: host reimage
  • 12:10 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1140.eqiad.wmnet with reason: host reimage
  • 12:08 stevemunene@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1141.eqiad.wmnet with OS bullseye
  • 12:07 stevemunene@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1140.eqiad.wmnet with reason: host reimage
  • 12:03 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1001.eqiad.wmnet with OS bullseye
  • 11:54 jmm@cumin2002: END (PASS) - Cookbook sre.maps.roll-restart-reboot (exit_code=0) rolling reboot on A:maps-replica-eqiad
  • 11:53 stevemunene@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1140.eqiad.wmnet with OS bullseye
  • 11:46 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudservices1005.wikimedia.org
  • 11:46 aborrero@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:46 aborrero@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:46 aborrero@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudservices1005 - aborrero@cumin1001"
  • 11:45 aborrero@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudservices1005 - aborrero@cumin1001"
  • 11:44 aborrero@cumin1001: START - Cookbook sre.dns.netbox
  • 11:44 jayme: removed cergen certs from the list of trusted service account token signers on all kubernetes clusters - T329826
  • 11:43 aborrero@cumin1001: START - Cookbook sre.dns.netbox
  • 11:37 aborrero@cumin1001: START - Cookbook sre.hosts.decommission for hosts cloudservices1005.wikimedia.org
  • 11:14 jmm@cumin2002: START - Cookbook sre.maps.roll-restart-reboot rolling reboot on A:maps-replica-eqiad
  • 11:13 jmm@cumin2002: END (PASS) - Cookbook sre.maps.roll-restart-reboot (exit_code=0) rolling reboot on P{maps201[0].codfw.wmnet} and (A:maps-replica or A:maps-replica-codfw or A:maps-replica-eqiad)
  • 11:05 jmm@cumin2002: START - Cookbook sre.maps.roll-restart-reboot rolling reboot on P{maps201[0].codfw.wmnet} and (A:maps-replica or A:maps-replica-codfw or A:maps-replica-eqiad)
  • 11:01 jmm@cumin2002: END (PASS) - Cookbook sre.maps.roll-restart-reboot (exit_code=0) rolling reboot on P{maps200[7,8].codfw.wmnet} and (A:maps-replica or A:maps-replica-codfw or A:maps-replica-eqiad)
  • 10:48 stevemunene@deploy1002: helmfile [eqiad] DONE helmfile.d/services/datahub: sync on main
  • 10:46 jmm@cumin2002: START - Cookbook sre.maps.roll-restart-reboot rolling reboot on P{maps200[7,8].codfw.wmnet} and (A:maps-replica or A:maps-replica-codfw or A:maps-replica-eqiad)
  • 10:44 stevemunene@deploy1002: helmfile [eqiad] START helmfile.d/services/datahub: apply on main
  • 10:44 stevemunene@deploy1002: helmfile [codfw] DONE helmfile.d/services/datahub: sync on main
  • 10:42 jmm@cumin2002: END (PASS) - Cookbook sre.maps.roll-restart-reboot (exit_code=0) rolling reboot on P{maps200[5,6].codfw.wmnet} and (A:maps-replica or A:maps-replica-codfw or A:maps-replica-eqiad)
  • 10:40 stevemunene@deploy1002: helmfile [codfw] START helmfile.d/services/datahub: apply on main
  • 10:33 godog: set max-repeaters to 20 for cr3-eqsin using "force save" - T346606
  • 10:28 jmm@cumin2002: START - Cookbook sre.maps.roll-restart-reboot rolling reboot on P{maps200[5,6].codfw.wmnet} and (A:maps-replica or A:maps-replica-codfw or A:maps-replica-eqiad)
  • 09:59 elukey: remove ores-cache stream from changeprop (side effects - higher ORES client latencies, no mediawiki.revision-score event stream published) - https://phabricator.wikimedia.org/T342116
  • 09:56 stevemunene@deploy1002: helmfile [eqiad] DONE helmfile.d/services/datahub: sync on main
  • 09:54 elukey@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync
  • 09:54 elukey@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop: sync
  • 09:50 kamila@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.06-set-db-readwrite (exit_code=0)
  • 09:50 kamila@cumin1001: START - Cookbook sre.switchdc.mediawiki.06-set-db-readwrite
  • 09:50 stevemunene@deploy1002: helmfile [eqiad] START helmfile.d/services/datahub: apply on main
  • 09:50 kamila@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.03-set-db-readonly (exit_code=0)
  • 09:50 elukey@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop: sync
  • 09:49 kamila@cumin1001: START - Cookbook sre.switchdc.mediawiki.03-set-db-readonly
  • 09:49 elukey@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop: sync
  • 09:49 stevemunene@deploy1002: helmfile [codfw] DONE helmfile.d/services/datahub: sync on main
  • 09:46 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: sync
  • 09:46 elukey@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: sync
  • 09:44 fabfur: enabled puppet on cp4050 for T346602
  • 09:43 stevemunene@deploy1002: helmfile [codfw] START helmfile.d/services/datahub: apply on main
  • 09:42 isaranto@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 09:40 fabfur: disabled puppet on cp4050 for T346602
  • 09:39 fabfur: enabled puppet on cp4052 for T346602
  • 09:38 stevemunene@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 09:34 hashar@deploy1002: Finished scap: Backport for tests: Do not assume UTSysop exists (T346253) (duration: 09m 06s)
  • 09:32 fabfur: disabled puppet on cp4052 for T346602
  • 09:28 godog: set max-repeaters to 20 for cr3-eqsin in librenms - T346606
  • 09:28 godog: set max-repeaters for cr3-eqsin in librenms - T346606
  • 09:27 hashar@deploy1002: hashar and urbanecm: Continuing with sync
  • 09:26 hashar@deploy1002: hashar and urbanecm: Backport for tests: Do not assume UTSysop exists (T346253) synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 09:25 isaranto@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 09:25 hashar@deploy1002: Started scap: Backport for tests: Do not assume UTSysop exists (T346253)
  • 09:25 stevemunene@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 09:06 jelto@deploy1002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 09:05 jelto@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 09:03 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 09:03 jelto@deploy1002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 09:03 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 09:03 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 09:03 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 09:03 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 09:03 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 09:02 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 09:02 jelto@deploy1002: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 08:56 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 08:47 jelto@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 08:46 jelto@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 07:58 Amir1: running db checksum run in s3 eqiad replicas (T207253)
  • 07:26 taavi@deploy1002: Finished scap: Backport for robots.txt: Disable indexing user (talk) pages and draft (talk) pages on shwiki (T346589) (duration: 22m 24s)
  • 07:17 taavi@deploy1002: aleksandar and taavi: Continuing with sync
  • 07:15 moritzm: installing clamav security updates
  • 07:13 taavi@deploy1002: aleksandar and taavi: Backport for robots.txt: Disable indexing user (talk) pages and draft (talk) pages on shwiki (T346589) synced to the testservers mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 07:03 taavi@deploy1002: Started scap: Backport for robots.txt: Disable indexing user (talk) pages and draft (talk) pages on shwiki (T346589)

2023-09-16

  • 13:52 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply
  • 13:52 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop: apply
  • 13:52 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop: apply
  • 13:52 akosiaris: re-enable changeprop
  • 13:52 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop: apply
  • 12:57 akosiaris: stop changeprop in eqiad
  • 01:44 krinkle@deploy1002: Finished deploy [integration/docroot@9a1fb37]: (no justification provided) (duration: 00m 06s)
  • 01:44 krinkle@deploy1002: Started deploy [integration/docroot@9a1fb37]: (no justification provided)

2023-09-15

  • 21:03 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes1047.eqiad.wmnet with OS bullseye
  • 21:03 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1047.eqiad.wmnet with OS bullseye
  • 20:59 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes1036.eqiad.wmnet with OS bullseye
  • 20:59 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1036.eqiad.wmnet with OS bullseye
  • 20:59 tzatziki: removing 6 files for legal compliance
  • 20:58 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes1038.eqiad.wmnet with OS bullseye
  • 20:58 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1038.eqiad.wmnet with OS bullseye
  • 20:58 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes1036.eqiad.wmnet with OS bullseye
  • 20:58 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1036.eqiad.wmnet with OS bullseye
  • 17:56 urandom: stopping Cassandra bootstrap, restbase1030-a — T331713
  • 17:43 urandom: initiate Cassandra bootstrap, restbase1030-a — T331713
  • 17:11 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt2002-dev.codfw.wmnet with OS bookworm
  • 17:08 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt2003-dev.codfw.wmnet with OS bookworm
  • 16:51 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 16:47 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 16:46 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 16:44 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 16:37 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt2003-dev.codfw.wmnet with reason: host reimage
  • 16:35 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt2002-dev.codfw.wmnet with reason: host reimage
  • 16:32 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt2003-dev.codfw.wmnet with reason: host reimage
  • 16:32 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt2002-dev.codfw.wmnet with reason: host reimage
  • 16:12 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt2003-dev.codfw.wmnet with OS bookworm
  • 16:12 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt2002-dev.codfw.wmnet with OS bookworm
  • 16:09 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
  • 15:51 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 15:51 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 15:51 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
  • 15:50 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
  • 15:50 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
  • 15:50 claime: raising mw-api-int replicas to 12+2 to cope with wdqs backfill
  • 15:43 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
  • 15:43 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
  • 15:42 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
  • 15:42 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:42 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove old esams ranges and includes - cmooney@cumin1001"
  • 15:42 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
  • 15:41 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove old esams ranges and includes - cmooney@cumin1001"
  • 15:39 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 15:33 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:33 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove old esams ranges and includes - cmooney@cumin1001"
  • 15:32 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove old esams ranges and includes - cmooney@cumin1001"
  • 15:30 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 15:24 eevans@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching restbase10[18,25-27,33].eqiad.wmnet: Maybe pickup missed topology changes — T331713 - eevans@cumin1001
  • 15:03 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 15:03 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 14:58 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 14:57 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 14:38 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 14:38 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 14:35 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching restbase10[18,25-27,33].eqiad.wmnet: Maybe pickup missed topology changes — T331713 - eevans@cumin1001
  • 14:35 urandom: rolling Cassandra restart, RESTBase/eqiad/row-D — T331713
  • 14:34 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 14:33 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 14:33 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 14:33 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt2006-dev.codfw.wmnet with OS bookworm
  • 14:32 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 14:32 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 14:31 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt2005-dev.codfw.wmnet with OS bookworm
  • 14:29 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt2004-dev.codfw.wmnet with OS bookworm
  • 14:27 aborrero@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudvirt2006-dev
  • 14:27 aborrero@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt2006-dev
  • 14:26 aborrero@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudvirt2005-dev
  • 14:26 aborrero@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt2005-dev
  • 14:25 aborrero@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudvirt2004-dev
  • 14:24 aborrero@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt2004-dev
  • 14:06 cgoubert@cumin1001: conftool action : set/pooled=yes; selector: name=mw2444.codfw.wmnet
  • 14:05 claime: repooling mw2444.codfw.wmnet - T345884
  • 13:51 slyngshede@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM idm-test1001.wikimedia.org
  • 13:47 slyngshede@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM idm-test1001.wikimedia.org
  • 13:46 slyngshede@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM idm-test1001.wikimedia.org
  • 13:41 slyngshede@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM idm-test1001.wikimedia.org
  • 13:41 slyngshede@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host idm-test1001.wikimedia.org with OS bookworm
  • 13:40 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt2004-dev.codfw.wmnet with reason: host reimage
  • 13:38 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt2006-dev.codfw.wmnet with reason: host reimage
  • 13:38 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cloudvirt2005-dev.codfw.wmnet with reason: host reimage
  • 13:35 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt2005-dev.codfw.wmnet with reason: host reimage
  • 13:35 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt2004-dev.codfw.wmnet with reason: host reimage
  • 13:35 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt2006-dev.codfw.wmnet with reason: host reimage
  • 13:19 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt2004-dev.codfw.wmnet with OS bookworm
  • 13:19 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt2005-dev.codfw.wmnet with OS bookworm
  • 13:19 slyngshede@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on idm-test1001.wikimedia.org with reason: host reimage
  • 13:19 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt2006-dev.codfw.wmnet with OS bookworm
  • 13:16 slyngshede@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on idm-test1001.wikimedia.org with reason: host reimage
  • 13:03 slyngshede@cumin1001: START - Cookbook sre.hosts.reimage for host idm-test1001.wikimedia.org with OS bookworm
  • 13:01 akosiaris@deploy1002: Synchronized docroot: (no justification provided) (duration: 08m 20s)
  • 12:50 topranks: changing ECMP hashing algorithm on drmrs, esams and cloud switches T339852
  • 12:27 topranks: changing ECMP hashing algorithm on asw1-b12-drmrs T339852
  • 11:54 _joe_: updated etcd-mirror to 0.0.10 everywhere
  • 11:37 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1138.eqiad.wmnet with OS bullseye
  • 11:12 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1138.eqiad.wmnet with reason: host reimage
  • 11:09 stevemunene@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1138.eqiad.wmnet with reason: host reimage
  • 10:56 stevemunene@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1138.eqiad.wmnet with OS bullseye
  • 10:07 aborrero@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:07 aborrero@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudvirts in codfw - aborrero@cumin1001"
  • 09:22 aborrero@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudvirts in codfw - aborrero@cumin1001"
  • 09:20 aborrero@cumin1001: START - Cookbook sre.dns.netbox
  • 09:10 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.resource-report (exit_code=0)
  • 09:10 jmm@cumin2002: START - Cookbook sre.ganeti.resource-report
  • 08:57 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host ldap-replica2008.wikimedia.org
  • 08:57 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ldap-replica2008.wikimedia.org with OS bookworm
  • 08:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ldap-replica2008.wikimedia.org with reason: host reimage
  • 08:47 elukey@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 08:47 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ldap-replica2008.wikimedia.org with reason: host reimage
  • 08:46 elukey@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
  • 08:40 elukey@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 08:39 elukey@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
  • 08:39 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 08:39 elukey@deploy1002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
  • 08:30 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ldap-replica2008.wikimedia.org with OS bookworm
  • 08:27 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM ldap-replica2008.wikimedia.org - jmm@cumin2002"
  • 08:27 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM ldap-replica2008.wikimedia.org - jmm@cumin2002"
  • 08:26 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ldap-replica2008.wikimedia.org on all recursors
  • 08:26 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache ldap-replica2008.wikimedia.org on all recursors
  • 08:26 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:26 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM ldap-replica2008.wikimedia.org - jmm@cumin2002"
  • 08:12 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM ldap-replica2008.wikimedia.org - jmm@cumin2002"
  • 08:10 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 08:10 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host ldap-replica2008.wikimedia.org
  • 08:08 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host ldap-replica2007.wikimedia.org
  • 08:08 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ldap-replica2007.wikimedia.org with OS bookworm
  • 07:54 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ldap-replica2007.wikimedia.org with reason: host reimage
  • 07:51 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ldap-replica2007.wikimedia.org with reason: host reimage
  • 07:38 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ldap-replica2007.wikimedia.org with OS bookworm
  • 07:26 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM ldap-replica2007.wikimedia.org - jmm@cumin2002"
  • 07:25 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM ldap-replica2007.wikimedia.org - jmm@cumin2002"
  • 07:25 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ldap-replica2007.wikimedia.org on all recursors
  • 07:25 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache ldap-replica2007.wikimedia.org on all recursors
  • 07:25 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 07:25 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM ldap-replica2007.wikimedia.org - jmm@cumin2002"
  • 07:24 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM ldap-replica2007.wikimedia.org - jmm@cumin2002"
  • 07:22 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 07:22 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host ldap-replica2007.wikimedia.org
  • 07:21 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.resource-report (exit_code=0)
  • 07:21 jmm@cumin2002: START - Cookbook sre.ganeti.resource-report
  • 07:15 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast4005.wikimedia.org
  • 07:09 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host bast4005.wikimedia.org
  • 07:04 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 07:03 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 06:25 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host bast2003.wikimedia.org
  • 06:06 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host puppetboard1003.eqiad.wmnet
  • 06:01 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host puppetboard1003.eqiad.wmnet
  • 06:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host puppetboard2003.codfw.wmnet
  • 05:56 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host puppetboard2003.codfw.wmnet
  • 05:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast5004.wikimedia.org
  • 05:44 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host bast5004.wikimedia.org
  • 02:43 eevans@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching restbase10[17,22-24,29,32].eqiad.wmnet: Maybe pickup missed topology changes — T331713 - eevans@cumin1001
  • 01:44 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching restbase10[17,22-24,29,32].eqiad.wmnet: Maybe pickup missed topology changes — T331713 - eevans@cumin1001
  • 01:44 urandom: rolling Cassandra restart, RESTBase/eqiad/row-B — T331713
  • 01:20 krinkle@deploy1002: Finished scap: Backport for Remove old origin-with-crossorigin referrer policy (T338183) (duration: 08m 16s)
  • 01:14 krinkle@deploy1002: krinkle and hartman: Continuing with sync
  • 01:13 krinkle@deploy1002: krinkle and hartman: Backport for Remove old origin-with-crossorigin referrer policy (T338183) synced to the testservers mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 01:12 krinkle@deploy1002: Started scap: Backport for Remove old origin-with-crossorigin referrer policy (T338183)
  • 01:09 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt2005-dev.codfw.wmnet with OS bookworm
  • 01:09 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt2006-dev.codfw.wmnet with OS bookworm
  • 01:09 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt2004-dev.codfw.wmnet with OS bookworm
  • 00:12 eevans@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching restbase10[16,19-21,28,31].eqiad.wmnet: Maybe pickup missed topology changes — T331713 - eevans@cumin1001

2023-09-14

  • 23:49 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1056.eqiad.wmnet with OS bullseye
  • 23:34 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1056.eqiad.wmnet with reason: host reimage
  • 23:32 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt2006-dev.codfw.wmnet with reason: host reimage
  • 23:30 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt2005-dev.codfw.wmnet with reason: host reimage
  • 23:27 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt2004-dev.codfw.wmnet with reason: host reimage
  • 23:27 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt2006-dev.codfw.wmnet with reason: host reimage
  • 23:26 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt2005-dev.codfw.wmnet with reason: host reimage
  • 23:26 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1056.eqiad.wmnet with reason: host reimage
  • 23:24 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt2004-dev.codfw.wmnet with reason: host reimage
  • 23:19 eileen: civicrm upgraded from 9d34ed9b to 0c2853aa - big vendor update - roll back if issues
  • 23:18 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 23:17 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 23:15 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 23:13 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching restbase10[16,19-21,28,31].eqiad.wmnet: Maybe pickup missed topology changes — T331713 - eevans@cumin1001
  • 23:12 urandom: rolling Cassandra restart, RESTBase/eqiad/row-A — T331713
  • 23:10 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt2006-dev.codfw.wmnet with OS bookworm
  • 23:10 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1056.eqiad.wmnet with OS bullseye
  • 23:09 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt2005-dev.codfw.wmnet with OS bookworm
  • 23:08 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt2004-dev.codfw.wmnet with OS bookworm
  • 23:03 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1031.eqiad.wmnet with OS bullseye
  • 23:03 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 22:51 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 22:47 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 22:45 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1030.eqiad.wmnet with OS bullseye
  • 22:45 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 22:45 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1034.eqiad.wmnet with OS bullseye
  • 22:45 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 22:44 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt2001-dev.codfw.wmnet with OS bookworm
  • 22:43 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 22:40 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1031.eqiad.wmnet with reason: host reimage
  • 22:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1034.eqiad.wmnet with reason: host reimage
  • 22:25 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1030.eqiad.wmnet with reason: host reimage
  • 22:21 eevans@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching restbase20[12,17-18,23,26-27].codfw.wmnet: Maybe pickup missed topology changes — T331713 - eevans@cumin1001
  • 22:21 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1034.eqiad.wmnet with reason: host reimage
  • 22:21 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1031.eqiad.wmnet with reason: host reimage
  • 22:20 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1030.eqiad.wmnet with reason: host reimage
  • 22:14 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt2001-dev.codfw.wmnet with reason: host reimage
  • 22:11 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt2001-dev.codfw.wmnet with reason: host reimage
  • 22:06 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes1034.eqiad.wmnet with OS bullseye
  • 22:06 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes1031.eqiad.wmnet with OS bullseye
  • 22:05 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes1030.eqiad.wmnet with OS bullseye
  • 21:55 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1032.eqiad.wmnet with OS bullseye
  • 21:55 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 21:54 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 21:51 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt2001-dev.codfw.wmnet with OS bookworm
  • 21:50 andrew@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudvirt2001-dev.codfw.wmnet with OS bullseye
  • 21:42 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 21:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1035.eqiad.wmnet with OS bullseye
  • 21:42 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 21:41 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 21:41 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 21:41 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 21:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1039.eqiad.wmnet with OS bullseye
  • 21:40 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 21:39 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 21:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1037.eqiad.wmnet with OS bullseye
  • 21:38 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 21:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1032.eqiad.wmnet with reason: host reimage
  • 21:37 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 21:35 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes1034.eqiad.wmnet with OS bullseye
  • 21:35 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes1034.eqiad.wmnet with OS bullseye
  • 21:34 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1032.eqiad.wmnet with reason: host reimage
  • 21:34 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes1034.eqiad.wmnet with OS bullseye
  • 21:34 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes1034.eqiad.wmnet with OS bullseye
  • 21:34 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt2001-dev.codfw.wmnet with OS bullseye
  • 21:33 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes1034.eqiad.wmnet with OS bullseye
  • 21:32 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1033.eqiad.wmnet with OS bullseye
  • 21:32 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 21:31 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 21:27 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching restbase20[12,17-18,23,26-27].codfw.wmnet: Maybe pickup missed topology changes — T331713 - eevans@cumin1001
  • 21:26 urandom: rolling Cassandra restart, RESTBase/row-D — T331713
  • 21:24 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1035.eqiad.wmnet with reason: host reimage
  • 21:24 eevans@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching restbase20[16,20,22,25].codfw.wmnet: Maybe pickup missed topology changes — T331713 - eevans@cumin1001
  • 21:22 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1039.eqiad.wmnet with reason: host reimage
  • 21:19 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1037.eqiad.wmnet with reason: host reimage
  • 21:17 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1039.eqiad.wmnet with reason: host reimage
  • 21:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1033.eqiad.wmnet with reason: host reimage
  • 21:15 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1037.eqiad.wmnet with reason: host reimage
  • 21:15 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1035.eqiad.wmnet with reason: host reimage
  • 21:14 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1033.eqiad.wmnet with reason: host reimage
  • 21:13 ryankemper: T345475 Beginning process to bring 3 new hosts `wdqs202[3-5]` into service. Merged https://gerrit.wikimedia.org/r/957802 and running puppet on hosts
  • 21:06 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 21:03 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes1039.eqiad.wmnet with OS bullseye
  • 21:02 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes1038.eqiad.wmnet with OS bullseye
  • 21:02 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes1037.eqiad.wmnet with OS bullseye
  • 21:01 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes1036.eqiad.wmnet with OS bullseye
  • 21:00 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes1035.eqiad.wmnet with OS bullseye
  • 20:59 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes1034.eqiad.wmnet with OS bullseye
  • 20:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes1033.eqiad.wmnet with OS bullseye
  • 20:57 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes1032.eqiad.wmnet with OS bullseye
  • 20:47 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching restbase20[16,20,22,25].codfw.wmnet: Maybe pickup missed topology changes — T331713 - eevans@cumin1001
  • 20:45 thcipriani@deploy1002: Finished scap: Backport for Don't offer visual diffs for non-wikitext pages (T346252), ThreadItemStore: Add details to row insertion exceptions (T343859) (duration: 12m 35s)
  • 20:38 thcipriani@deploy1002: thcipriani and matmarex: Continuing with sync
  • 20:34 eevans@cumin1001: END (FAIL) - Cookbook sre.cassandra.roll-restart (exit_code=99) for nodes matching restbase20[15-16,20,22,25].codfw.wmnet: Maybe pickup missed topology changes — T331713 - eevans@cumin1001
  • 20:34 thcipriani@deploy1002: thcipriani and matmarex: Backport for Don't offer visual diffs for non-wikitext pages (T346252), ThreadItemStore: Add details to row insertion exceptions (T343859) synced to the testservers mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD
  • 20:32 thcipriani@deploy1002: Started scap: Backport for Don't offer visual diffs for non-wikitext pages (T346252), ThreadItemStore: Add details to row insertion exceptions (T343859)
  • 20:20 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching restbase20[15-16,20,22,25].codfw.wmnet: Maybe pickup missed topology changes — T331713 - eevans@cumin1001
  • 20:20 urandom: rolling Cassandra restart, RESTBase/row-C — T331713
  • 20:05 eevans@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching restbase20[13-14,19,21,24].codfw.wmnet: Maybe pickup missed topology changes — T331713 - eevans@cumin1001
  • 19:20 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching restbase20[13-14,19,21,24].codfw.wmnet: Maybe pickup missed topology changes — T331713 - eevans@cumin1001
  • 19:20 urandom: rolling Cassandra restart, RESTBase/row-B — T331713
  • 19:14 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1050.eqiad.wmnet with OS bullseye
  • 19:11 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1052.eqiad.wmnet with OS bullseye
  • 19:10 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1051.eqiad.wmnet with OS bullseye
  • 18:58 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1050.eqiad.wmnet with reason: host reimage
  • 18:58 urandom: initiating `removenode`, ID=627fe8e9-d298-43b3-a1a2-7c8a3f01370b (restbase1030-c) — T331713
  • 18:56 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1052.eqiad.wmnet with reason: host reimage
  • 18:54 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1050.eqiad.wmnet with reason: host reimage
  • 18:53 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1051.eqiad.wmnet with reason: host reimage
  • 18:52 urandom: stopping bootstrap of restbase1030-c — T331713
  • 18:51 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1052.eqiad.wmnet with reason: host reimage
  • 18:51 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1051.eqiad.wmnet with reason: host reimage
  • 18:45 urandom: retrying Cassandra bootstrap of restbase1030-c — T331713
  • 18:39 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1050.eqiad.wmnet with OS bullseye
  • 18:38 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:38 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1050.eqiad.wmnet with OS bullseye
  • 18:38 jclark@cumin1001: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 18:37 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 18:35 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1052.eqiad.wmnet with OS bullseye
  • 18:35 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1051.eqiad.wmnet with OS bullseye
  • 18:34 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1051.eqiad.wmnet with OS bullseye
  • 18:34 jclark@cumin1001: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 18:27 xcollazo@deploy1002: Finished deploy [airflow-dags/analytics@7160e27]: Deploy latest DAGs to analytics Airflow instance T340861 (duration: 00m 40s)
  • 18:27 xcollazo@deploy1002: Started deploy [airflow-dags/analytics@7160e27]: Deploy latest DAGs to analytics Airflow instance T340861
  • 18:24 bblack: cp107[56],cp202[78],cp600[19]: (one host from each cluster, at 3 sites): restarting varnish-frontend spaced out over the next ~hour for memory tweaks.
  • 18:18 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1053.eqiad.wmnet with OS bullseye
  • 18:03 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1046.eqiad.wmnet with OS bullseye
  • 18:03 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 18:02 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1053.eqiad.wmnet with reason: host reimage
  • 18:02 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 17:59 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1053.eqiad.wmnet with reason: host reimage
  • 17:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1045.eqiad.wmnet with OS bullseye
  • 17:56 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 17:55 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 17:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1046.eqiad.wmnet with reason: host reimage
  • 17:44 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1053.eqiad.wmnet with OS bullseye
  • 17:43 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1046.eqiad.wmnet with reason: host reimage
  • 17:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes1046.eqiad.wmnet with OS bullseye
  • 17:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1048.eqiad.wmnet with OS bullseye
  • 17:41 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 17:40 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 17:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1045.eqiad.wmnet with reason: host reimage
  • 17:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1049.eqiad.wmnet with OS bullseye
  • 17:39 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 17:38 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 17:38 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1052.eqiad.wmnet with OS bullseye
  • 17:36 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1045.eqiad.wmnet with reason: host reimage
  • 17:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1044.eqiad.wmnet with OS bullseye
  • 17:34 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 17:32 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 17:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1043.eqiad.wmnet with OS bullseye
  • 17:31 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 17:30 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 17:29 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes1046.eqiad.wmnet with OS bullseye
  • 17:24 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1048.eqiad.wmnet with reason: host reimage
  • 17:22 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1049.eqiad.wmnet with reason: host reimage
  • 17:20 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1052.eqiad.wmnet with reason: host reimage
  • 17:18 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1049.eqiad.wmnet with reason: host reimage
  • 17:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1044.eqiad.wmnet with reason: host reimage
  • 17:16 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1052.eqiad.wmnet with reason: host reimage
  • 17:15 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1048.eqiad.wmnet with reason: host reimage
  • 17:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1043.eqiad.wmnet with reason: host reimage
  • 17:12 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1044.eqiad.wmnet with reason: host reimage
  • 17:12 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1043.eqiad.wmnet with reason: host reimage
  • 17:05 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on search-loader2002.codfw.wmnet,search-loader1002.eqiad.wmnet with reason: T346039
  • 17:05 bking@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on search-loader2002.codfw.wmnet,search-loader1002.eqiad.wmnet with reason: T346039
  • 17:03 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes1049.eqiad.wmnet with OS bullseye
  • 17:03 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes1048.eqiad.wmnet with OS bullseye
  • 17:02 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes1047.eqiad.wmnet with OS bullseye
  • 17:00 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1052.eqiad.wmnet with OS bullseye
  • 17:00 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1052.eqiad.wmnet with OS bullseye
  • 17:00 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes1046.eqiad.wmnet with OS bullseye
  • 17:00 jclark@cumin1001: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 16:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes1045.eqiad.wmnet with OS bullseye
  • 16:57 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes1044.eqiad.wmnet with OS bullseye
  • 16:56 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes1043.eqiad.wmnet with OS bullseye
  • 16:53 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1040.eqiad.wmnet with OS bullseye
  • 16:53 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 16:51 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 16:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1041.eqiad.wmnet with OS bullseye
  • 16:48 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 16:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1042.eqiad.wmnet with OS bullseye
  • 16:48 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 16:47 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 16:46 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 16:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1040.eqiad.wmnet with reason: host reimage
  • 16:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1041.eqiad.wmnet with reason: host reimage
  • 16:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1042.eqiad.wmnet with reason: host reimage
  • 16:31 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1040.eqiad.wmnet with reason: host reimage
  • 16:30 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes1040.eqiad.wmnet with OS bullseye
  • 16:27 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1041.eqiad.wmnet with reason: host reimage
  • 16:27 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1042.eqiad.wmnet with reason: host reimage
  • 16:26 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes1040.eqiad.wmnet with OS bullseye
  • 16:23 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes1056.eqiad.wmnet with OS bullseye
  • 16:23 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 16:23 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 16:21 denisse: Failing over from netmon2002 (codfw) to netmon1003 (eqiad).
  • 16:20 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 16:17 volans@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "update - volans@cumin1001"
  • 16:17 volans@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "update - volans@cumin1001"
  • 16:16 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1056.eqiad.wmnet with reason: host reimage
  • 16:14 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes1042.eqiad.wmnet with OS bullseye
  • 16:14 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes1041.eqiad.wmnet with OS bullseye
  • 16:14 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes1040.eqiad.wmnet with OS bullseye
  • 16:13 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1056.eqiad.wmnet with reason: host reimage
  • 16:12 cmooney@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 16:12 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1053.eqiad.wmnet with OS bullseye
  • 16:12 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 16:09 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 16:08 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1051.eqiad.wmnet with reason: host reimage
  • 16:06 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1052.eqiad.wmnet with reason: host reimage
  • 16:04 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 16:04 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1050.eqiad.wmnet with reason: host reimage
  • 16:04 robh@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "failed in reimage script said manually run it - robh@cumin1001 - T342533"
  • 16:03 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes1030.eqiad.wmnet with OS bullseye
  • 16:03 robh@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "failed in reimage script said manually run it - robh@cumin1001 - T342533"
  • 16:03 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes1031.eqiad.wmnet with OS bullseye
  • 16:03 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1051.eqiad.wmnet with reason: host reimage
  • 16:03 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1052.eqiad.wmnet with reason: host reimage
  • 16:01 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:01 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1050.eqiad.wmnet with reason: host reimage
  • 16:00 jclark@cumin1001: START - Cookbook sre.dns.netbox
  • 15:57 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1056.eqiad.wmnet with OS bullseye
  • 15:55 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1055.eqiad.wmnet with OS bullseye
  • 15:55 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 15:55 urbanecm@deploy1002: Finished scap: Backport for listTaskCounts: Push total task counts to statsd for all tasks (T345204), linkTaskCounts: Stop producing per-topic statsd data (T345210) (duration: 07m 37s)
  • 15:55 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1054.eqiad.wmnet with OS bullseye
  • 15:55 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 15:54 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1056.eqiad.wmnet with OS bullseye
  • 15:54 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 15:54 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 15:53 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 15:53 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 15:53 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1139.eqiad.wmnet with OS bullseye
  • 15:52 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for conf2006.codfw.wmnet
  • 15:52 jayme@cumin1001: START - Cookbook sre.hosts.remove-downtime for conf2006.codfw.wmnet
  • 15:52 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for conf2004.codfw.wmnet
  • 15:52 jayme@cumin1001: START - Cookbook sre.hosts.remove-downtime for conf2004.codfw.wmnet
  • 15:52 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for conf2005.codfw.wmnet
  • 15:51 jayme@cumin1001: START - Cookbook sre.hosts.remove-downtime for conf2005.codfw.wmnet
  • 15:49 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host moss-be2003.codfw.wmnet with OS bullseye
  • 15:49 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1053.eqiad.wmnet with reason: host reimage
  • 15:48 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1051.eqiad.wmnet with OS bullseye
  • 15:48 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1052.eqiad.wmnet with OS bullseye
  • 15:48 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1050.eqiad.wmnet with OS bullseye
  • 15:48 bking@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host search-loader1002.eqiad.wmnet
  • 15:47 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host search-loader1002.eqiad.wmnet with OS bullseye
  • 15:47 urbanecm@deploy1002: Started scap: Backport for listTaskCounts: Push total task counts to statsd for all tasks (T345204), linkTaskCounts: Stop producing per-topic statsd data (T345210)
  • 15:46 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1053.eqiad.wmnet with reason: host reimage
  • 15:44 jayme: restarting primary lvs in codfw, eqsin, ulsfo
  • 15:42 jayme: restarting secondary lvs in codfw, eqsin, ulsfo
  • 15:38 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1054.eqiad.wmnet with reason: host reimage
  • 15:37 jayme: running puppet on lvs[2011-2014].codfw.wmnet,lvs[5004-5006].eqsin.wmnet,lvs[4008-4010].ulsfo.wmnet
  • 15:36 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1055.eqiad.wmnet with reason: host reimage
  • 15:36 bking@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host search-loader2002.codfw.wmnet
  • 15:36 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host search-loader2002.codfw.wmnet with OS bullseye
  • 15:36 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on kubernetes1056.eqiad.wmnet with reason: host reimage
  • 15:05 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host testreduce1002.eqiad.wmnet
  • 15:01 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 15:01 bking@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM search-loader1002.eqiad.wmnet - bking@cumin1001"
  • 15:00 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 14:58 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host conf2005.codfw.wmnet with OS bullseye
  • 14:58 bking@cumin1001: START - Cookbook sre.dns.netbox
  • 14:58 bking@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) search-loader1002.eqiad.wmnet on all recursors
  • 14:58 bking@cumin1001: START - Cookbook sre.dns.wipe-cache search-loader1002.eqiad.wmnet on all recursors
  • 14:58 bking@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:58 bking@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM search-loader1002.eqiad.wmnet - bking@cumin1001"
  • 14:58 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM testvm2002.codfw.wmnet
  • 14:55 bking@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM search-loader1002.eqiad.wmnet - bking@cumin1001"
  • 14:55 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 14:55 bking@cumin1001: START - Cookbook sre.hosts.reimage for host search-loader2002.codfw.wmnet with OS bullseye
  • 14:54 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM testvm2002.codfw.wmnet
  • 14:53 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM testvm2005.codfw.wmnet
  • 14:52 bking@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM search-loader2002.codfw.wmnet - bking@cumin1001"
  • 14:52 bking@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM search-loader2002.codfw.wmnet - bking@cumin1001"
  • 14:51 bking@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) search-loader2002.codfw.wmnet on all recursors
  • 14:51 bking@cumin1001: START - Cookbook sre.dns.wipe-cache search-loader2002.codfw.wmnet on all recursors
  • 14:51 bking@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:51 bking@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM search-loader2002.codfw.wmnet - bking@cumin1001"
  • 14:51 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudservices1005.wikimedia.org with reason: test before full decom
  • 14:51 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudservices1005.wikimedia.org with reason: test before full decom
  • 14:50 bking@cumin1001: START - Cookbook sre.dns.netbox
  • 14:50 bking@cumin1001: START - Cookbook sre.ganeti.makevm for new host search-loader1002.eqiad.wmnet
  • 14:50 bking@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM search-loader2002.codfw.wmnet - bking@cumin1001"
  • 14:50 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM testvm2005.codfw.wmnet
  • 14:47 bking@cumin1001: START - Cookbook sre.dns.netbox
  • 14:47 bking@cumin1001: START - Cookbook sre.ganeti.makevm for new host search-loader2002.codfw.wmnet
  • 14:46 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1029.eqiad.wmnet with reason: host reimage
  • 14:45 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM testvm2004.codfw.wmnet
  • 14:44 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1028.eqiad.wmnet with reason: host reimage
  • 14:43 vgutierrez: varnish: decrease max_connections to 10k per backend server globally
  • 14:41 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM testvm2004.codfw.wmnet
  • 14:41 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1029.eqiad.wmnet with reason: host reimage
  • 14:41 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1028.eqiad.wmnet with reason: host reimage
  • 14:40 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1027.eqiad.wmnet with reason: host reimage
  • 14:37 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1027.eqiad.wmnet with reason: host reimage
  • 14:36 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on conf2005.codfw.wmnet with reason: host reimage
  • 14:33 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on conf2005.codfw.wmnet with reason: host reimage
  • 14:32 moritzm: installing qemu security updates on ganeti-test cluster
  • 14:27 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1031.eqiad.wmnet with OS bullseye
  • 14:27 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1030.eqiad.wmnet with OS bullseye
  • 14:27 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1029.eqiad.wmnet with OS bullseye
  • 14:27 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1028.eqiad.wmnet with OS bullseye
  • 14:23 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1027.eqiad.wmnet with OS bullseye
  • 14:19 jforrester@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mathoid: apply
  • 14:18 jforrester@deploy1002: helmfile [eqiad] START helmfile.d/services/mathoid: apply
  • 14:18 jforrester@deploy1002: helmfile [codfw] DONE helmfile.d/services/mathoid: apply
  • 14:18 jayme@cumin1001: START - Cookbook sre.hosts.reimage for host conf2005.codfw.wmnet with OS bullseye
  • 14:17 jforrester@deploy1002: helmfile [codfw] START helmfile.d/services/mathoid: apply
  • 14:17 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/mathoid: apply
  • 14:16 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/mathoid: apply
  • 13:58 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host conf2006.codfw.wmnet with OS bullseye
  • 13:57 stevemunene@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1138.eqiad.wmnet with OS bullseye
  • 13:56 filippo@deploy1002: Finished deploy [librenms/librenms@f049593]: (no justification provided) (duration: 00m 11s)
  • 13:55 filippo@deploy1002: Started deploy [librenms/librenms@f049593]: (no justification provided)
  • 13:39 godog: issue test alertmanager librenms alert - T346318
  • 13:38 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on conf2006.codfw.wmnet with reason: host reimage
  • 13:35 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on conf2006.codfw.wmnet with reason: host reimage
  • 13:32 slyngshede@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM idm-test1001.wikimedia.org
  • 13:31 moritzm: installing libwebp security updates on bookworm
  • 13:28 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1139.eqiad.wmnet with reason: host reimage
  • 13:28 slyngshede@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM idm-test1001.wikimedia.org
  • 13:25 stevemunene@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1139.eqiad.wmnet with reason: host reimage
  • 13:19 jayme@cumin1001: START - Cookbook sre.hosts.reimage for host conf2006.codfw.wmnet with OS bullseye
  • 13:14 moritzm: installing aom security updates
  • 13:13 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for conf2004.codfw.wmnet
  • 13:13 jayme@cumin1001: START - Cookbook sre.hosts.remove-downtime for conf2004.codfw.wmnet
  • 13:12 stevemunene@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1139.eqiad.wmnet with OS bullseye
  • 13:10 stevemunene@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1138.eqiad.wmnet with OS bullseye
  • 12:56 slyngshede@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host idm-test1001.wikimedia.org with OS bookworm
  • 12:17 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow6001.drmrs.wmnet
  • 12:17 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: sync
  • 12:16 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/mobileapps: sync
  • 12:14 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow6001.drmrs.wmnet
  • 12:11 hnowlan@cumin1001: END (FAIL) - Cookbook sre.misc-clusters.roll-restart-restbase (exit_code=1) rolling restart_daemons on A:restbase-canary
  • 12:10 slyngshede@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on idm-test1001.wikimedia.org with reason: host reimage
  • 12:06 slyngshede@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on idm-test1001.wikimedia.org with reason: host reimage
  • 12:04 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow5002.eqsin.wmnet
  • 12:03 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host conf2004.codfw.wmnet with OS bullseye
  • 12:01 hnowlan@cumin1001: START - Cookbook sre.misc-clusters.roll-restart-restbase rolling restart_daemons on A:restbase-canary
  • 11:56 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow5002.eqsin.wmnet
  • 11:54 slyngshede@cumin1001: START - Cookbook sre.hosts.reimage for host idm-test1001.wikimedia.org with OS bookworm
  • 11:49 hnowlan@deploy1002: Finished deploy [restbase/deploy@8eb62f2]: Revert "Disable wikifeeds announcements healthcheck" (duration: 06m 12s)
  • 11:43 hnowlan@deploy1002: Started deploy [restbase/deploy@8eb62f2]: Revert "Disable wikifeeds announcements healthcheck"
  • 11:37 brouberol@cumin2002: END (PASS) - Cookbook sre.opensearch.roll-restart-reboot (exit_code=0) rolling reboot on A:datahubsearch
  • 11:36 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow4002.ulsfo.wmnet
  • 11:35 hnowlan@deploy1002: Finished deploy [restbase/deploy@e8a6ae4]: Disable wikifeeds announcements healthcheck (duration: 10m 08s)
  • 11:34 slyngshede@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on idm-test1001.wikimedia.org with reason: upgrade to Bookwork
  • 11:34 slyngshede@cumin1001: START - Cookbook sre.hosts.downtime for 5 days, 0:00:00 on idm-test1001.wikimedia.org with reason: upgrade to Bookwork
  • 11:31 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow4002.ulsfo.wmnet
  • 11:28 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow3003.esams.wmnet
  • 11:25 hnowlan@deploy1002: Started deploy [restbase/deploy@e8a6ae4]: Disable wikifeeds announcements healthcheck
  • 11:24 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow3003.esams.wmnet
  • 11:24 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow2003.codfw.wmnet
  • 11:21 brouberol@cumin2002: START - Cookbook sre.opensearch.roll-restart-reboot rolling reboot on A:datahubsearch
  • 11:17 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow2003.codfw.wmnet
  • 11:17 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow1002.eqiad.wmnet
  • 11:13 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow1002.eqiad.wmnet
  • 11:12 brouberol@cumin2002: END (PASS) - Cookbook sre.opensearch.roll-restart-reboot (exit_code=0) rolling restart_daemons on A:datahubsearch
  • 11:08 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1137.eqiad.wmnet with OS bullseye
  • 11:04 brouberol: brouberol@cumin2002 START - Cookbook sre.opensearch.roll-restart-reboot rolling restart_daemons on A:datahubsearch - T344798
  • 11:02 brouberol@cumin2002: START - Cookbook sre.opensearch.roll-restart-reboot rolling restart_daemons on A:datahubsearch
  • 10:43 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1137.eqiad.wmnet with reason: host reimage
  • 10:41 stevemunene@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1137.eqiad.wmnet with reason: host reimage
  • 10:28 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on conf2004.codfw.wmnet with reason: host reimage
  • 10:27 stevemunene@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1137.eqiad.wmnet with OS bullseye
  • 10:25 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on conf2004.codfw.wmnet with reason: host reimage
  • 10:24 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dborch1001.wikimedia.org
  • 10:20 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host dborch1001.wikimedia.org
  • 10:18 jmm@cumin2002: END (PASS) - Cookbook sre.aqs.roll-restart-reboot (exit_code=0) rolling reboot on A:aqs-eqiad
  • 10:10 jayme@cumin1001: START - Cookbook sre.hosts.reimage for host conf2004.codfw.wmnet with OS bullseye
  • 10:07 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ldap-replica1006.wikimedia.org
  • 10:06 elukey@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventstreams: sync
  • 10:06 elukey@deploy1002: helmfile [eqiad] START helmfile.d/services/eventstreams: sync
  • 10:03 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ldap-replica1006.wikimedia.org
  • 10:03 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ldap-replica1005.wikimedia.org
  • 09:59 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ldap-replica1005.wikimedia.org
  • 09:55 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cumin1001.eqiad.wmnet
  • 09:52 elukey: remove the 'mediawiki.revision-score' stream from eventstreams public API - T342116
  • 09:51 elukey@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventstreams: sync
  • 09:51 elukey@deploy1002: helmfile [codfw] START helmfile.d/services/eventstreams: sync
  • 09:50 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/eventstreams: sync
  • 09:49 elukey@deploy1002: helmfile [staging] START helmfile.d/services/eventstreams: sync
  • 09:49 jayme: restarted navtiming on webperf2003 to pick up changed etcd service records
  • 09:44 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host cumin1001.eqiad.wmnet
  • 09:43 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ldap-rw2001.wikimedia.org
  • 09:39 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ldap-rw2001.wikimedia.org
  • 09:36 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ldap-rw1001.wikimedia.org
  • 09:32 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ldap-rw1001.wikimedia.org
  • 09:22 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling restart_daemons on A:swift-fe-eqiad
  • 09:17 mvernon@cumin2002: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on A:swift-fe-eqiad
  • 09:16 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling restart_daemons on A:swift-fe-codfw
  • 09:10 mvernon@cumin2002: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on A:swift-fe-codfw
  • 09:07 moritzm: installing qemu security updates on ganeti-test
  • 08:59 btullis: running build-production-images on build2001 for T344910
  • 08:53 godog: +50 to prometheus eqiad k8s-staging
  • 08:45 jayme: restarting confd fleet wide
  • 08:45 jmm@cumin2002: START - Cookbook sre.aqs.roll-restart-reboot rolling reboot on A:aqs-eqiad
  • 08:43 jayme: restarting primary lvs in codfw, eqsin, ulsfo
  • 08:38 jayme: restarted secondary lvs in codfw, eqsin, ulsfo
  • 08:24 jnuche@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.41.0-wmf.26 refs T343728
  • 07:57 jayme@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) _etcd._tcp.eqsin.wmnet on all recursors
  • 07:56 jayme@cumin1001: START - Cookbook sre.dns.wipe-cache _etcd._tcp.eqsin.wmnet on all recursors
  • 07:56 jayme@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) _etcd._tcp.ulsfo.wmnet on all recursors
  • 07:56 jayme@cumin1001: START - Cookbook sre.dns.wipe-cache _etcd._tcp.ulsfo.wmnet on all recursors
  • 07:56 jayme@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) _etcd._tcp.codfw.wmnet on all recursors
  • 07:56 jayme@cumin1001: START - Cookbook sre.dns.wipe-cache _etcd._tcp.codfw.wmnet on all recursors
  • 07:44 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host debmonitor2003.codfw.wmnet
  • 07:35 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host debmonitor2003.codfw.wmnet
  • 07:32 hashar: Backport & config deployment window completed.
  • 07:21 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host debmonitor1003.eqiad.wmnet
  • 07:17 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host debmonitor1003.eqiad.wmnet
  • 07:13 kartik@deploy1002: Finished scap: Backport for Enable MinT translation service on MediaWiki - rollout #4 (T341445) (duration: 10m 17s)
  • 07:09 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host apt1001.wikimedia.org
  • 07:06 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host apt1001.wikimedia.org
  • 07:06 kartik@deploy1002: abi and kartik: Continuing with sync
  • 07:04 kartik@deploy1002: abi and kartik: Backport for Enable MinT translation service on MediaWiki - rollout #4 (T341445) synced to the testservers mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 07:02 kartik@deploy1002: Started scap: Backport for Enable MinT translation service on MediaWiki - rollout #4 (T341445)
  • 06:27 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 26 hosts with reason: Pre swichover tasks
  • 06:26 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 26 hosts with reason: Pre swichover tasks
  • 06:22 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 26 hosts with reason: Pre swichover tasks
  • 06:22 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 26 hosts with reason: Pre swichover tasks
  • 05:23 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 6 hosts with reason: Pre swichover tasks
  • 05:23 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 6 hosts with reason: Pre swichover tasks
  • 05:22 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 6 hosts with reason: Pre swichover tasks
  • 05:21 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 6 hosts with reason: Pre swichover tasks
  • 05:11 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 11 hosts with reason: Pre swichover tasks
  • 05:10 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 11 hosts with reason: Pre swichover tasks
  • 05:06 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on pc2013.codfw.wmnet,pc[1013-1014].eqiad.wmnet with reason: Pre swichover tasks
  • 05:06 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on pc2013.codfw.wmnet,pc[1013-1014].eqiad.wmnet with reason: Pre swichover tasks
  • 05:06 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on pc2012.codfw.wmnet,pc1012.eqiad.wmnet with reason: Pre swichover tasks
  • 05:05 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on pc2012.codfw.wmnet,pc1012.eqiad.wmnet with reason: Pre swichover tasks
  • 05:05 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on pc[2011,2014].codfw.wmnet,pc1011.eqiad.wmnet with reason: Pre swichover tasks
  • 05:05 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on pc[2011,2014].codfw.wmnet,pc1011.eqiad.wmnet with reason: Pre swichover tasks
  • 03:04 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 03:03 rzl@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 03:03 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 02:58 rzl@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 02:58 rzl@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 02:57 rzl@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 02:56 rzl@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 02:54 rzl@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 01:36 urandom: starting RESTBase/Cassandra node rebuilds, cassandra-c/row D — T331713

2023-09-13

  • 23:06 urandom: starting Cassandra node rebuilds, restbase/row D — T331713
  • 22:57 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4052.ulsfo.wmnet with OS bookworm
  • 21:50 denisse@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db1128.eqiad.wmnet with reason: HW issues
  • 21:50 denisse@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db1128.eqiad.wmnet with reason: HW issues
  • 21:50 denisse: downtiming db1128
  • 21:49 denisse@cumin1001: dbctl commit (dc=all): 'Depool db1128', diff saved to https://phabricator.wikimedia.org/P52504 and previous config saved to /var/cache/conftool/dbconfig/20230913-214930-denisse.json
  • 21:48 denisse: depooling db1128
  • 21:35 bking@deploy1002: Finished deploy [wdqs/wdqs@3e0a913]: 0.3.129 use allowlist T344284 (duration: 11m 27s)
  • 21:28 eileen: civicrm upgraded from 6b247288 to 9d34ed9b
  • 21:24 bking@deploy1002: Started deploy [wdqs/wdqs@3e0a913]: 0.3.129 use allowlist T344284
  • 21:22 bking@deploy1002: Finished deploy [wdqs/wdqs@16e3dcf]: 0.3.129 use allowlist T344284 (duration: 00m 59s)
  • 21:21 bking@deploy1002: Started deploy [wdqs/wdqs@16e3dcf]: 0.3.129 use allowlist T344284
  • 19:44 denisse@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host netmon1003.wikimedia.org with OS bookworm
  • 19:43 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4052.ulsfo.wmnet with reason: host reimage
  • 19:40 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4052.ulsfo.wmnet with reason: host reimage
  • 19:34 eileen: civicrm upgraded from 80aee570 to 6b247288
  • 19:24 denisse@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on netmon1003.wikimedia.org with reason: host reimage
  • 19:21 denisse@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on netmon1003.wikimedia.org with reason: host reimage
  • 19:14 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp4052.ulsfo.wmnet with OS bookworm
  • 19:09 denisse@cumin1001: START - Cookbook sre.hosts.reimage for host netmon1003.wikimedia.org with OS bookworm
  • 19:09 urandom: initiating rebuild of restbase1027-a & restbase1033-a
  • 19:08 urandom: initiating rebuild of restbase1026-a
  • 19:00 urandom: initiating rebuild of restbase1025-a
  • 18:51 urandom: initiating rebuild of restbase1018-a
  • 18:49 brett@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp4052.ulsfo.wmnet with OS bookworm
  • 18:42 urandom: stopping bootstrap of restbase1030-c — T331713
  • 18:38 godog: run schema migrations for librenms on m1 (backdated, started ~1h ago)
  • 18:33 urandom: restarting restbase service (restbase1031) — T331713
  • 18:19 urandom: resuming bootstrap of restbase1030-c —
  • 18:05 fnegri@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudnet2006-dev.codfw.wmnet with OS bookworm
  • 17:45 fnegri@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudnet2006-dev.codfw.wmnet with reason: host reimage
  • 17:42 fnegri@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudnet2006-dev.codfw.wmnet with reason: host reimage
  • 17:28 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp4052.ulsfo.wmnet with OS bookworm
  • 17:22 fnegri@cumin1001: START - Cookbook sre.hosts.reimage for host cloudnet2006-dev.codfw.wmnet with OS bookworm
  • 16:34 denisse@deploy1002: Finished deploy [librenms/librenms@f049593]: Upgrade LibreNMS to 23.8.2 - T344136 (duration: 00m 16s)
  • 16:34 denisse@deploy1002: Started deploy [librenms/librenms@f049593]: Upgrade LibreNMS to 23.8.2 - T344136
  • 16:04 aokoth@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on vrts1002.eqiad.wmnet with reason: Testing
  • 16:04 aokoth@cumin1001: START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on vrts1002.eqiad.wmnet with reason: Testing
  • 16:04 denisse@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host netmon2002.wikimedia.org with OS bookworm
  • 15:41 denisse@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on netmon2002.wikimedia.org with reason: host reimage
  • 15:38 denisse@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on netmon2002.wikimedia.org with reason: host reimage
  • 15:34 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
  • 15:34 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
  • 15:31 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
  • 15:29 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
  • 15:26 jayme: re-enabled puppet on all k8s control planes
  • 15:26 jmm@cumin2002: END (PASS) - Cookbook sre.aqs.roll-restart-reboot (exit_code=0) rolling reboot on A:aqs-codfw
  • 15:19 denisse@cumin1001: START - Cookbook sre.hosts.reimage for host netmon2002.wikimedia.org with OS bookworm
  • 15:19 denisse: Start reimage of netmon2002
  • 15:17 denisse: Starting LibreNMS upgrade in codfw.
  • 15:14 jgiannelos@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifeeds: apply
  • 15:04 jayme: stopped puppet on all k8s control planes for 956842 rollout
  • 15:01 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply
  • 15:01 hnowlan: repooling cp2037 and enabling puppet on A:cp
  • 14:56 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply
  • 14:55 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply
  • 14:52 hnowlan: disable puppet on A:cp
  • 14:51 hnowlan: depooled service=ats-be,name=cp2037.codfw.wmnet
  • 14:51 jayme: updated kubernetes-* packages fleet wide to 1.23.14-3 - T329826
  • 14:50 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply
  • 14:41 akosiaris@deploy1002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
  • 14:39 akosiaris@deploy1002: helmfile [staging] START helmfile.d/services/machinetranslation: apply
  • 14:36 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on sretest1001.eqiad.wmnet with reason: WIP towards puppetised nftables firewall
  • 14:36 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on sretest1001.eqiad.wmnet with reason: WIP towards puppetised nftables firewall
  • 14:31 ayounsi@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest1001.eqiad.wmnet with OS bullseye
  • 14:29 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 14:29 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 14:26 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 14:25 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 14:17 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 14:17 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 14:10 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 14:10 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 14:08 hnowlan: stopping cassandra on restbase1030-c
  • 13:52 jmm@cumin2002: START - Cookbook sre.aqs.roll-restart-reboot rolling reboot on A:aqs-codfw
  • 13:34 Lucas_WMDE: UTC afternoon backport+config window done
  • 13:34 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for rdbms: Use `debugSql` instead of `debugDumpSql` which is unuset (T318272) (duration: 15m 42s)
  • 13:27 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde and d3r1ck01: Continuing with sync
  • 13:20 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde and d3r1ck01: Backport for rdbms: Use `debugSql` instead of `debugDumpSql` which is unuset (T318272) synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 13:18 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for rdbms: Use `debugSql` instead of `debugDumpSql` which is unuset (T318272)
  • 12:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P52499 and previous config saved to /var/cache/conftool/dbconfig/20230913-122323-ladsgroup.json
  • 12:17 godog: pool only titan hosts for thanos-web and thanos-query services - T341488
  • 12:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P52498 and previous config saved to /var/cache/conftool/dbconfig/20230913-120818-ladsgroup.json
  • 11:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P52497 and previous config saved to /var/cache/conftool/dbconfig/20230913-115314-ladsgroup.json
  • 11:30 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
  • 11:29 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
  • 11:27 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
  • 11:26 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: apply
  • 11:25 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
  • 11:24 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/thumbor: apply
  • 11:19 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 11:18 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
  • 11:18 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T343198)', diff saved to https://phabricator.wikimedia.org/P52495 and previous config saved to /var/cache/conftool/dbconfig/20230913-111834-arnaudb.json
  • 11:17 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 11:16 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
  • 11:15 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 11:15 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
  • 11:11 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host titan2002.codfw.wmnet with OS bookworm
  • 10:54 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on titan2002.codfw.wmnet with reason: host reimage
  • 10:51 filippo@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on titan2002.codfw.wmnet with reason: host reimage
  • 10:49 jayme: imported kubernetes_1.23.14-3 to bullseye-wikimedia component/kubernetes123 - T329826
  • 10:46 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2105.codfw.wmnet with reason: Maintenance
  • 10:46 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2105.codfw.wmnet with reason: Maintenance
  • 10:40 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host titan1002.eqiad.wmnet with OS bookworm
  • 10:34 filippo@cumin1001: START - Cookbook sre.hosts.reimage for host titan2002.codfw.wmnet with OS bookworm
  • 10:34 filippo@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host titan2002.codfw.wmnet with OS bookworm
  • 10:29 klausman@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'readability' for release 'main' .
  • 10:28 klausman@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
  • 10:28 klausman@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
  • 10:27 filippo@cumin1001: START - Cookbook sre.hosts.reimage for host titan2002.codfw.wmnet with OS bookworm
  • 10:26 filippo@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host titan2002.codfw.wmnet with OS bookworm
  • 10:24 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on titan1002.eqiad.wmnet with reason: host reimage
  • 10:21 filippo@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on titan1002.eqiad.wmnet with reason: host reimage
  • 10:11 claime: set/pooled=no; selector: name=mw2444.codfw.wmnet - T345884
  • 10:10 cgoubert@cumin1001: conftool action : set/pooled=no; selector: name=mw2444.codfw.wmnet
  • 10:10 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'readability' for release 'main' .
  • 10:06 filippo@cumin1001: START - Cookbook sre.hosts.reimage for host titan1002.eqiad.wmnet with OS bookworm
  • 10:06 filippo@cumin1001: START - Cookbook sre.hosts.reimage for host titan2002.codfw.wmnet with OS bookworm
  • 10:06 aklapper@deploy1002: Finished scap: Backport for Revert "EntityId: Hard-deprecate Serializable methods" (duration: 08m 49s)
  • 10:06 filippo@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host titan1002.eqiad.wmnet with OS bookworm
  • 10:06 filippo@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host titan2002.codfw.wmnet with OS bookworm
  • 09:59 aklapper@deploy1002: aklapper and jnuche: Continuing with sync
  • 09:59 aklapper@deploy1002: aklapper and jnuche: Backport for Revert "EntityId: Hard-deprecate Serializable methods" synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 09:57 aklapper@deploy1002: Started scap: Backport for Revert "EntityId: Hard-deprecate Serializable methods"
  • 09:51 klausman@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 09:48 klausman@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'readability' for release 'main' .
  • 09:35 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
  • 09:35 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
  • 09:34 klausman@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
  • 09:34 klausman@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
  • 09:16 filippo@cumin1001: START - Cookbook sre.hosts.reimage for host titan1002.eqiad.wmnet with OS bookworm
  • 09:14 aklapper@deploy1002: backport Cancelled
  • 09:14 filippo@cumin1001: START - Cookbook sre.hosts.reimage for host titan2002.codfw.wmnet with OS bookworm
  • 09:11 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host titan2001.codfw.wmnet with OS bookworm
  • 08:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1157.eqiad.wmnet with reason: Maintenance
  • 08:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1157.eqiad.wmnet with reason: Maintenance
  • 08:47 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host titan1001.eqiad.wmnet with OS bookworm
  • 08:46 claime: Running puppet on cp-text P:trafficserver::backend - T290536
  • 08:33 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on titan2001.codfw.wmnet with reason: host reimage
  • 08:30 filippo@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on titan2001.codfw.wmnet with reason: host reimage
  • 08:25 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on titan1001.eqiad.wmnet with reason: host reimage
  • 08:25 aklapper@deploy1002: Synchronized php: group1 wikis to 1.41.0-wmf.26 refs T343728 (duration: 06m 46s)
  • 08:22 filippo@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on titan1001.eqiad.wmnet with reason: host reimage
  • 08:18 aklapper@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.41.0-wmf.26 refs T343728
  • 08:14 filippo@cumin1001: START - Cookbook sre.hosts.reimage for host titan2001.codfw.wmnet with OS bookworm
  • 08:14 filippo@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host titan2001.codfw.wmnet with OS bookworm
  • 08:08 filippo@cumin1001: START - Cookbook sre.hosts.reimage for host titan1001.eqiad.wmnet with OS bookworm
  • 08:07 filippo@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host titan1001.eqiad.wmnet with OS bookworm
  • 08:01 ayounsi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1001.eqiad.wmnet with reason: host reimage
  • 07:56 ayounsi@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1001.eqiad.wmnet with reason: host reimage
  • 07:54 filippo@cumin1001: START - Cookbook sre.hosts.reimage for host titan1001.eqiad.wmnet with OS bookworm
  • 07:53 vgutierrez: repool cp1075 && cp1076
  • 07:51 filippo@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host titan1001.eqiad.wmnet with OS bookworm
  • 07:51 filippo@cumin1001: conftool action : set/pooled=no; selector: name=titan2001.codfw.wmnet,service=thanos-web
  • 07:46 filippo@cumin1001: START - Cookbook sre.hosts.reimage for host titan2001.codfw.wmnet with OS bookworm
  • 07:46 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1174 (T343198)', diff saved to https://phabricator.wikimedia.org/P52491 and previous config saved to /var/cache/conftool/dbconfig/20230913-074602-arnaudb.json
  • 07:45 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 07:45 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 07:44 filippo@cumin1001: conftool action : set/pooled=yes; selector: name=thanos-fe2004.codfw.wmnet,service=thanos-web
  • 07:43 filippo@cumin1001: conftool action : set/pooled=yes; selector: name=thanos-fe2004.codfdw.wmnet,service=thanos-web
  • 07:43 filippo@cumin1001: conftool action : set/pooled=no; selector: name=titan2001.codfdw.wmnet,service=thanos-web
  • 07:43 filippo@cumin1001: conftool action : set/pooled=no; selector: name=titan1001.eqiad.wmnet,service=thanos-web
  • 07:43 filippo@cumin1001: conftool action : set/pooled=yes; selector: name=thanos-fe1004.eqiad.wmnet,service=thanos-web
  • 07:42 ayounsi@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1001.eqiad.wmnet with OS bullseye
  • 07:39 filippo@cumin1001: START - Cookbook sre.hosts.reimage for host titan1001.eqiad.wmnet with OS bookworm
  • 06:06 tstarling@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Running again following connection refused errors from kubemaster (duration: 07m 24s)
  • 05:55 tstarling@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Enable source maps on group0 wikis attempt 2 (duration: 07m 37s)
  • 05:40 tstarling@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Enable source maps on group0 wikis T47514 (duration: 07m 14s)
  • 05:15 tstarling@deploy1002: Synchronized wmf-config/etcd.php: Remove PHP 7.2 fallback for array_key_first g 956364 (duration: 07m 03s)
  • 04:35 hmonroy@deploy1002: Finished scap: Backport for Do not enable entire OOUI in PHP on page load (T345414) (duration: 07m 58s)
  • 04:29 hmonroy@deploy1002: hmonroy and jdlrobson: Continuing with sync
  • 04:29 hmonroy@deploy1002: hmonroy and jdlrobson: Backport for Do not enable entire OOUI in PHP on page load (T345414) synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 04:27 hmonroy@deploy1002: Started scap: Backport for Do not enable entire OOUI in PHP on page load (T345414)
  • 04:26 hmonroy@deploy1002: Finished scap: Backport for Do not enable entire OOUI in PHP on page load (T345414) (duration: 09m 56s)
  • 04:19 hmonroy@deploy1002: hmonroy and jdlrobson: Continuing with sync
  • 04:17 hmonroy@deploy1002: hmonroy and jdlrobson: Backport for Do not enable entire OOUI in PHP on page load (T345414) synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 04:16 hmonroy@deploy1002: Started scap: Backport for Do not enable entire OOUI in PHP on page load (T345414)

2023-09-12

  • 23:14 brett: Upload trafficserver_9.2.1-1wm2_amd64 to bookworm-wikimedia
  • 23:09 eileen: config revision changed from 2efd8142 to eb7931ca add is_create_activities to bounce fetch job
  • 21:11 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 21:11 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 21:11 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T343198)', diff saved to https://phabricator.wikimedia.org/P52486 and previous config saved to /var/cache/conftool/dbconfig/20230912-211128-arnaudb.json
  • 21:08 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host search-loader1001.eqiad.wmnet
  • 21:04 bking@cumin1001: START - Cookbook sre.hosts.reboot-single for host search-loader1001.eqiad.wmnet
  • 20:56 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P52485 and previous config saved to /var/cache/conftool/dbconfig/20230912-205621-arnaudb.json
  • 20:46 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host search-loader2001.codfw.wmnet
  • 20:43 cjming: end of UTC late backport window
  • 20:43 bking@cumin1001: START - Cookbook sre.hosts.reboot-single for host search-loader2001.codfw.wmnet
  • 20:42 inflatador: rebooting search-loader2001.codfw.wmnet T344671
  • 20:41 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P52484 and previous config saved to /var/cache/conftool/dbconfig/20230912-204115-arnaudb.json
  • 20:39 cjming@deploy1002: Finished scap: Backport for Make the new stream name consistent with convention (duration: 09m 24s)
  • 20:33 cjming@deploy1002: sharvaniharan and cjming: Continuing with sync
  • 20:31 cjming@deploy1002: sharvaniharan and cjming: Backport for Make the new stream name consistent with convention synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 20:30 cjming@deploy1002: Started scap: Backport for Make the new stream name consistent with convention
  • 20:26 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T343198)', diff saved to https://phabricator.wikimedia.org/P52483 and previous config saved to /var/cache/conftool/dbconfig/20230912-202609-arnaudb.json
  • 20:25 cjming@deploy1002: Finished scap: Backport for Reduce initial payload of Phonos styles (T345414) (duration: 12m 06s)
  • 20:22 eileen: civicrm upgraded from 5b7b2b3e to 80aee570
  • 20:19 cjming@deploy1002: cjming and samtar: Continuing with sync
  • 20:15 cjming@deploy1002: cjming and samtar: Backport for Reduce initial payload of Phonos styles (T345414) synced to the testservers mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 20:13 cjming@deploy1002: Started scap: Backport for Reduce initial payload of Phonos styles (T345414)
  • 19:43 eileen: civicrm upgraded from 771fcde3 to 5b7b2b3e
  • 19:32 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:32 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove ssw1 old irb int dns - cmooney@cumin1001"
  • 19:28 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove ssw1 old irb int dns - cmooney@cumin1001"
  • 19:23 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 19:17 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 19:15 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 17:50 sukhe: run authdns-update to remove nsa.wikimedia.org
  • 16:28 fnegri@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudnet2005-dev.codfw.wmnet with OS bookworm
  • 15:47 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host people1004.eqiad.wmnet
  • 15:43 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host people1004.eqiad.wmnet
  • 15:43 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host people2003.codfw.wmnet
  • 15:39 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host people2003.codfw.wmnet
  • 15:30 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2023.codfw.wmnet
  • 15:27 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2023.codfw.wmnet
  • 15:27 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2023.codfw.wmnet
  • 15:25 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2023.codfw.wmnet
  • 15:22 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2023.codfw.wmnet
  • 15:19 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2023.codfw.wmnet
  • 15:19 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2023.codfw.wmnet
  • 15:19 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2023.codfw.wmnet
  • 15:18 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2023.codfw.wmnet
  • 15:13 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubernetes1056.eqiad.wmnet']
  • 15:12 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2023.codfw.wmnet
  • 15:09 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubernetes1055.eqiad.wmnet']
  • 15:01 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes1056.eqiad.wmnet']
  • 15:00 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes1055.eqiad.wmnet']
  • 14:59 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubernetes1051.eqiad.wmnet']
  • 14:59 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubernetes1050.eqiad.wmnet']
  • 14:59 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubernetes1054.eqiad.wmnet']
  • 14:59 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubernetes1053.eqiad.wmnet']
  • 14:58 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubernetes1052.eqiad.wmnet']
  • 14:58 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubernetes1049.eqiad.wmnet']
  • 14:57 godog: add 30G to prometheus@services and 300G to prometheus@ops (codfw)
  • 14:57 dancy@deploy1002: Installation of scap version "4.61.0" completed for 595 hosts
  • 14:56 dancy@deploy1002: Installing scap version "4.61.0" for 595 hosts
  • 14:55 dancy@deploy1002: Installing scap version "4.61.0" for 596 hosts
  • 14:50 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes1051.eqiad.wmnet']
  • 14:50 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes1050.eqiad.wmnet']
  • 14:50 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes1052.eqiad.wmnet']
  • 14:50 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes1053.eqiad.wmnet']
  • 14:50 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes1054.eqiad.wmnet']
  • 14:49 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubernetes1044.eqiad.wmnet']
  • 14:49 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes1049.eqiad.wmnet']
  • 14:49 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubernetes1046.eqiad.wmnet']
  • 14:49 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubernetes1047.eqiad.wmnet']
  • 14:49 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubernetes1043.eqiad.wmnet']
  • 14:49 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubernetes1048.eqiad.wmnet']
  • 14:49 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubernetes1045.eqiad.wmnet']
  • 14:46 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host scandium.eqiad.wmnet
  • 14:42 moritzm: installing Linux 6.1.52 on Bookworm hosts
  • 14:41 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes1048.eqiad.wmnet']
  • 14:40 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes1047.eqiad.wmnet']
  • 14:40 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes1046.eqiad.wmnet']
  • 14:40 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes1045.eqiad.wmnet']
  • 14:40 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes1044.eqiad.wmnet']
  • 14:40 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes1043.eqiad.wmnet']
  • 14:39 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-single for host scandium.eqiad.wmnet
  • 14:39 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubernetes1042.eqiad.wmnet']
  • 14:39 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubernetes1041.eqiad.wmnet']
  • 14:38 jmm@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: furud.codfw.wmnet
  • 14:38 jmm@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: furud.codfw.wmnet
  • 14:36 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubernetes1040.eqiad.wmnet']
  • 14:35 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubernetes1039.eqiad.wmnet']
  • 14:34 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubernetes1038.eqiad.wmnet']
  • 14:33 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubernetes1037.eqiad.wmnet']
  • 14:31 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes1042.eqiad.wmnet']
  • 14:31 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes1041.eqiad.wmnet']
  • 14:30 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubernetes1036.eqiad.wmnet']
  • 14:30 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubernetes1029.eqiad.wmnet']
  • 14:30 moritzm: installing libssh2 security updates
  • 14:27 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes1040.eqiad.wmnet']
  • 14:27 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubernetes1035.eqiad.wmnet']
  • 14:27 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes1039.eqiad.wmnet']
  • 14:27 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubernetes1034.eqiad.wmnet']
  • 14:25 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes1038.eqiad.wmnet']
  • 14:24 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes1037.eqiad.wmnet']
  • 14:24 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['kubernetes1036.eqiad.wmnet']
  • 14:24 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubernetes1033.eqiad.wmnet']
  • 14:24 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes1036.eqiad.wmnet']
  • 14:23 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubernetes1032.eqiad.wmnet']
  • 14:22 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes1036.eqiad.wmnet']
  • 14:22 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes1029.eqiad.wmnet']
  • 14:21 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubernetes1031.eqiad.wmnet']
  • 14:21 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes1029.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:19 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes1035.eqiad.wmnet']
  • 14:18 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes1034.eqiad.wmnet']
  • 14:18 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubernetes1028.eqiad.wmnet']
  • 14:18 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubernetes1030.eqiad.wmnet']
  • 14:16 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes1033.eqiad.wmnet']
  • 14:15 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubernetes1027.eqiad.wmnet']
  • 14:15 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes1032.eqiad.wmnet']
  • 14:11 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes1031.eqiad.wmnet']
  • 14:10 sukhe: enable puppet on dns-rec to progressively roll out nsa->ns2 updates
  • 14:10 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes1030.eqiad.wmnet']
  • 14:10 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['kubernetes1029.eqiad.wmnet']
  • 14:10 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes1029.eqiad.wmnet']
  • 14:09 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes1028.eqiad.wmnet']
  • 14:07 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes1027.eqiad.wmnet']
  • 14:02 sukhe: [correction] enable puppet on dns6001 to test nsa removal
  • 14:02 sukhe: enable puppet on doh6001 to test nsa removal
  • 14:00 jclark@cumin1001: START - Cookbook sre.hosts.provision for host kubernetes1029.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:00 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kubernetes1029.mgmt.eqiad.wmnet with reboot policy FORCED
  • 13:57 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:56 jclark@cumin1001: START - Cookbook sre.dns.netbox
  • 13:50 sukhe: disable puppet on A:dns-rec
  • 13:48 jclark@cumin1001: START - Cookbook sre.hosts.provision for host kubernetes1029.mgmt.eqiad.wmnet with reboot policy FORCED
  • 13:46 TheresNoTime: UTC afternoon backport window closed
  • 13:45 samtar@deploy1002: Finished scap: Backport for Reduce initial payload of Phonos styles (T345414) (duration: 08m 59s)
  • 13:44 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T343198)', diff saved to https://phabricator.wikimedia.org/P52477 and previous config saved to /var/cache/conftool/dbconfig/20230912-134451-arnaudb.json
  • 13:40 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
  • 13:39 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
  • 13:39 samtar@deploy1002: samtar: Continuing with sync
  • 13:38 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
  • 13:38 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
  • 13:38 samtar@deploy1002: samtar: Backport for Reduce initial payload of Phonos styles (T345414) synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 13:36 samtar@deploy1002: Started scap: Backport for Reduce initial payload of Phonos styles (T345414)
  • 13:36 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 13:35 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 13:31 taavi@deploy1002: Finished scap: Backport for Enable Parsoid support for Kartographer on enwiki (T342871) (duration: 26m 05s)
  • 13:29 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P52476 and previous config saved to /var/cache/conftool/dbconfig/20230912-132944-arnaudb.json
  • 13:19 taavi@deploy1002: ihurbain and taavi: Continuing with sync
  • 13:14 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P52475 and previous config saved to /var/cache/conftool/dbconfig/20230912-131438-arnaudb.json
  • 13:10 moritzm: installing grub2 updates from Bullseye point release
  • 13:06 taavi@deploy1002: ihurbain and taavi: Backport for Enable Parsoid support for Kartographer on enwiki (T342871) synced to the testservers mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 13:05 taavi@deploy1002: Started scap: Backport for Enable Parsoid support for Kartographer on enwiki (T342871)
  • 12:59 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T343198)', diff saved to https://phabricator.wikimedia.org/P52474 and previous config saved to /var/cache/conftool/dbconfig/20230912-125932-arnaudb.json
  • 12:40 brouberol@cumin1001: END (FAIL) - Cookbook sre.hadoop.roll-restart-masters (exit_code=99) restart masters for Hadoop analytics cluster: Restart of jvm daemons.
  • 12:24 brouberol@cumin1001: START - Cookbook sre.hadoop.roll-restart-masters restart masters for Hadoop analytics cluster: Restart of jvm daemons.
  • 12:15 jgiannelos@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifeeds: apply
  • 12:15 jgiannelos@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifeeds: apply
  • 12:15 jgiannelos@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifeeds: apply
  • 12:14 jgiannelos@deploy1002: helmfile [codfw] START helmfile.d/services/wikifeeds: apply
  • 12:12 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudservices1004.wikimedia.org
  • 12:11 aborrero@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:11 aborrero@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudservices1004.wikimedia.org decommissioned, removing all IPs except the asset tag one - aborrero@cumin1001"
  • 12:09 aborrero@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudservices1004.wikimedia.org decommissioned, removing all IPs except the asset tag one - aborrero@cumin1001"
  • 12:07 aborrero@cumin1001: START - Cookbook sre.dns.netbox
  • 11:59 aborrero@cumin1001: START - Cookbook sre.hosts.decommission for hosts cloudservices1004.wikimedia.org
  • 11:57 godog: pool thanos[12]001 for thanos.w.o - T341999
  • 11:47 marostegui@cumin1001: dbctl commit (dc=all): 'db2158 (re)pooling @ 100%: Repooling after cloning another host', diff saved to https://phabricator.wikimedia.org/P52473 and previous config saved to /var/cache/conftool/dbconfig/20230912-114711-root.json
  • 11:43 godog: pool titan hosts alongside thanos-fe for thanos-query / thanos-web services - T341999
  • 11:42 filippo@cumin1001: conftool action : set/pooled=no; selector: name=titan1001.eqiad.wmnet,service=thanos-web
  • 11:42 filippo@cumin1001: conftool action : set/pooled=no; selector: name=titan1002.eqiad.wmnet,service=thanos-web
  • 11:41 brouberol@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 7 hosts with reason: Mute initial failures of hadoop-hdfs-datanode.service
  • 11:41 brouberol@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on 7 hosts with reason: Mute initial failures of hadoop-hdfs-datanode.service
  • 11:40 filippo@cumin1001: conftool action : set/pooled=no; selector: name=titan1002.eqiad.wmnet,service=thanos-web
  • 11:40 filippo@cumin1001: conftool action : set/pooled=yes; selector: name=titan1002.eqiad.wmnet,service=thanos-web
  • 11:39 filippo@cumin1001: conftool action : set/pooled=no; selector: name=titan1001.eqiad.wmnet,service=thanos-web
  • 11:37 filippo@cumin1001: conftool action : set/weight=100; selector: name=titan2002.codfw.wmnet
  • 11:37 filippo@cumin1001: conftool action : set/weight=100; selector: name=titan2001.codfw.wmnet
  • 11:37 filippo@cumin1001: conftool action : set/weight=100; selector: name=titan1002.eqiad.wmnet
  • 11:37 filippo@cumin1001: conftool action : set/weight=100; selector: name=titan1001.eqiad.wmnet
  • 11:36 filippo@cumin1001: conftool action : set/weight=100; selector: name=titan*
  • 11:35 filippo@cumin1001: conftool action : set/weight=10; selector: name=titan2002.codfw.wmnet
  • 11:35 filippo@cumin1001: conftool action : set/weight=10; selector: name=titan2001.codfw.wmnet
  • 11:35 filippo@cumin1001: conftool action : set/weight=10; selector: name=titan1002.eqiad.wmnet
  • 11:35 filippo@cumin1001: conftool action : set/weight=10; selector: name=titan1001.eqiad.wmnet
  • 11:32 marostegui@cumin1001: dbctl commit (dc=all): 'db2158 (re)pooling @ 75%: Repooling after cloning another host', diff saved to https://phabricator.wikimedia.org/P52472 and previous config saved to /var/cache/conftool/dbconfig/20230912-113207-root.json
  • 11:18 aborrero@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host cloudservices1004.wikimedia.org
  • 11:17 marostegui@cumin1001: dbctl commit (dc=all): 'db2158 (re)pooling @ 50%: Repooling after cloning another host', diff saved to https://phabricator.wikimedia.org/P52471 and previous config saved to /var/cache/conftool/dbconfig/20230912-111702-root.json
  • 11:03 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
  • 11:03 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
  • 11:03 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
  • 11:02 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-web: apply
  • 11:01 marostegui@cumin1001: dbctl commit (dc=all): 'db2158 (re)pooling @ 25%: Repooling after cloning another host', diff saved to https://phabricator.wikimedia.org/P52470 and previous config saved to /var/cache/conftool/dbconfig/20230912-110157-root.json
  • 10:54 aborrero@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudservices1004.wikimedia.org
  • 10:46 marostegui@cumin1001: dbctl commit (dc=all): 'db2158 (re)pooling @ 10%: Repooling after cloning another host', diff saved to https://phabricator.wikimedia.org/P52468 and previous config saved to /var/cache/conftool/dbconfig/20230912-104652-root.json
  • 10:45 moritzm: rebalance Ganeti cluster in eqiad/C following node reboots
  • 10:39 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1028.eqiad.wmnet
  • 10:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1028.eqiad.wmnet
  • 10:37 taavi@cumin1001: conftool action : set/pooled=yes:weight=10; selector: cluster=cloudweb
  • 10:32 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1028.eqiad.wmnet
  • 10:31 marostegui@cumin1001: dbctl commit (dc=all): 'db2158 (re)pooling @ 5%: Repooling after cloning another host', diff saved to https://phabricator.wikimedia.org/P52467 and previous config saved to /var/cache/conftool/dbconfig/20230912-103148-root.json
  • 10:25 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1028.eqiad.wmnet
  • 10:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host pki-root1002.eqiad.wmnet
  • 10:21 jgiannelos@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifeeds: apply
  • 10:21 jgiannelos@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifeeds: apply
  • 10:16 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host pki-root1002.eqiad.wmnet
  • 10:16 marostegui@cumin1001: dbctl commit (dc=all): 'db2158 (re)pooling @ 3%: Repooling after cloning another host', diff saved to https://phabricator.wikimedia.org/P52466 and previous config saved to /var/cache/conftool/dbconfig/20230912-101643-root.json
  • 10:13 jmm@cumin2002: END (PASS) - Cookbook sre.pki.restart-reboot (exit_code=0) rolling reboot on A:pki
  • 10:13 moritzm: disabled nginx/puppetdb/postgresql/microservice on puppetdb1002/2002 to ensure nothing hits the old endpoints anymore
  • 10:09 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on puppetdb2002.codfw.wmnet with reason: Disable puppetdb/postgres/nginx on old nodes to ensure nothing hits them anyway
  • 10:09 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on puppetdb2002.codfw.wmnet with reason: Disable puppetdb/postgres/nginx on old nodes to ensure nothing hits them anyway
  • 10:09 jgiannelos@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifeeds: apply
  • 10:08 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/wikifeeds: apply
  • 10:05 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on puppetdb1002.eqiad.wmnet with reason: Disable puppetdb/postgres on old nodes to ensure nothing hits them anyway
  • 10:05 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on puppetdb1002.eqiad.wmnet with reason: Disable puppetdb/postgres on old nodes to ensure nothing hits them anyway
  • 10:02 hnowlan: enabling puppet on A:cp
  • 10:01 marostegui@cumin1001: dbctl commit (dc=all): 'db2158 (re)pooling @ 1%: Repooling after cloning another host', diff saved to https://phabricator.wikimedia.org/P52465 and previous config saved to /var/cache/conftool/dbconfig/20230912-100138-root.json
  • 09:59 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) pki.discovery.wmnet. on all recursors
  • 09:59 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache pki.discovery.wmnet. on all recursors
  • 09:53 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) pki.discovery.wmnet. on all recursors
  • 09:52 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache pki.discovery.wmnet. on all recursors
  • 09:52 jmm@cumin2002: START - Cookbook sre.pki.restart-reboot rolling reboot on A:pki
  • 09:32 hnowlan: disabled puppet on A:cp
  • 09:26 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3317 (T343198)', diff saved to https://phabricator.wikimedia.org/P52464 and previous config saved to /var/cache/conftool/dbconfig/20230912-092639-arnaudb.json
  • 09:26 jmm@cumin2002: END (FAIL) - Cookbook sre.pki.restart-reboot (exit_code=99) rolling reboot on A:pki
  • 09:26 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 09:26 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 09:26 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T343198)', diff saved to https://phabricator.wikimedia.org/P52463 and previous config saved to /var/cache/conftool/dbconfig/20230912-092618-arnaudb.json
  • 09:26 jmm@cumin2002: START - Cookbook sre.pki.restart-reboot rolling reboot on A:pki
  • 09:15 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2115.codfw.wmnet with reason: Maintenance
  • 09:15 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2115.codfw.wmnet with reason: Maintenance
  • 09:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1179.eqiad.wmnet with reason: Maintenance
  • 09:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1179.eqiad.wmnet with reason: Maintenance
  • 09:11 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P52461 and previous config saved to /var/cache/conftool/dbconfig/20230912-091112-arnaudb.json
  • 08:58 claime: Running puppet on cp-text P:trafficserver::backend - T341780
  • 08:58 claime: Sending 5% of global traffic to mw-on-k8s - T341780
  • 08:56 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P52460 and previous config saved to /var/cache/conftool/dbconfig/20230912-085606-arnaudb.json
  • 08:51 claime: mw-api-ext, mw-web: Raise total replicas to 14 - T341780
  • 08:51 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
  • 08:51 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
  • 08:51 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
  • 08:50 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-web: apply
  • 08:50 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
  • 08:50 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
  • 08:50 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
  • 08:50 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
  • 08:49 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
  • 08:49 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
  • 08:44 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1028.eqiad.wmnet
  • 08:41 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T343198)', diff saved to https://phabricator.wikimedia.org/P52459 and previous config saved to /var/cache/conftool/dbconfig/20230912-084059-arnaudb.json
  • 08:39 jnuche@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.41.0-wmf.26 refs T343728
  • 08:38 moritzm: rebalance Ganeti cluster in codfw/C following node replacement
  • 08:24 oblivian@deploy1002: Finished scap: Backport for Replace calls to wfHostname with clusterconfig ones (duration: 09m 16s)
  • 08:22 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1028.eqiad.wmnet
  • 08:18 oblivian@deploy1002: oblivian: Continuing with sync
  • 08:17 oblivian@deploy1002: oblivian: Backport for Replace calls to wfHostname with clusterconfig ones synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 08:15 oblivian@deploy1002: Started scap: Backport for Replace calls to wfHostname with clusterconfig ones
  • 08:13 oblivian@deploy1002: Finished scap: Backport for ClusterConfig: also allow to return hostname, Enable PageNotice on enwiktionary beta (T61245) (duration: 45m 23s)
  • 08:12 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti2014.codfw.wmnet to cluster codfw and group C
  • 08:10 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2014.codfw.wmnet to cluster codfw and group C
  • 08:06 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2014.codfw.wmnet
  • 07:58 brouberol@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker1156.eqiad.wmnet
  • 07:58 oblivian@deploy1002: tto and oblivian: Continuing with sync
  • 07:58 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2014.codfw.wmnet
  • 07:56 brouberol@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker1156.eqiad.wmnet
  • 07:56 oblivian@deploy1002: tto and oblivian: Backport for ClusterConfig: also allow to return hostname, Enable PageNotice on enwiktionary beta (T61245) synced to the testservers mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 07:53 brouberol@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker1155.eqiad.wmnet
  • 07:51 brouberol@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker1155.eqiad.wmnet
  • 07:51 brouberol@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker1154.eqiad.wmnet
  • 07:49 brouberol@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker1154.eqiad.wmnet
  • 07:45 brouberol@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker1153.eqiad.wmnet
  • 07:43 brouberol@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker1153.eqiad.wmnet
  • 07:36 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host netmon1003.wikimedia.org
  • 07:28 oblivian@deploy1002: Started scap: Backport for ClusterConfig: also allow to return hostname, Enable PageNotice on enwiktionary beta (T61245)
  • 07:25 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netmon1003.wikimedia.org
  • 07:23 oblivian@deploy1002: Finished scap: Backport for update noc README, Use ClusterConfig (duration: 13m 46s)
  • 07:17 oblivian@deploy1002: oblivian: Continuing with sync
  • 07:11 oblivian@deploy1002: oblivian: Backport for update noc README, Use ClusterConfig synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 07:09 oblivian@deploy1002: Started scap: Backport for update noc README, Use ClusterConfig
  • 07:09 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host krb1001.eqiad.wmnet
  • 06:23 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2177 (T343198)', diff saved to https://phabricator.wikimedia.org/P52456 and previous config saved to /var/cache/conftool/dbconfig/20230912-062353-arnaudb.json
  • 06:23 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2177.codfw.wmnet with reason: Maintenance
  • 06:23 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2177.codfw.wmnet with reason: Maintenance
  • 06:23 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T343198)', diff saved to https://phabricator.wikimedia.org/P52455 and previous config saved to /var/cache/conftool/dbconfig/20230912-062332-arnaudb.json
  • 06:08 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P52454 and previous config saved to /var/cache/conftool/dbconfig/20230912-060825-arnaudb.json
  • 05:53 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P52453 and previous config saved to /var/cache/conftool/dbconfig/20230912-055319-arnaudb.json
  • 05:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2014.codfw.wmnet with OS bullseye
  • 05:38 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T343198)', diff saved to https://phabricator.wikimedia.org/P52452 and previous config saved to /var/cache/conftool/dbconfig/20230912-053813-arnaudb.json
  • 05:32 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti2014.codfw.wmnet with reason: host reimage
  • 05:27 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti2014.codfw.wmnet with reason: host reimage
  • 05:17 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1119 with Debian Bookworm in s1 with just 10% T339185', diff saved to https://phabricator.wikimedia.org/P52450 and previous config saved to /var/cache/conftool/dbconfig/20230912-051753-marostegui.json
  • 05:17 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2158', diff saved to https://phabricator.wikimedia.org/P52449 and previous config saved to /var/cache/conftool/dbconfig/20230912-051725-root.json
  • 05:11 moritzm: installing aom security updates
  • 05:08 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2014.codfw.wmnet with OS bullseye
  • 05:00 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti2014.codfw.wmnet to cluster codfw and group C
  • 05:00 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1158 (T343198)', diff saved to https://phabricator.wikimedia.org/P52448 and previous config saved to /var/cache/conftool/dbconfig/20230912-050033-arnaudb.json
  • 05:00 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 05:00 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 05:00 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 04:59 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 04:59 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136 (T343198)', diff saved to https://phabricator.wikimedia.org/P52447 and previous config saved to /var/cache/conftool/dbconfig/20230912-045944-arnaudb.json
  • 04:56 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2014.codfw.wmnet to cluster codfw and group C
  • 04:44 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136', diff saved to https://phabricator.wikimedia.org/P52446 and previous config saved to /var/cache/conftool/dbconfig/20230912-044437-arnaudb.json
  • 04:29 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136', diff saved to https://phabricator.wikimedia.org/P52445 and previous config saved to /var/cache/conftool/dbconfig/20230912-042931-arnaudb.json
  • 04:14 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136 (T343198)', diff saved to https://phabricator.wikimedia.org/P52444 and previous config saved to /var/cache/conftool/dbconfig/20230912-041425-arnaudb.json
  • 03:58 mwpresync@deploy1002: Pruned MediaWiki: 1.41.0-wmf.23, 1.41.0-wmf.24 (duration: 02m 30s)
  • 03:56 mwpresync@deploy1002: Finished scap: testwikis wikis to 1.41.0-wmf.26 refs T343728 (duration: 53m 18s)
  • 03:02 mwpresync@deploy1002: Started scap: testwikis wikis to 1.41.0-wmf.26 refs T343728
  • 02:49 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host titan1002.eqiad.wmnet with OS bookworm
  • 02:49 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 02:48 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 02:32 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on titan1002.eqiad.wmnet with reason: host reimage
  • 02:28 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on titan1002.eqiad.wmnet with reason: host reimage
  • 02:06 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host titan1002.eqiad.wmnet with OS bookworm
  • 01:55 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host titan1001.eqiad.wmnet with OS bookworm
  • 01:55 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 01:54 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 01:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on titan1001.eqiad.wmnet with reason: host reimage
  • 01:34 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on titan1001.eqiad.wmnet with reason: host reimage
  • 01:06 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host titan1001.eqiad.wmnet with OS bookworm
  • 00:17 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1136 (T343198)', diff saved to https://phabricator.wikimedia.org/P52443 and previous config saved to /var/cache/conftool/dbconfig/20230912-001715-arnaudb.json
  • 00:17 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1136.eqiad.wmnet with reason: Maintenance
  • 00:16 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1136.eqiad.wmnet with reason: Maintenance
  • 00:16 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 (T343198)', diff saved to https://phabricator.wikimedia.org/P52442 and previous config saved to /var/cache/conftool/dbconfig/20230912-001654-arnaudb.json
  • 00:01 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P52441 and previous config saved to /var/cache/conftool/dbconfig/20230912-000148-arnaudb.json

2023-09-11

  • 23:46 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P52440 and previous config saved to /var/cache/conftool/dbconfig/20230911-234641-arnaudb.json
  • 23:31 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 (T343198)', diff saved to https://phabricator.wikimedia.org/P52439 and previous config saved to /var/cache/conftool/dbconfig/20230911-233135-arnaudb.json
  • 23:11 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2156 (T343198)', diff saved to https://phabricator.wikimedia.org/P52438 and previous config saved to /var/cache/conftool/dbconfig/20230911-231131-arnaudb.json
  • 23:11 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 23:11 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 23:11 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2156.codfw.wmnet with reason: Maintenance
  • 23:11 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2156.codfw.wmnet with reason: Maintenance
  • 23:10 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T343198)', diff saved to https://phabricator.wikimedia.org/P52437 and previous config saved to /var/cache/conftool/dbconfig/20230911-231054-arnaudb.json
  • 22:55 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P52436 and previous config saved to /var/cache/conftool/dbconfig/20230911-225548-arnaudb.json
  • 22:53 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host titan1002.eqiad.wmnet with OS bookworm
  • 22:52 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host titan1001.eqiad.wmnet with OS bookworm
  • 22:40 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P52435 and previous config saved to /var/cache/conftool/dbconfig/20230911-224042-arnaudb.json
  • 22:25 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T343198)', diff saved to https://phabricator.wikimedia.org/P52434 and previous config saved to /var/cache/conftool/dbconfig/20230911-222536-arnaudb.json
  • 21:33 cwhite: update grafana to 9.4.14 on grafana1002 T345362
  • 21:32 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host titan1002.eqiad.wmnet with OS bookworm
  • 21:32 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host titan1001.eqiad.wmnet with OS bookworm
  • 21:19 sbassett: Deployed security fix for T345693
  • 20:48 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host titan1002.mgmt.eqiad.wmnet with reboot policy FORCED
  • 20:41 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['titan1001.eqiad.wmnet']
  • 20:41 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['titan1001.eqiad.wmnet']
  • 20:41 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['titan1001.eqiad.wmnet']
  • 20:41 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['titan1001.eqiad.wmnet']
  • 20:41 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['titan1001.eqiad.wmnet']
  • 20:39 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['titan1001.eqiad.wmnet']
  • 20:39 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host titan1001.mgmt.eqiad.wmnet with reboot policy FORCED
  • 20:18 jclark@cumin1001: START - Cookbook sre.hosts.provision for host titan1002.mgmt.eqiad.wmnet with reboot policy FORCED
  • 20:18 jclark@cumin1001: START - Cookbook sre.hosts.provision for host titan1001.mgmt.eqiad.wmnet with reboot policy FORCED
  • 20:17 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host titan1001
  • 20:17 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host titan1001
  • 20:17 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host titan1002
  • 20:14 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kubernetes1029.mgmt.eqiad.wmnet with reboot policy FORCED
  • 20:13 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host titan1002
  • 20:13 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host titan1001
  • 20:13 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host titan1001
  • 20:13 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:13 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt titan1001 - jclark@cumin1001"
  • 20:12 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt titan1001 - jclark@cumin1001"
  • 20:10 jclark@cumin1001: START - Cookbook sre.dns.netbox
  • 20:10 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:09 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt titan1001 - jclark@cumin1001"
  • 20:09 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt titan1001 - jclark@cumin1001"
  • 20:05 jclark@cumin1001: START - Cookbook sre.dns.netbox
  • 19:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2131 (T337310)', diff saved to https://phabricator.wikimedia.org/P52432 and previous config saved to /var/cache/conftool/dbconfig/20230911-194332-ladsgroup.json
  • 19:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2131', diff saved to https://phabricator.wikimedia.org/P52431 and previous config saved to /var/cache/conftool/dbconfig/20230911-192826-ladsgroup.json
  • 19:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2131', diff saved to https://phabricator.wikimedia.org/P52430 and previous config saved to /var/cache/conftool/dbconfig/20230911-191320-ladsgroup.json
  • 18:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2131 (T337310)', diff saved to https://phabricator.wikimedia.org/P52429 and previous config saved to /var/cache/conftool/dbconfig/20230911-185813-ladsgroup.json
  • 18:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2131 (T337310)', diff saved to https://phabricator.wikimedia.org/P52428 and previous config saved to /var/cache/conftool/dbconfig/20230911-184231-ladsgroup.json
  • 18:42 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2131.codfw.wmnet with reason: Maintenance
  • 18:42 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2131.codfw.wmnet with reason: Maintenance
  • 18:33 denisse@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host netmon2002.wikimedia.org with OS bullseye
  • 18:13 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2101.codfw.wmnet with reason: Maintenance
  • 18:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2101.codfw.wmnet with reason: Maintenance
  • 18:11 denisse@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on netmon2002.wikimedia.org with reason: host reimage
  • 18:08 denisse@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on netmon2002.wikimedia.org with reason: host reimage
  • 18:00 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase1030.eqiad.wmnet with OS bullseye
  • 17:59 eevans@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - eevans@cumin1001"
  • 17:58 eevans@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - eevans@cumin1001"
  • 17:53 denisse@cumin1001: START - Cookbook sre.hosts.reimage for host netmon2002.wikimedia.org with OS bullseye
  • 17:43 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2097.codfw.wmnet with reason: Maintenance
  • 17:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2097.codfw.wmnet with reason: Maintenance
  • 17:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2096 (T337310)', diff saved to https://phabricator.wikimedia.org/P52427 and previous config saved to /var/cache/conftool/dbconfig/20230911-174321-ladsgroup.json
  • 17:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2096', diff saved to https://phabricator.wikimedia.org/P52426 and previous config saved to /var/cache/conftool/dbconfig/20230911-172815-ladsgroup.json
  • 17:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2096', diff saved to https://phabricator.wikimedia.org/P52425 and previous config saved to /var/cache/conftool/dbconfig/20230911-171309-ladsgroup.json
  • 17:09 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on restbase1030.eqiad.wmnet with reason: host reimage
  • 17:06 eevans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on restbase1030.eqiad.wmnet with reason: host reimage
  • 16:59 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kubernetes1027.mgmt.eqiad.wmnet with reboot policy FORCED
  • 16:59 jclark@cumin1001: START - Cookbook sre.hosts.provision for host kubernetes1029.mgmt.eqiad.wmnet with reboot policy FORCED
  • 16:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2096 (T337310)', diff saved to https://phabricator.wikimedia.org/P52424 and previous config saved to /var/cache/conftool/dbconfig/20230911-165802-ladsgroup.json
  • 16:57 jclark@cumin1001: START - Cookbook sre.hosts.provision for host kubernetes1027.mgmt.eqiad.wmnet with reboot policy FORCED
  • 16:48 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes1055.mgmt.eqiad.wmnet with reboot policy FORCED
  • 16:48 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes1056.mgmt.eqiad.wmnet with reboot policy FORCED
  • 16:48 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes1054.mgmt.eqiad.wmnet with reboot policy FORCED
  • 16:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2096 (T337310)', diff saved to https://phabricator.wikimedia.org/P52423 and previous config saved to /var/cache/conftool/dbconfig/20230911-164249-ladsgroup.json
  • 16:46 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2096.codfw.wmnet with reason: Maintenance
  • 16:42 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2096.codfw.wmnet with reason: Maintenance
  • 16:41 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host restbase1030.eqiad.wmnet with OS bullseye
  • 16:32 fnegri@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudnet2005-dev.codfw.wmnet with reason: host reimage
  • 16:31 denisse@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host netmon2002.wikimedia.org with OS bookworm
  • 16:28 fnegri@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudnet2005-dev.codfw.wmnet with reason: host reimage
  • 16:18 jclark@cumin1001: START - Cookbook sre.hosts.provision for host kubernetes1056.mgmt.eqiad.wmnet with reboot policy FORCED
  • 16:17 jclark@cumin1001: START - Cookbook sre.hosts.provision for host kubernetes1054.mgmt.eqiad.wmnet with reboot policy FORCED
  • 16:17 jclark@cumin1001: START - Cookbook sre.hosts.provision for host kubernetes1055.mgmt.eqiad.wmnet with reboot policy FORCED
  • 16:16 eevans@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host restbase1030.eqiad.wmnet with OS bullseye
  • 16:12 brouberol@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker1152.eqiad.wmnet
  • 16:10 brouberol@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker1152.eqiad.wmnet
  • 16:10 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host restbase1030.eqiad.wmnet with OS bullseye
  • 16:08 fnegri@cumin1001: START - Cookbook sre.hosts.reimage for host cloudnet2005-dev.codfw.wmnet with OS bookworm
  • 16:07 brouberol@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker1151.eqiad.wmnet
  • 16:06 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes1047.mgmt.eqiad.wmnet with reboot policy FORCED
  • 16:06 denisse@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on netmon2002.wikimedia.org with reason: host reimage
  • 16:05 brouberol@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker1151.eqiad.wmnet
  • 16:04 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes1050.mgmt.eqiad.wmnet with reboot policy FORCED
  • 16:04 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes1052.mgmt.eqiad.wmnet with reboot policy FORCED
  • 16:04 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes1051.mgmt.eqiad.wmnet with reboot policy FORCED
  • 16:04 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes1049.mgmt.eqiad.wmnet with reboot policy FORCED
  • 16:04 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes1046.mgmt.eqiad.wmnet with reboot policy FORCED
  • 16:04 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes1053.mgmt.eqiad.wmnet with reboot policy FORCED
  • 16:04 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes1048.mgmt.eqiad.wmnet with reboot policy FORCED
  • 16:04 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes1045.mgmt.eqiad.wmnet with reboot policy FORCED
  • 16:03 denisse@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on netmon2002.wikimedia.org with reason: host reimage
  • 16:01 brouberol@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker1150.eqiad.wmnet
  • 16:00 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
  • 16:00 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
  • 15:59 brouberol@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker1150.eqiad.wmnet
  • 15:48 jclark@cumin1001: START - Cookbook sre.hosts.provision for host kubernetes1047.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:48 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:48 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt kubernetes1047 - jclark@cumin1001"
  • 15:47 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt kubernetes1047 - jclark@cumin1001"
  • 15:45 jclark@cumin1001: START - Cookbook sre.dns.netbox
  • 15:44 brouberol@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker1149.eqiad.wmnet
  • 15:43 denisse@cumin1001: START - Cookbook sre.hosts.reimage for host netmon2002.wikimedia.org with OS bookworm
  • 15:43 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2149 (T343198)', diff saved to https://phabricator.wikimedia.org/P52421 and previous config saved to /var/cache/conftool/dbconfig/20230911-154327-arnaudb.json
  • 15:43 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2149.codfw.wmnet with reason: Maintenance
  • 15:43 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2149.codfw.wmnet with reason: Maintenance
  • 15:41 brouberol@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker1149.eqiad.wmnet
  • 15:41 jclark@cumin1001: START - Cookbook sre.hosts.provision for host kubernetes1048.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:41 jclark@cumin1001: START - Cookbook sre.hosts.provision for host kubernetes1046.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:41 jclark@cumin1001: START - Cookbook sre.hosts.provision for host kubernetes1045.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:41 jclark@cumin1001: START - Cookbook sre.hosts.provision for host kubernetes1050.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:41 jclark@cumin1001: START - Cookbook sre.hosts.provision for host kubernetes1049.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:41 jclark@cumin1001: START - Cookbook sre.hosts.provision for host kubernetes1053.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:41 jclark@cumin1001: START - Cookbook sre.hosts.provision for host kubernetes1051.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:40 jclark@cumin1001: START - Cookbook sre.hosts.provision for host kubernetes1052.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:36 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes1040.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:28 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes1043.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:28 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes1037.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:27 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes1044.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:26 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes1039.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:25 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes1041.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:25 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes1042.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:25 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes1038.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1225.eqiad.wmnet with reason: Maintenance
  • 15:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1225.eqiad.wmnet with reason: Maintenance
  • 15:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1220 (T337310)', diff saved to https://phabricator.wikimedia.org/P52420 and previous config saved to /var/cache/conftool/dbconfig/20230911-152456-ladsgroup.json
  • 15:23 fnegri@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudnet2005-dev.codfw.wmnet with OS bookworm
  • 15:21 jnuche@deploy1002: Installation of scap version "4.60.0" completed for 595 hosts
  • 15:20 jnuche@deploy1002: Installing scap version "4.60.0" for 595 hosts
  • 15:19 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kubernetes1029.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:18 jnuche@deploy1002: Installing scap version "4.60.0" for 595 hosts
  • 15:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1220', diff saved to https://phabricator.wikimedia.org/P52419 and previous config saved to /var/cache/conftool/dbconfig/20230911-150950-ladsgroup.json
  • 15:08 jclark@cumin1001: START - Cookbook sre.hosts.provision for host kubernetes1044.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:08 jclark@cumin1001: START - Cookbook sre.hosts.provision for host kubernetes1043.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:08 jclark@cumin1001: START - Cookbook sre.hosts.provision for host kubernetes1042.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:08 jclark@cumin1001: START - Cookbook sre.hosts.provision for host kubernetes1041.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:07 jclark@cumin1001: START - Cookbook sre.hosts.provision for host kubernetes1040.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:07 jclark@cumin1001: START - Cookbook sre.hosts.provision for host kubernetes1039.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:07 jclark@cumin1001: START - Cookbook sre.hosts.provision for host kubernetes1038.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:07 jclark@cumin1001: START - Cookbook sre.hosts.provision for host kubernetes1029.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:07 jclark@cumin1001: START - Cookbook sre.hosts.provision for host kubernetes1037.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:05 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kubernetes1029.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:04 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 15:03 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 15:02 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes1031.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:56 brouberol@cumin1001: END (FAIL) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=99) for hosts an-worker1149.eqiad.wmnet
  • 14:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1220', diff saved to https://phabricator.wikimedia.org/P52418 and previous config saved to /var/cache/conftool/dbconfig/20230911-145443-ladsgroup.json
  • 14:54 brouberol@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker1149.eqiad.wmnet
  • 14:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1220 (T337310)', diff saved to https://phabricator.wikimedia.org/P52417 and previous config saved to /var/cache/conftool/dbconfig/20230911-143937-ladsgroup.json
  • 14:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1220 (T337310)', diff saved to https://phabricator.wikimedia.org/P52416 and previous config saved to /var/cache/conftool/dbconfig/20230911-143102-ladsgroup.json
  • 14:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1220.eqiad.wmnet with reason: Maintenance
  • 14:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1220.eqiad.wmnet with reason: Maintenance
  • 14:19 fnegri@cumin1001: START - Cookbook sre.hosts.reimage for host cloudnet2005-dev.codfw.wmnet with OS bookworm
  • 13:59 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-wf2002.codfw.wmnet
  • 13:55 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc-wf2002.codfw.wmnet
  • 13:55 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-wf1002.eqiad.wmnet
  • 13:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1216.eqiad.wmnet with reason: Maintenance
  • 13:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1216.eqiad.wmnet with reason: Maintenance
  • 13:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1137 (T337310)', diff saved to https://phabricator.wikimedia.org/P52414 and previous config saved to /var/cache/conftool/dbconfig/20230911-135520-ladsgroup.json
  • 13:49 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc-wf1002.eqiad.wmnet
  • 13:49 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-wf1001.eqiad.wmnet
  • 13:43 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc-wf1001.eqiad.wmnet
  • 13:43 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2045.codfw.wmnet
  • 13:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1137', diff saved to https://phabricator.wikimedia.org/P52413 and previous config saved to /var/cache/conftool/dbconfig/20230911-134013-ladsgroup.json
  • 13:40 kartik@deploy1002: Finished scap: Backport for Enable MinT translation service in more wikis - rollout #3 (T341445) (duration: 11m 18s)
  • 13:36 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2045.codfw.wmnet
  • 13:33 kartik@deploy1002: kartik and abi: Continuing with sync
  • 13:32 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-wf2001.codfw.wmnet
  • 13:30 kartik@deploy1002: kartik and abi: Backport for Enable MinT translation service in more wikis - rollout #3 (T341445) synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 13:28 kartik@deploy1002: Started scap: Backport for Enable MinT translation service in more wikis - rollout #3 (T341445)
  • 13:26 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for Revert "Reapply "Remove $wgNamespacesWithSubpages overrides on the MediaWiki namespace"" (duration: 08m 04s)
  • 13:26 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc-wf2001.codfw.wmnet
  • 13:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1137', diff saved to https://phabricator.wikimedia.org/P52412 and previous config saved to /var/cache/conftool/dbconfig/20230911-132507-ladsgroup.json
  • 13:22 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1119', diff saved to https://phabricator.wikimedia.org/P52411 and previous config saved to /var/cache/conftool/dbconfig/20230911-132210-root.json
  • 13:20 ayounsi@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host atlas3001.wikimedia.org
  • 13:20 ayounsi@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM atlas3001.wikimedia.org - ayounsi@cumin1001"
  • 13:20 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde: Continuing with sync
  • 13:19 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde: Backport for Revert "Reapply "Remove $wgNamespacesWithSubpages overrides on the MediaWiki namespace"" synced to the testservers mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 13:19 ayounsi@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM atlas3001.wikimedia.org - ayounsi@cumin1001"
  • 13:19 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) atlas3001.wikimedia.org on all recursors
  • 13:19 ayounsi@cumin1001: START - Cookbook sre.dns.wipe-cache atlas3001.wikimedia.org on all recursors
  • 13:19 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:19 ayounsi@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM atlas3001.wikimedia.org - ayounsi@cumin1001"
  • 13:18 ayounsi@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM atlas3001.wikimedia.org - ayounsi@cumin1001"
  • 13:18 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for Revert "Reapply "Remove $wgNamespacesWithSubpages overrides on the MediaWiki namespace""
  • 13:16 lucaswerkmeister-wmde@deploy1002: Sync cancelled.
  • 13:16 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
  • 13:16 ayounsi@cumin1001: START - Cookbook sre.ganeti.makevm for new host atlas3001.wikimedia.org
  • 13:11 lucaswerkmeister-wmde@deploy1002: func and lucaswerkmeister-wmde: Backport for Reapply "Remove $wgNamespacesWithSubpages overrides on the MediaWiki namespace" (T340697) synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 13:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1137 (T337310)', diff saved to https://phabricator.wikimedia.org/P52409 and previous config saved to /var/cache/conftool/dbconfig/20230911-131001-ladsgroup.json
  • 13:09 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for Reapply "Remove $wgNamespacesWithSubpages overrides on the MediaWiki namespace" (T340697)
  • 13:06 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1037.eqiad.wmnet
  • 13:05 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2055.codfw.wmnet
  • 12:59 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1037.eqiad.wmnet
  • 12:59 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2055.codfw.wmnet
  • 12:38 aborrero@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:37 aborrero@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudservices1006 - aborrero@cumin1001"
  • 12:37 aborrero@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudservices1006 - aborrero@cumin1001"
  • 12:30 aborrero@cumin1001: START - Cookbook sre.dns.netbox
  • 12:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1137 (T337310)', diff saved to https://phabricator.wikimedia.org/P52408 and previous config saved to /var/cache/conftool/dbconfig/20230911-122535-ladsgroup.json
  • 12:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1137.eqiad.wmnet with reason: Maintenance
  • 12:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1137.eqiad.wmnet with reason: Maintenance
  • 12:22 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netmon2002.wikimedia.org
  • 12:21 moritzm: restarting apache/FPM on mediawiki canaries
  • 12:18 moritzm: installing libssh2 security updates
  • 12:18 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2054.codfw.wmnet
  • 12:17 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1054.eqiad.wmnet
  • 12:15 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netmon2002.wikimedia.org
  • 12:11 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2054.codfw.wmnet
  • 12:11 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1054.eqiad.wmnet
  • 11:51 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti2014.codfw.wmnet to cluster codfw and group C
  • 11:51 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2014.codfw.wmnet to cluster codfw and group C
  • 11:45 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host poolcounter2003.codfw.wmnet
  • 11:42 Amir1: setting binlog format to STATEMENT in x1 eqiad and codfw masters (T337310)
  • 11:42 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2014.codfw.wmnet
  • 11:42 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-single for host poolcounter2003.codfw.wmnet
  • 11:41 claime: Rebooting poolcounter2003.codfw.wmnet
  • 11:35 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2014.codfw.wmnet
  • 11:32 isaranto@deploy1002: Finished scap: Backport for ores-extension: enable lw in enwiki and wikidata (T342115) (duration: 23m 46s)
  • 11:30 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host poolcounter2004.codfw.wmnet
  • 11:26 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-single for host poolcounter2004.codfw.wmnet
  • 11:26 isaranto@deploy1002: isaranto: Continuing with sync
  • 11:26 claime: Rebooting poolcounter2004.codfw.wmnet
  • 11:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host seaborgium.wikimedia.org
  • 11:20 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host seaborgium.wikimedia.org
  • 11:10 isaranto@deploy1002: isaranto: Backport for ores-extension: enable lw in enwiki and wikidata (T342115) synced to the testservers mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 11:09 isaranto@deploy1002: Started scap: Backport for ores-extension: enable lw in enwiki and wikidata (T342115)
  • 11:06 volans: installed spicerack v7.2.2 on both cumin hosts
  • 10:59 volans: uploaded spicerack_7.2.2 to apt.wikimedia.org bullseye-wikimedia
  • 10:39 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host gitlab1003.wikimedia.org with OS bullseye
  • 10:27 aikochou@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 10:18 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on gitlab1003.wikimedia.org with reason: host reimage
  • 10:16 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on gitlab1003.wikimedia.org with reason: host reimage
  • 10:14 elukey@deploy1002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
  • 10:03 jelto@cumin1001: START - Cookbook sre.hosts.reimage for host gitlab1003.wikimedia.org with OS bullseye
  • 09:57 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader2003.wikimedia.org
  • 09:53 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host urldownloader2003.wikimedia.org
  • 09:53 elukey@deploy1002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
  • 09:48 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader1004.wikimedia.org
  • 09:43 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host urldownloader1004.wikimedia.org
  • 09:32 claime: rearmed keyholder on deploy2002.codfw.wmnet
  • 09:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 100%: Repooling after cloning another host', diff saved to https://phabricator.wikimedia.org/P52405 and previous config saved to /var/cache/conftool/dbconfig/20230911-092650-root.json
  • 09:26 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host deploy2002.codfw.wmnet
  • 09:25 elukey@deploy1002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
  • 09:24 gehel@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on wdqs1009.eqiad.wmnet with reason: T342361 - testing blazegraph startup script refactor
  • 09:24 gehel@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on wdqs1009.eqiad.wmnet with reason: T342361 - testing blazegraph startup script refactor
  • 09:18 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-single for host deploy2002.codfw.wmnet
  • 09:18 claime: rebooting deploy2002.codfw.wmnet
  • 09:18 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance
  • 09:18 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance
  • 09:18 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127 (T343198)', diff saved to https://phabricator.wikimedia.org/P52404 and previous config saved to /var/cache/conftool/dbconfig/20230911-091817-arnaudb.json
  • 09:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 75%: Repooling after cloning another host', diff saved to https://phabricator.wikimedia.org/P52403 and previous config saved to /var/cache/conftool/dbconfig/20230911-091145-root.json
  • 09:08 elukey@deploy1002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
  • 09:03 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127', diff saved to https://phabricator.wikimedia.org/P52402 and previous config saved to /var/cache/conftool/dbconfig/20230911-090310-arnaudb.json
  • 08:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 50%: Repooling after cloning another host', diff saved to https://phabricator.wikimedia.org/P52401 and previous config saved to /var/cache/conftool/dbconfig/20230911-085640-root.json
  • 08:52 urbanecm@deploy1002: Finished scap: Backport for Revert "Growth: Disable Add an image on all wikis" (T345188) (duration: 10m 27s)
  • 08:51 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1127 (T343198)', diff saved to https://phabricator.wikimedia.org/P52400 and previous config saved to /var/cache/conftool/dbconfig/20230911-085129-arnaudb.json
  • 08:51 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1127.eqiad.wmnet with reason: Maintenance
  • 08:51 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1127.eqiad.wmnet with reason: Maintenance
  • 08:48 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127', diff saved to https://phabricator.wikimedia.org/P52399 and previous config saved to /var/cache/conftool/dbconfig/20230911-084804-arnaudb.json
  • 08:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1128 (re)pooling @ 100%: Repooling after being recloned T345509', diff saved to https://phabricator.wikimedia.org/P52398 and previous config saved to /var/cache/conftool/dbconfig/20230911-084647-root.json
  • 08:46 urbanecm@deploy1002: urbanecm: Continuing with sync
  • 08:45 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mwmaint2002.codfw.wmnet
  • 08:44 urbanecm@deploy1002: urbanecm: Backport for Revert "Growth: Disable Add an image on all wikis" (T345188) synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 08:42 urbanecm@deploy1002: Started scap: Backport for Revert "Growth: Disable Add an image on all wikis" (T345188)
  • 08:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 25%: Repooling after cloning another host', diff saved to https://phabricator.wikimedia.org/P52397 and previous config saved to /var/cache/conftool/dbconfig/20230911-084135-root.json
  • 08:37 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-single for host mwmaint2002.codfw.wmnet
  • 08:37 claime: rebooting mwmaint2002.codfw.wmnet
  • 08:33 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mwdebug1001.eqiad.wmnet
  • 08:33 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1119 with Debian Bookworm in s1 with just 1% T339185', diff saved to https://phabricator.wikimedia.org/P52396 and previous config saved to /var/cache/conftool/dbconfig/20230911-083346-marostegui.json
  • 08:33 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127 (T343198)', diff saved to https://phabricator.wikimedia.org/P52395 and previous config saved to /var/cache/conftool/dbconfig/20230911-083258-arnaudb.json
  • 08:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1128 (re)pooling @ 75%: Repooling after being recloned T345509', diff saved to https://phabricator.wikimedia.org/P52394 and previous config saved to /var/cache/conftool/dbconfig/20230911-083143-root.json
  • 08:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 10%: Repooling after cloning another host', diff saved to https://phabricator.wikimedia.org/P52393 and previous config saved to /var/cache/conftool/dbconfig/20230911-082631-root.json
  • 08:26 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-single for host mwdebug1001.eqiad.wmnet
  • 08:26 claime: rebooting mwdebug1001.eqiad.wmnet
  • 08:25 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mwdebug1002.eqiad.wmnet
  • 08:20 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-single for host mwdebug1002.eqiad.wmnet
  • 08:20 claime: rebooting mwdebug1002.eqiad.wmnet
  • 08:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1128 (re)pooling @ 50%: Repooling after being recloned T345509', diff saved to https://phabricator.wikimedia.org/P52392 and previous config saved to /var/cache/conftool/dbconfig/20230911-081638-root.json
  • 08:13 kostajh: UTC morning deploys done
  • 08:13 kharlan@deploy1002: Finished scap: Backport for [beta] ReportIncident: Enable on kowiki beta (T339275), [beta] Enable ReportIncident for configured beta wikis (T339275), ReportIncident: Set default help page (T343382) (duration: 09m 44s)
  • 08:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 5%: Repooling after cloning another host', diff saved to https://phabricator.wikimedia.org/P52391 and previous config saved to /var/cache/conftool/dbconfig/20230911-081126-root.json
  • 08:08 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti3007.esams.wmnet
  • 08:07 kharlan@deploy1002: kharlan: Continuing with sync
  • 08:05 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti3007.esams.wmnet
  • 08:05 kharlan@deploy1002: kharlan: Backport for [beta] ReportIncident: Enable on kowiki beta (T339275), [beta] Enable ReportIncident for configured beta wikis (T339275), ReportIncident: Set default help page (T343382) synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deplo
  • 08:03 filippo@deploy1002: helmfile [codfw] DONE helmfile.d/services/citoid: apply
  • 08:03 kharlan@deploy1002: Started scap: Backport for [beta] ReportIncident: Enable on kowiki beta (T339275), [beta] Enable ReportIncident for configured beta wikis (T339275), ReportIncident: Set default help page (T343382)
  • 08:03 filippo@deploy1002: helmfile [codfw] START helmfile.d/services/citoid: apply
  • 08:02 filippo@deploy1002: helmfile [eqiad] DONE helmfile.d/services/citoid: apply
  • 08:02 filippo@deploy1002: helmfile [eqiad] START helmfile.d/services/citoid: apply
  • 08:02 filippo@deploy1002: helmfile [staging] DONE helmfile.d/services/citoid: apply
  • 08:01 filippo@deploy1002: helmfile [staging] START helmfile.d/services/citoid: apply
  • 08:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1128 (re)pooling @ 25%: Repooling after being recloned T345509', diff saved to https://phabricator.wikimedia.org/P52390 and previous config saved to /var/cache/conftool/dbconfig/20230911-080133-root.json
  • 08:00 filippo@deploy1002: helmfile [eqiad] DONE helmfile.d/services/citoid: apply
  • 08:00 filippo@deploy1002: helmfile [eqiad] START helmfile.d/services/citoid: apply
  • 08:00 kharlan@deploy1002: Finished scap: Backport for ReportIncident: Default deployment to false (T339275) (duration: 11m 15s)
  • 08:00 filippo@deploy1002: helmfile [staging] DONE helmfile.d/services/citoid: apply
  • 08:00 filippo@deploy1002: helmfile [staging] START helmfile.d/services/citoid: apply
  • 07:59 filippo@deploy1002: helmfile [staging] DONE helmfile.d/services/citoid: apply
  • 07:59 filippo@deploy1002: helmfile [staging] START helmfile.d/services/citoid: apply
  • 07:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 3%: Repooling after cloning another host', diff saved to https://phabricator.wikimedia.org/P52389 and previous config saved to /var/cache/conftool/dbconfig/20230911-075621-root.json
  • 07:53 kharlan@deploy1002: kharlan: Continuing with sync
  • 07:50 kharlan@deploy1002: kharlan: Backport for ReportIncident: Default deployment to false (T339275) synced to the testservers mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 07:49 kharlan@deploy1002: Started scap: Backport for ReportIncident: Default deployment to false (T339275)
  • 07:46 kharlan@deploy1002: Finished scap: Backport for Add ReportIncident extension (T339275) (duration: 22m 44s)
  • 07:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1128 (re)pooling @ 10%: Repooling after being recloned T345509', diff saved to https://phabricator.wikimedia.org/P52388 and previous config saved to /var/cache/conftool/dbconfig/20230911-074629-root.json
  • 07:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 1%: Repooling after cloning another host', diff saved to https://phabricator.wikimedia.org/P52387 and previous config saved to /var/cache/conftool/dbconfig/20230911-074116-root.json
  • 07:36 kharlan@deploy1002: kharlan: Continuing with sync
  • 07:35 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti3005.esams.wmnet
  • 07:35 elukey@deploy1002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
  • 07:33 kharlan@deploy1002: kharlan: Backport for Add ReportIncident extension (T339275) synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 07:32 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti3005.esams.wmnet
  • 07:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1128 (re)pooling @ 5%: Repooling after being recloned T345509', diff saved to https://phabricator.wikimedia.org/P52386 and previous config saved to /var/cache/conftool/dbconfig/20230911-073124-root.json
  • 07:23 kharlan@deploy1002: Started scap: Backport for Add ReportIncident extension (T339275)
  • 07:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1128 (re)pooling @ 3%: Repooling after being recloned T345509', diff saved to https://phabricator.wikimedia.org/P52385 and previous config saved to /var/cache/conftool/dbconfig/20230911-071619-root.json
  • 07:11 kharlan@deploy1002: Started scap: Backport for Add ReportIncident extension (T339275)
  • 07:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1128 (re)pooling @ 1%: Repooling after being recloned T345509', diff saved to https://phabricator.wikimedia.org/P52384 and previous config saved to /var/cache/conftool/dbconfig/20230911-070114-root.json
  • 06:57 marostegui@cumin1001: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db1134.eqiad.wmnet onto db1128.eqiad.wmnet
  • 06:12 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 136065
  • 06:11 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 136065
  • 05:40 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1119 back to s1 depooled T339185', diff saved to https://phabricator.wikimedia.org/P52383 and previous config saved to /var/cache/conftool/dbconfig/20230911-054057-marostegui.json
  • 05:00 marostegui@cumin1001: START - Cookbook sre.mysql.clone of db1134.eqiad.wmnet onto db1128.eqiad.wmnet
  • 04:59 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1134', diff saved to https://phabricator.wikimedia.org/P52382 and previous config saved to /var/cache/conftool/dbconfig/20230911-045907-root.json
  • 01:29 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2127 (T343198)', diff saved to https://phabricator.wikimedia.org/P52381 and previous config saved to /var/cache/conftool/dbconfig/20230911-012911-arnaudb.json
  • 01:29 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2127.codfw.wmnet with reason: Maintenance
  • 01:28 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2127.codfw.wmnet with reason: Maintenance
  • 01:28 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109 (T343198)', diff saved to https://phabricator.wikimedia.org/P52380 and previous config saved to /var/cache/conftool/dbconfig/20230911-012850-arnaudb.json
  • 01:13 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109', diff saved to https://phabricator.wikimedia.org/P52379 and previous config saved to /var/cache/conftool/dbconfig/20230911-011343-arnaudb.json
  • 00:58 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109', diff saved to https://phabricator.wikimedia.org/P52378 and previous config saved to /var/cache/conftool/dbconfig/20230911-005837-arnaudb.json
  • 00:43 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109 (T343198)', diff saved to https://phabricator.wikimedia.org/P52377 and previous config saved to /var/cache/conftool/dbconfig/20230911-004331-arnaudb.json

2023-09-10

  • 17:35 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2109 (T343198)', diff saved to https://phabricator.wikimedia.org/P52375 and previous config saved to /var/cache/conftool/dbconfig/20230910-173502-arnaudb.json
  • 17:34 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2109.codfw.wmnet with reason: Maintenance
  • 17:34 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2109.codfw.wmnet with reason: Maintenance
  • 11:19 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 11:19 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 11:19 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1223 (T343198)', diff saved to https://phabricator.wikimedia.org/P52374 and previous config saved to /var/cache/conftool/dbconfig/20230910-111941-arnaudb.json
  • 11:04 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1223', diff saved to https://phabricator.wikimedia.org/P52373 and previous config saved to /var/cache/conftool/dbconfig/20230910-110435-arnaudb.json
  • 10:49 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1223', diff saved to https://phabricator.wikimedia.org/P52372 and previous config saved to /var/cache/conftool/dbconfig/20230910-104929-arnaudb.json
  • 10:34 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1223 (T343198)', diff saved to https://phabricator.wikimedia.org/P52371 and previous config saved to /var/cache/conftool/dbconfig/20230910-103422-arnaudb.json
  • 04:23 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1223 (T343198)', diff saved to https://phabricator.wikimedia.org/P52370 and previous config saved to /var/cache/conftool/dbconfig/20230910-042338-arnaudb.json
  • 04:23 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1223.eqiad.wmnet with reason: Maintenance
  • 04:23 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1223.eqiad.wmnet with reason: Maintenance
  • 04:23 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1212 (T343198)', diff saved to https://phabricator.wikimedia.org/P52369 and previous config saved to /var/cache/conftool/dbconfig/20230910-042317-arnaudb.json
  • 04:08 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1212', diff saved to https://phabricator.wikimedia.org/P52368 and previous config saved to /var/cache/conftool/dbconfig/20230910-040811-arnaudb.json
  • 03:53 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1212', diff saved to https://phabricator.wikimedia.org/P52367 and previous config saved to /var/cache/conftool/dbconfig/20230910-035304-arnaudb.json
  • 03:38 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1212 (T343198)', diff saved to https://phabricator.wikimedia.org/P52366 and previous config saved to /var/cache/conftool/dbconfig/20230910-033758-arnaudb.json
  • 01:38 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1212 (T343198)', diff saved to https://phabricator.wikimedia.org/P52365 and previous config saved to /var/cache/conftool/dbconfig/20230910-013823-arnaudb.json
  • 01:38 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 01:38 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 01:38 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1212.eqiad.wmnet with reason: Maintenance
  • 01:37 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1212.eqiad.wmnet with reason: Maintenance
  • 01:37 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1198 (T343198)', diff saved to https://phabricator.wikimedia.org/P52364 and previous config saved to /var/cache/conftool/dbconfig/20230910-013745-arnaudb.json
  • 01:22 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1198', diff saved to https://phabricator.wikimedia.org/P52363 and previous config saved to /var/cache/conftool/dbconfig/20230910-012239-arnaudb.json
  • 01:07 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1198', diff saved to https://phabricator.wikimedia.org/P52362 and previous config saved to /var/cache/conftool/dbconfig/20230910-010733-arnaudb.json
  • 00:52 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1198 (T343198)', diff saved to https://phabricator.wikimedia.org/P52361 and previous config saved to /var/cache/conftool/dbconfig/20230910-005226-arnaudb.json

2023-09-09

  • 20:16 root@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcontrol2005-dev.codfw.wmnet with OS bookworm
  • 19:38 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcontrol2005-dev.codfw.wmnet with reason: host reimage
  • 19:35 root@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcontrol2005-dev.codfw.wmnet with reason: host reimage
  • 19:14 root@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcontrol2005-dev.codfw.wmnet with OS bookworm
  • 18:28 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1198 (T343198)', diff saved to https://phabricator.wikimedia.org/P52360 and previous config saved to /var/cache/conftool/dbconfig/20230909-182802-arnaudb.json
  • 18:27 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1198.eqiad.wmnet with reason: Maintenance
  • 18:27 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1198.eqiad.wmnet with reason: Maintenance
  • 18:27 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1189 (T343198)', diff saved to https://phabricator.wikimedia.org/P52359 and previous config saved to /var/cache/conftool/dbconfig/20230909-182741-arnaudb.json
  • 18:12 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P52358 and previous config saved to /var/cache/conftool/dbconfig/20230909-181234-arnaudb.json
  • 17:57 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P52357 and previous config saved to /var/cache/conftool/dbconfig/20230909-175728-arnaudb.json
  • 17:42 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1189 (T343198)', diff saved to https://phabricator.wikimedia.org/P52356 and previous config saved to /var/cache/conftool/dbconfig/20230909-174222-arnaudb.json
  • 17:35 root@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcontrol2004-dev.codfw.wmnet with OS bookworm
  • 16:54 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcontrol2004-dev.codfw.wmnet with reason: host reimage
  • 16:51 root@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcontrol2004-dev.codfw.wmnet with reason: host reimage
  • 16:33 root@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcontrol2004-dev.codfw.wmnet with OS bookworm
  • 16:27 root@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcontrol2001-dev.codfw.wmnet with OS bookworm
  • 15:44 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcontrol2001-dev.codfw.wmnet with reason: host reimage
  • 15:41 root@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcontrol2001-dev.codfw.wmnet with reason: host reimage
  • 15:22 root@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcontrol2001-dev.codfw.wmnet with OS bookworm
  • 11:15 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1189 (T343198)', diff saved to https://phabricator.wikimedia.org/P52355 and previous config saved to /var/cache/conftool/dbconfig/20230909-111508-arnaudb.json
  • 11:15 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1189.eqiad.wmnet with reason: Maintenance
  • 11:14 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1189.eqiad.wmnet with reason: Maintenance
  • 11:14 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T343198)', diff saved to https://phabricator.wikimedia.org/P52354 and previous config saved to /var/cache/conftool/dbconfig/20230909-111447-arnaudb.json
  • 10:59 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P52353 and previous config saved to /var/cache/conftool/dbconfig/20230909-105941-arnaudb.json
  • 10:44 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P52352 and previous config saved to /var/cache/conftool/dbconfig/20230909-104434-arnaudb.json
  • 10:29 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T343198)', diff saved to https://phabricator.wikimedia.org/P52351 and previous config saved to /var/cache/conftool/dbconfig/20230909-102928-arnaudb.json
  • 04:09 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1175 (T343198)', diff saved to https://phabricator.wikimedia.org/P52350 and previous config saved to /var/cache/conftool/dbconfig/20230909-040947-arnaudb.json
  • 04:09 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1175.eqiad.wmnet with reason: Maintenance
  • 04:09 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1175.eqiad.wmnet with reason: Maintenance
  • 04:09 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T343198)', diff saved to https://phabricator.wikimedia.org/P52349 and previous config saved to /var/cache/conftool/dbconfig/20230909-040925-arnaudb.json
  • 03:54 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P52348 and previous config saved to /var/cache/conftool/dbconfig/20230909-035419-arnaudb.json
  • 03:39 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P52347 and previous config saved to /var/cache/conftool/dbconfig/20230909-033913-arnaudb.json
  • 03:24 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T343198)', diff saved to https://phabricator.wikimedia.org/P52346 and previous config saved to /var/cache/conftool/dbconfig/20230909-032407-arnaudb.json
  • 02:19 root@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcontrol2001-dev.codfw.wmnet with OS bullseye
  • 01:38 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcontrol2001-dev.codfw.wmnet with reason: host reimage
  • 01:35 root@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcontrol2001-dev.codfw.wmnet with reason: host reimage
  • 01:18 root@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcontrol2001-dev.codfw.wmnet with OS bullseye

2023-09-08

  • 21:44 jclark@cumin1001: START - Cookbook sre.hosts.provision for host kubernetes1029.mgmt.eqiad.wmnet with reboot policy FORCED
  • 21:40 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kubernetes1029.mgmt.eqiad.wmnet with reboot policy FORCED
  • 21:35 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes1035.mgmt.eqiad.wmnet with reboot policy FORCED
  • 21:34 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes1036.mgmt.eqiad.wmnet with reboot policy FORCED
  • 21:34 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes1034.mgmt.eqiad.wmnet with reboot policy FORCED
  • 21:34 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes1033.mgmt.eqiad.wmnet with reboot policy FORCED
  • 21:34 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes1032.mgmt.eqiad.wmnet with reboot policy FORCED
  • 21:34 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes1030.mgmt.eqiad.wmnet with reboot policy FORCED
  • 21:34 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes1028.mgmt.eqiad.wmnet with reboot policy FORCED
  • 21:15 jclark@cumin1001: START - Cookbook sre.hosts.provision for host kubernetes1036.mgmt.eqiad.wmnet with reboot policy FORCED
  • 21:15 jclark@cumin1001: START - Cookbook sre.hosts.provision for host kubernetes1035.mgmt.eqiad.wmnet with reboot policy FORCED
  • 21:15 jclark@cumin1001: START - Cookbook sre.hosts.provision for host kubernetes1034.mgmt.eqiad.wmnet with reboot policy FORCED
  • 21:15 jclark@cumin1001: START - Cookbook sre.hosts.provision for host kubernetes1033.mgmt.eqiad.wmnet with reboot policy FORCED
  • 21:14 jclark@cumin1001: START - Cookbook sre.hosts.provision for host kubernetes1032.mgmt.eqiad.wmnet with reboot policy FORCED
  • 21:14 jclark@cumin1001: START - Cookbook sre.hosts.provision for host kubernetes1031.mgmt.eqiad.wmnet with reboot policy FORCED
  • 21:14 jclark@cumin1001: START - Cookbook sre.hosts.provision for host kubernetes1030.mgmt.eqiad.wmnet with reboot policy FORCED
  • 21:14 jclark@cumin1001: START - Cookbook sre.hosts.provision for host kubernetes1029.mgmt.eqiad.wmnet with reboot policy FORCED
  • 21:14 jclark@cumin1001: START - Cookbook sre.hosts.provision for host kubernetes1028.mgmt.eqiad.wmnet with reboot policy FORCED
  • 21:10 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kubernetes1056
  • 21:10 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kubernetes1055
  • 21:10 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kubernetes1054
  • 21:10 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host kubernetes1056
  • 21:10 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host kubernetes1055
  • 21:10 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host kubernetes1054
  • 21:10 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kubernetes1053
  • 21:10 ejegg: civicrm upgraded from de883cd5 to 771fcde3
  • 21:10 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kubernetes1052
  • 21:10 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host kubernetes1053
  • 21:10 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host kubernetes1052
  • 21:10 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kubernetes1046
  • 21:10 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kubernetes1051
  • 21:10 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kubernetes1050
  • 21:10 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host kubernetes1050
  • 21:09 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kubernetes1050
  • 21:09 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host kubernetes1051
  • 21:09 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host kubernetes1050
  • 21:09 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kubernetes1049
  • 21:09 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host kubernetes1049
  • 21:09 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kubernetes1048
  • 21:09 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host kubernetes1048
  • 21:09 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kubernetes1047
  • 21:09 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host kubernetes1047
  • 21:09 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kubernetes1048
  • 21:09 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host kubernetes1048
  • 21:09 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kubernetes1047
  • 21:09 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host kubernetes1047
  • 21:09 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host kubernetes1046
  • 21:08 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1166 (T343198)', diff saved to https://phabricator.wikimedia.org/P52345 and previous config saved to /var/cache/conftool/dbconfig/20230908-210844-arnaudb.json
  • 21:08 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1166.eqiad.wmnet with reason: Maintenance
  • 21:08 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1166.eqiad.wmnet with reason: Maintenance
  • 21:08 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kubernetes1044
  • 21:08 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kubernetes1045
  • 21:07 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kubernetes1041
  • 21:07 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host kubernetes1045
  • 21:07 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host kubernetes1044
  • 21:06 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kubernetes1043
  • 21:06 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kubernetes1042
  • 21:06 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host kubernetes1041
  • 21:05 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host kubernetes1043
  • 21:04 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host kubernetes1042
  • 21:04 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kubernetes1038
  • 21:04 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kubernetes1040
  • 21:03 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host kubernetes1040
  • 21:03 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kubernetes1039
  • 21:03 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host kubernetes1038
  • 21:02 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host kubernetes1039
  • 21:02 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kubernetes1037
  • 21:02 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host kubernetes1037
  • 21:02 jclark@cumin1001: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host kubernetes1039
  • 21:01 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kubernetes1038
  • 21:00 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kubernetes1037
  • 21:00 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host kubernetes1039
  • 21:00 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kubernetes1035
  • 21:00 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host kubernetes1038
  • 21:00 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kubernetes1036
  • 20:59 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host kubernetes1037
  • 20:59 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host kubernetes1036
  • 20:59 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host kubernetes1035
  • 20:59 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kubernetes1032
  • 20:59 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kubernetes1033
  • 20:59 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kubernetes1034
  • 20:58 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host kubernetes1034
  • 20:58 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host kubernetes1033
  • 20:58 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kubernetes1031
  • 20:58 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kubernetes1030
  • 20:58 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host kubernetes1032
  • 20:58 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host kubernetes1031
  • 20:58 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kubernetes1028
  • 20:58 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kubernetes1029
  • 20:57 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host kubernetes1028
  • 20:57 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host kubernetes1030
  • 20:57 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host kubernetes1029
  • 20:53 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:53 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt kubernetes102 - jclark@cumin1001"
  • 20:52 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt kubernetes102 - jclark@cumin1001"
  • 20:49 jclark@cumin1001: START - Cookbook sre.dns.netbox
  • 20:28 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:28 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt kubernetes102 - jclark@cumin1001"
  • 20:27 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt kubernetes102 - jclark@cumin1001"
  • 20:26 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes1027.mgmt.eqiad.wmnet with reboot policy FORCED
  • 20:24 jclark@cumin1001: START - Cookbook sre.dns.netbox
  • 17:24 fnegri@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcontrol2001-dev.codfw.wmnet with OS bookworm
  • 17:20 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 17:19 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 17:13 taavi: reprepro copy bookworm-wikimedia bullseye-wikimedia prometheus-memcached-exporter # T345810
  • 16:18 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 16:16 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 15:53 jclark@cumin1001: START - Cookbook sre.hosts.provision for host kubernetes1027.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:52 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kubernetes1027
  • 15:51 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host kubernetes1027
  • 15:45 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:45 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt kubernetes1027 - jclark@cumin1001"
  • 15:44 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt kubernetes1027 - jclark@cumin1001"
  • 15:44 eevans@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host restbase1030.eqiad.wmnet with OS bullseye
  • 15:42 jclark@cumin1001: START - Cookbook sre.dns.netbox
  • 15:37 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host restbase1030.eqiad.wmnet with OS bullseye
  • 15:27 sukhe: running authdns-update for CR 955943
  • 15:22 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['stat1011.eqiad.wmne']
  • 15:16 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['moss-be1003.eqiad.wmnet']
  • 15:16 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['moss-be1003.eqiad.wmnet']
  • 15:15 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host moss-be1003.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:13 aikochou@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
  • 15:13 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['stat1011.eqiad.wmne']
  • 15:08 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host stat1011.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:58 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 14:58 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 14:44 jclark@cumin1001: START - Cookbook sre.hosts.provision for host stat1011.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:43 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host stat1011
  • 14:43 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178 (T343198)', diff saved to https://phabricator.wikimedia.org/P52343 and previous config saved to /var/cache/conftool/dbconfig/20230908-144321-arnaudb.json
  • 14:42 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host stat1011
  • 14:42 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:42 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt stat1011 - jclark@cumin1001"
  • 14:41 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt stat1011 - jclark@cumin1001"
  • 14:39 jclark@cumin1001: START - Cookbook sre.dns.netbox
  • 14:28 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P52342 and previous config saved to /var/cache/conftool/dbconfig/20230908-142815-arnaudb.json
  • 14:28 jclark@cumin1001: START - Cookbook sre.hosts.provision for host moss-be1003.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:28 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host moss-be1003
  • 14:27 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host moss-be1003
  • 14:27 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:27 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt moss-be1003 - jclark@cumin1001"
  • 14:26 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt moss-be1003 - jclark@cumin1001"
  • 14:24 jclark@cumin1001: START - Cookbook sre.dns.netbox
  • 14:13 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P52341 and previous config saved to /var/cache/conftool/dbconfig/20230908-141309-arnaudb.json
  • 13:58 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178 (T343198)', diff saved to https://phabricator.wikimedia.org/P52340 and previous config saved to /var/cache/conftool/dbconfig/20230908-135803-arnaudb.json
  • 13:39 isaranto@deploy1002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: sync
  • 13:39 isaranto@deploy1002: helmfile [eqiad] START helmfile.d/services/api-gateway: sync
  • 13:39 isaranto@deploy1002: helmfile [codfw] DONE helmfile.d/services/api-gateway: sync
  • 13:38 isaranto@deploy1002: helmfile [codfw] START helmfile.d/services/api-gateway: sync
  • 13:37 isaranto@deploy1002: helmfile [staging] DONE helmfile.d/services/api-gateway: sync
  • 13:37 isaranto@deploy1002: helmfile [staging] START helmfile.d/services/api-gateway: sync
  • 13:34 aikochou@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
  • 13:34 kevinbazira@deploy1002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
  • 13:24 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 13:24 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 13:05 isaranto@deploy1002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: sync
  • 13:05 isaranto@deploy1002: helmfile [eqiad] START helmfile.d/services/api-gateway: sync
  • 13:01 isaranto@deploy1002: helmfile [codfw] DONE helmfile.d/services/api-gateway: sync
  • 13:01 isaranto@deploy1002: helmfile [codfw] START helmfile.d/services/api-gateway: sync
  • 13:00 isaranto@deploy1002: helmfile [staging] DONE helmfile.d/services/api-gateway: sync
  • 12:59 isaranto@deploy1002: helmfile [staging] START helmfile.d/services/api-gateway: sync
  • 12:53 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host ldap-replica1006.wikimedia.org
  • 12:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ldap-replica1006.wikimedia.org with OS bookworm
  • 12:52 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
  • 12:51 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
  • 12:51 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
  • 12:51 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
  • 12:51 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
  • 12:50 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
  • 12:49 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 12:49 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 12:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ldap-replica1006.wikimedia.org with reason: host reimage
  • 12:36 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ldap-replica1006.wikimedia.org with reason: host reimage
  • 12:35 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ldap-replica1005.wikimedia.org
  • 12:31 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ldap-replica1005.wikimedia.org
  • 12:29 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ldap-replica1006.wikimedia.org with OS bookworm
  • 12:24 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM ldap-replica1006.wikimedia.org - jmm@cumin2002"
  • 12:23 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM ldap-replica1006.wikimedia.org - jmm@cumin2002"
  • 12:23 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ldap-replica1006.wikimedia.org on all recursors
  • 12:23 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache ldap-replica1006.wikimedia.org on all recursors
  • 12:23 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:23 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM ldap-replica1006.wikimedia.org - jmm@cumin2002"
  • 12:22 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM ldap-replica1006.wikimedia.org - jmm@cumin2002"
  • 12:18 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 12:18 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host ldap-replica1006.wikimedia.org
  • 12:18 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host ldap-replica1006.wikimedia.org
  • 12:17 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 12:16 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 12:16 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host ldap-replica1006.wikimedia.org
  • 12:05 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host ldap-replica1006.wikimedia.org
  • 12:05 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ldap-replica1006.wikimedia.org on all recursors
  • 12:05 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache ldap-replica1006.wikimedia.org on all recursors
  • 12:05 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:05 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM ldap-replica1006.wikimedia.org - jmm@cumin2002"
  • 12:04 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM ldap-replica1006.wikimedia.org - jmm@cumin2002"
  • 11:59 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 11:59 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ldap-replica1006.wikimedia.org on all recursors
  • 11:59 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache ldap-replica1006.wikimedia.org on all recursors
  • 11:59 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:59 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM ldap-replica1006.wikimedia.org - jmm@cumin2002"
  • 11:58 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM ldap-replica1006.wikimedia.org - jmm@cumin2002"
  • 11:55 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 11:55 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host ldap-replica1006.wikimedia.org
  • 11:54 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host ldap-replica1005.wikimedia.org
  • 11:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ldap-replica1005.wikimedia.org with OS bookworm
  • 11:49 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2178 (T343198)', diff saved to https://phabricator.wikimedia.org/P52337 and previous config saved to /var/cache/conftool/dbconfig/20230908-114911-arnaudb.json
  • 11:49 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2178.codfw.wmnet with reason: Maintenance
  • 11:48 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2178.codfw.wmnet with reason: Maintenance
  • 11:48 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315 (T343198)', diff saved to https://phabricator.wikimedia.org/P52336 and previous config saved to /var/cache/conftool/dbconfig/20230908-114850-arnaudb.json
  • 11:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ldap-replica1005.wikimedia.org with reason: host reimage
  • 11:36 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ldap-replica1005.wikimedia.org with reason: host reimage
  • 11:33 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315', diff saved to https://phabricator.wikimedia.org/P52335 and previous config saved to /var/cache/conftool/dbconfig/20230908-113344-arnaudb.json
  • 11:23 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ldap-replica1005.wikimedia.org with OS bookworm
  • 11:22 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM ldap-replica1005.wikimedia.org - jmm@cumin2002"
  • 11:21 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM ldap-replica1005.wikimedia.org - jmm@cumin2002"
  • 11:21 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ldap-replica1005.wikimedia.org on all recursors
  • 11:21 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache ldap-replica1005.wikimedia.org on all recursors
  • 11:21 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:21 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM ldap-replica1005.wikimedia.org - jmm@cumin2002"
  • 11:20 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM ldap-replica1005.wikimedia.org - jmm@cumin2002"
  • 11:18 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315', diff saved to https://phabricator.wikimedia.org/P52334 and previous config saved to /var/cache/conftool/dbconfig/20230908-111838-arnaudb.json
  • 11:17 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 11:17 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host ldap-replica1005.wikimedia.org
  • 11:14 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.resource-report (exit_code=0)
  • 11:14 jmm@cumin2002: START - Cookbook sre.ganeti.resource-report
  • 11:07 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
  • 11:07 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
  • 11:06 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ldap-rw2001.wikimedia.org with OS bookworm
  • 11:05 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
  • 11:04 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
  • 11:03 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315 (T343198)', diff saved to https://phabricator.wikimedia.org/P52333 and previous config saved to /var/cache/conftool/dbconfig/20230908-110331-arnaudb.json
  • 10:49 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ldap-rw2001.wikimedia.org with reason: host reimage
  • 10:46 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ldap-rw2001.wikimedia.org with reason: host reimage
  • 10:33 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 10:32 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 10:27 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ldap-rw2001.wikimedia.org with OS bookworm
  • 10:24 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ldap-rw1001.wikimedia.org with OS bookworm
  • 10:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ldap-rw1001.wikimedia.org with reason: host reimage
  • 10:07 jbond@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:06 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ldap-rw1001.wikimedia.org with reason: host reimage
  • 10:05 jbond@cumin1001: START - Cookbook sre.dns.netbox
  • 09:55 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ldap-rw1001.wikimedia.org with OS bookworm
  • 09:46 vgutierrez: restart fifo-log-demux@notpurge.service in cp4052
  • 09:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host serpens.wikimedia.org
  • 09:34 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host serpens.wikimedia.org
  • 09:31 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts furud.codfw.wmnet
  • 09:31 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:31 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: furud.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 09:29 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: furud.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 09:26 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 09:22 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts furud.codfw.wmnet
  • 09:17 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 09:16 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 09:15 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 09:13 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 09:11 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 09:11 jmm@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cumin2002.codfw.wmnet
  • 09:10 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 09:00 jmm@cumin1001: START - Cookbook sre.hosts.reboot-single for host cumin2002.codfw.wmnet
  • 08:23 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 08:23 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 08:09 elukey@deploy1002: helmfile [ml-serve-eqiad] 'sync' command on namespace 'ores-legacy' for release 'main' .
  • 08:06 elukey@deploy1002: helmfile [ml-serve-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
  • 08:01 elukey@deploy1002: helmfile [ml-staging-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
  • 07:59 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2171:3315 (T343198)', diff saved to https://phabricator.wikimedia.org/P52328 and previous config saved to /var/cache/conftool/dbconfig/20230908-075901-arnaudb.json
  • 07:58 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2171.codfw.wmnet with reason: Maintenance
  • 07:58 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2171.codfw.wmnet with reason: Maintenance
  • 07:58 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157 (T343198)', diff saved to https://phabricator.wikimedia.org/P52327 and previous config saved to /var/cache/conftool/dbconfig/20230908-075840-arnaudb.json
  • 07:43 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157', diff saved to https://phabricator.wikimedia.org/P52326 and previous config saved to /var/cache/conftool/dbconfig/20230908-074334-arnaudb.json
  • 07:28 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157', diff saved to https://phabricator.wikimedia.org/P52325 and previous config saved to /var/cache/conftool/dbconfig/20230908-072828-arnaudb.json
  • 07:27 jayme@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 07:26 jayme@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 07:26 jayme@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 07:25 jayme@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 07:25 jayme@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
  • 07:25 jayme@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
  • 07:24 jayme@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
  • 07:24 jayme@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
  • 07:24 jayme@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
  • 07:23 jayme@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
  • 07:23 jayme@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
  • 07:23 jayme@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
  • 07:22 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 07:22 jayme@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 07:22 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 07:21 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 07:21 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 07:21 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 07:21 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 07:20 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 07:13 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157 (T343198)', diff saved to https://phabricator.wikimedia.org/P52324 and previous config saved to /var/cache/conftool/dbconfig/20230908-071322-arnaudb.json
  • 07:02 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 05:24 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 05:23 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs1016.eqiad.wmnet with OS bullseye
  • 04:57 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1016.eqiad.wmnet with reason: host reimage
  • 04:54 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1016.eqiad.wmnet with reason: host reimage
  • 04:29 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs1016.eqiad.wmnet with OS bullseye
  • 04:28 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2157 (T343198)', diff saved to https://phabricator.wikimedia.org/P52323 and previous config saved to /var/cache/conftool/dbconfig/20230908-042821-arnaudb.json
  • 04:28 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2157.codfw.wmnet with reason: Maintenance
  • 04:28 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2157.codfw.wmnet with reason: Maintenance
  • 04:28 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315 (T343198)', diff saved to https://phabricator.wikimedia.org/P52322 and previous config saved to /var/cache/conftool/dbconfig/20230908-042800-arnaudb.json
  • 04:12 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315', diff saved to https://phabricator.wikimedia.org/P52321 and previous config saved to /var/cache/conftool/dbconfig/20230908-041254-arnaudb.json
  • 03:57 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315', diff saved to https://phabricator.wikimedia.org/P52320 and previous config saved to /var/cache/conftool/dbconfig/20230908-035747-arnaudb.json
  • 03:42 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315 (T343198)', diff saved to https://phabricator.wikimedia.org/P52319 and previous config saved to /var/cache/conftool/dbconfig/20230908-034241-arnaudb.json
  • 00:53 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2137:3315 (T343198)', diff saved to https://phabricator.wikimedia.org/P52318 and previous config saved to /var/cache/conftool/dbconfig/20230908-005323-arnaudb.json
  • 00:53 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2137.codfw.wmnet with reason: Maintenance
  • 00:53 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2137.codfw.wmnet with reason: Maintenance
  • 00:53 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128 (T343198)', diff saved to https://phabricator.wikimedia.org/P52317 and previous config saved to /var/cache/conftool/dbconfig/20230908-005301-arnaudb.json
  • 00:37 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128', diff saved to https://phabricator.wikimedia.org/P52316 and previous config saved to /var/cache/conftool/dbconfig/20230908-003755-arnaudb.json
  • 00:23 eileen: civicrm upgraded from e81ed4e9 to de883cd5
  • 00:22 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128', diff saved to https://phabricator.wikimedia.org/P52315 and previous config saved to /var/cache/conftool/dbconfig/20230908-002248-arnaudb.json
  • 00:07 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128 (T343198)', diff saved to https://phabricator.wikimedia.org/P52314 and previous config saved to /var/cache/conftool/dbconfig/20230908-000742-arnaudb.json
  • 00:03 eileen: civicrm upgraded from 5a432b1e to e81ed4e9

2023-09-07

  • 23:12 ejegg: payments-wiki upgraded from 639a8d6a to c524f53f
  • 22:45 jhuneidi@deploy1002: Installation of scap version "4.59.0" completed for 594 hosts
  • 22:44 jhuneidi@deploy1002: Installing scap version "4.59.0" for 594 hosts
  • 22:30 jhuneidi@deploy1002: Installing scap version "4.59.0" for 595 hosts
  • 22:29 jeena: installing scap v4.59.0
  • 22:24 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 21:47 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2128 (T343198)', diff saved to https://phabricator.wikimedia.org/P52313 and previous config saved to /var/cache/conftool/dbconfig/20230907-214717-arnaudb.json
  • 21:47 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 21:46 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 21:46 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2128.codfw.wmnet with reason: Maintenance
  • 21:46 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2128.codfw.wmnet with reason: Maintenance
  • 21:46 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2123 (T343198)', diff saved to https://phabricator.wikimedia.org/P52312 and previous config saved to /var/cache/conftool/dbconfig/20230907-214640-arnaudb.json
  • 21:31 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2123', diff saved to https://phabricator.wikimedia.org/P52311 and previous config saved to /var/cache/conftool/dbconfig/20230907-213134-arnaudb.json
  • 21:16 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2123', diff saved to https://phabricator.wikimedia.org/P52310 and previous config saved to /var/cache/conftool/dbconfig/20230907-211628-arnaudb.json
  • 21:01 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2123 (T343198)', diff saved to https://phabricator.wikimedia.org/P52309 and previous config saved to /var/cache/conftool/dbconfig/20230907-210122-arnaudb.json
  • 20:56 thcipriani@deploy1002: Finished scap: Backport for Preserve Gadget prefs when they can't be enabled (T341421), Fix settings button not working on reference previews (T345829) (duration: 11m 12s)
  • 20:50 thcipriani@deploy1002: jdlrobson and thcipriani: Continuing with sync
  • 20:49 taavi@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2444.codfw.wmnet
  • 20:46 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 20:46 thcipriani@deploy1002: jdlrobson and thcipriani: Backport for Preserve Gadget prefs when they can't be enabled (T341421), Fix settings button not working on reference previews (T345829) synced to the testservers mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD o
  • 20:46 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 20:46 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 20:45 thcipriani@deploy1002: Started scap: Backport for Preserve Gadget prefs when they can't be enabled (T341421), Fix settings button not working on reference previews (T345829)
  • 20:41 thcipriani@deploy1002: Finished scap: Backport for Pre-deploy Reader Demographics 2 pilot survey (T344393) (duration: 10m 59s)
  • 20:33 thcipriani@deploy1002: dani and thcipriani: Continuing with sync
  • 20:31 thcipriani@deploy1002: dani and thcipriani: Backport for Pre-deploy Reader Demographics 2 pilot survey (T344393) synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 20:30 thcipriani@deploy1002: Started scap: Backport for Pre-deploy Reader Demographics 2 pilot survey (T344393)
  • 20:23 thcipriani@deploy1002: Finished scap: Backport for Undeploy Campaigns Event Discovery survey (T345158) (duration: 17m 58s)
  • 20:23 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs1016.eqiad.wmnet with OS bullseye
  • 20:11 thcipriani@deploy1002: thcipriani and dani: Continuing with sync
  • 20:07 thcipriani@deploy1002: thcipriani and dani: Backport for Undeploy Campaigns Event Discovery survey (T345158) synced to the testservers mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 20:05 thcipriani@deploy1002: Started scap: Backport for Undeploy Campaigns Event Discovery survey (T345158)
  • 19:41 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1016.eqiad.wmnet with reason: host reimage
  • 19:38 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1016.eqiad.wmnet with reason: host reimage
  • 19:37 eevans@cumin1001: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies (exit_code=0) rolling restart_daemons on A:thanos-fe
  • 19:33 eevans@cumin1001: START - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies rolling restart_daemons on A:thanos-fe
  • 19:13 bking@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs1016.eqiad.wmnet with OS bullseye
  • 18:50 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1010.eqiad.wmnet with reason: T342361
  • 18:49 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1010.eqiad.wmnet with reason: T342361
  • 18:31 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2123 (T343198)', diff saved to https://phabricator.wikimedia.org/P52308 and previous config saved to /var/cache/conftool/dbconfig/20230907-183153-arnaudb.json
  • 18:31 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2123.codfw.wmnet with reason: Maintenance
  • 18:31 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2123.codfw.wmnet with reason: Maintenance
  • 18:31 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111 (T343198)', diff saved to https://phabricator.wikimedia.org/P52307 and previous config saved to /var/cache/conftool/dbconfig/20230907-183132-arnaudb.json
  • 18:16 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111', diff saved to https://phabricator.wikimedia.org/P52306 and previous config saved to /var/cache/conftool/dbconfig/20230907-181626-arnaudb.json
  • 18:01 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111', diff saved to https://phabricator.wikimedia.org/P52305 and previous config saved to /var/cache/conftool/dbconfig/20230907-180120-arnaudb.json
  • 17:46 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111 (T343198)', diff saved to https://phabricator.wikimedia.org/P52304 and previous config saved to /var/cache/conftool/dbconfig/20230907-174613-arnaudb.json
  • 17:43 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2111 (T343198)', diff saved to https://phabricator.wikimedia.org/P52303 and previous config saved to /var/cache/conftool/dbconfig/20230907-174351-arnaudb.json
  • 17:43 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2111.codfw.wmnet with reason: Maintenance
  • 17:43 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2111.codfw.wmnet with reason: Maintenance
  • 16:45 Amir1: running moveToExternal on all wikis
  • 15:58 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['lists1004.eqiad.wmnet']
  • 15:49 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['lists1004.eqiad.wmnet']
  • 15:47 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lists1004.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:47 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2053.codfw.wmnet
  • 15:47 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1053.eqiad.wmnet
  • 15:41 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1053.eqiad.wmnet
  • 15:41 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2053.codfw.wmnet
  • 15:37 jclark@cumin1001: START - Cookbook sre.hosts.provision for host lists1004.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:33 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host lists1004
  • 15:32 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host lists1004
  • 15:28 jclark@cumin1001: START - Cookbook sre.dns.netbox
  • 15:20 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 15:19 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 15:18 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 15:17 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 15:13 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 15:13 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 15:11 filippo@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
  • 15:11 filippo@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
  • 14:54 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2101.codfw.wmnet with reason: Maintenance
  • 14:53 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2101.codfw.wmnet with reason: Maintenance
  • 14:48 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2052.codfw.wmnet
  • 14:46 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1052.eqiad.wmnet
  • 14:42 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1052.eqiad.wmnet
  • 14:41 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2052.codfw.wmnet
  • 14:37 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1136.eqiad.wmnet with OS bullseye
  • 14:30 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['pki1002.eqiad.wmnet']
  • 14:30 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['pki1002.eqiad.wmnet']
  • 14:30 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['pki1002.eqiad.wmnet']
  • 14:30 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['pki1002.eqiad.wmnet']
  • 14:30 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['pki1002.eqiad.wmnet']
  • 14:30 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['pki1002.eqiad.wmnet']
  • 14:28 bking@deploy1002: helmfile [eqiad] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 14:27 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-cluster (exit_code=0)
  • 14:27 bking@deploy1002: helmfile [eqiad] START helmfile.d/services/rdf-streaming-updater: apply
  • 14:26 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dbstore1009.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:26 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dbstore1008.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:24 bking@deploy1002: helmfile [codfw] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 14:24 bking@deploy1002: helmfile [codfw] START helmfile.d/services/rdf-streaming-updater: apply
  • 14:23 bking@deploy1002: helmfile [codfw] START helmfile.d/services/rdf-streaming-updater: apply
  • 14:23 bking@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 14:22 bking@deploy1002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 14:20 bking@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 14:19 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1135.eqiad.wmnet with OS bullseye
  • 14:16 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1051.eqiad.wmnet
  • 14:16 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2051.codfw.wmnet
  • 14:15 elukey@deploy1002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: sync
  • 14:14 elukey@deploy1002: helmfile [eqiad] START helmfile.d/services/api-gateway: sync
  • 14:13 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1136.eqiad.wmnet with reason: host reimage
  • 14:12 elukey@deploy1002: helmfile [codfw] DONE helmfile.d/services/api-gateway: sync
  • 14:11 elukey@deploy1002: helmfile [codfw] START helmfile.d/services/api-gateway: sync
  • 14:10 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/api-gateway: sync
  • 14:10 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2051.codfw.wmnet
  • 14:10 elukey@deploy1002: helmfile [staging] START helmfile.d/services/api-gateway: sync
  • 14:10 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1051.eqiad.wmnet
  • 14:10 stevemunene@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1136.eqiad.wmnet with reason: host reimage
  • 14:03 bking@deploy1002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 14:02 bking@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 13:58 bking@deploy1002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 13:58 bking@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 13:58 bking@deploy1002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 13:58 bking@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 13:58 bking@deploy1002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 13:57 bking@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 13:56 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1135.eqiad.wmnet with reason: host reimage
  • 13:56 stevemunene@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1136.eqiad.wmnet with OS bullseye
  • 13:53 stevemunene@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1135.eqiad.wmnet with reason: host reimage
  • 13:52 jclark@cumin1001: START - Cookbook sre.hosts.provision for host dbstore1009.mgmt.eqiad.wmnet with reboot policy FORCED
  • 13:52 jclark@cumin1001: START - Cookbook sre.hosts.provision for host dbstore1008.mgmt.eqiad.wmnet with reboot policy FORCED
  • 13:51 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dbstore1009
  • 13:51 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dbstore1008
  • 13:51 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host dbstore1009
  • 13:51 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host dbstore1008
  • 13:50 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:50 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt dbstore100{8..9} - jclark@cumin1001"
  • 13:50 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt dbstore100{8..9} - jclark@cumin1001"
  • 13:47 jclark@cumin1001: START - Cookbook sre.dns.netbox
  • 13:43 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['pki1002.eqiad.wmnet']
  • 13:40 XioNoX: trunk sandbox vlan to ganeti nodes in esams BY27 - T307021
  • 13:40 stevemunene@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1135.eqiad.wmnet with OS bullseye
  • 13:38 taavi: taavi@mwmaint1002 ~ $ mwscript extensions/OATHAuth/maintenance/UpdateForMultipleDevicesSupport.php --wiki=labswiki | tee oathauth-multiple-labswiki.log # T242031
  • 13:38 taavi@deploy1002: Finished scap: Backport for Set OATHAuth multiple devices READ_NEW for all fishbowls, privates (T242031), Set OATHAuth multiple devices WRITE_BOTH for wikitech (T242031) (duration: 08m 52s)
  • 13:35 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['pki1002.eqiad.wmnet']
  • 13:33 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pki1002.mgmt.eqiad.wmnet with reboot policy FORCED
  • 13:31 taavi@deploy1002: taavi: Continuing with sync
  • 13:30 taavi@deploy1002: taavi: Backport for Set OATHAuth multiple devices READ_NEW for all fishbowls, privates (T242031), Set OATHAuth multiple devices WRITE_BOTH for wikitech (T242031) synced to the testservers mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 13:29 taavi@deploy1002: Started scap: Backport for Set OATHAuth multiple devices READ_NEW for all fishbowls, privates (T242031), Set OATHAuth multiple devices WRITE_BOTH for wikitech (T242031)
  • 13:27 taavi@deploy1002: Finished scap: Backport for Edit check: Turn on when ecenable=1 is set (T345297) (duration: 09m 46s)
  • 13:23 jclark@cumin1001: START - Cookbook sre.hosts.provision for host pki1002.mgmt.eqiad.wmnet with reboot policy FORCED
  • 13:22 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host pki1002
  • 13:21 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host pki1002
  • 13:21 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:21 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt pki1002 - jclark@cumin1001"
  • 13:20 taavi@deploy1002: taavi and kemayo: Continuing with sync
  • 13:20 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt pki1002 - jclark@cumin1001"
  • 13:18 taavi@deploy1002: taavi and kemayo: Backport for Edit check: Turn on when ecenable=1 is set (T345297) synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 13:18 jclark@cumin1001: START - Cookbook sre.dns.netbox
  • 13:17 ayounsi@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts atlas2001.wikimedia.org
  • 13:17 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:17 ayounsi@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: atlas2001.wikimedia.org decommissioned, removing all IPs except the asset tag one - ayounsi@cumin1001"
  • 13:17 taavi@deploy1002: Started scap: Backport for Edit check: Turn on when ecenable=1 is set (T345297)
  • 13:14 ayounsi@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: atlas2001.wikimedia.org decommissioned, removing all IPs except the asset tag one - ayounsi@cumin1001"
  • 13:12 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
  • 13:08 ayounsi@cumin1001: START - Cookbook sre.hosts.decommission for hosts atlas2001.wikimedia.org
  • 12:35 filippo@deploy1002: helmfile [staging] DONE helmfile.d/services/opentelemetry-collector: apply
  • 12:34 filippo@deploy1002: helmfile [staging] START helmfile.d/services/opentelemetry-collector: apply
  • 12:23 filippo@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
  • 12:23 filippo@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
  • 12:04 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-cluster
  • 12:04 claime: Starting eqiad jobrunner reboots
  • 12:02 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 12:02 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 11:59 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-cluster (exit_code=0)
  • 11:29 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2050.codfw.wmnet
  • 11:29 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1050.eqiad.wmnet
  • 11:23 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1050.eqiad.wmnet
  • 11:22 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2050.codfw.wmnet
  • 11:13 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-cluster
  • 11:11 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2065.codfw.wmnet
  • 11:10 cgoubert@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 11:09 cgoubert@deploy1002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
  • 11:05 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2065.codfw.wmnet
  • 11:04 cgoubert@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: sync
  • 11:03 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2065.codfw.wmnet
  • 10:56 urbanecm: mwmaint1002: `/usr/local/bin/mw-cli-wrapper /usr/local/bin/foreachwikiindblist /srv/mediawiki/dblists/growthexperiments.dblist extensions/GrowthExperiments/maintenance/refreshUserImpactData.php --registeredWithin=1year --editedWithin=2week --hasEditsAtLeast=3 --ignoreIfUpdatedWithin=1second --verbose --use-job-queue` (T344428, testing with r955319 deployed)
  • 10:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2065.codfw.wmnet
  • 10:54 cgoubert@deploy1002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: sync
  • 10:51 ladsgroup@deploy1002: Finished scap: Backport for Pin pagelinks normalization stage to old in production (T345732) (duration: 09m 05s)
  • 10:46 urbanecm@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 10:45 ladsgroup@deploy1002: ladsgroup: Continuing with sync
  • 10:44 ladsgroup@deploy1002: ladsgroup: Backport for Pin pagelinks normalization stage to old in production (T345732) synced to the testservers mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 10:42 ladsgroup@deploy1002: Started scap: Backport for Pin pagelinks normalization stage to old in production (T345732)
  • 10:36 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mw[1441-1442,1451].eqiad.wmnet
  • 10:35 urbanecm@deploy1002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
  • 10:35 cgoubert@cumin1001: START - Cookbook sre.hosts.remove-downtime for mw[1441-1442,1451].eqiad.wmnet
  • 10:35 urbanecm@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 10:33 cgoubert@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-cluster (exit_code=1)
  • 10:32 fnegri@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcontrol2001-dev.codfw.wmnet with reason: host reimage
  • 10:29 fnegri@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcontrol2001-dev.codfw.wmnet with reason: host reimage
  • 10:24 urbanecm@deploy1002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
  • 10:24 urbanecm@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 10:23 urbanecm@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
  • 10:23 urbanecm@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 10:21 urbanecm@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
  • 10:10 fnegri@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcontrol2001-dev.codfw.wmnet with OS bookworm
  • 10:08 jbond@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host puppetserver1002.eqiad.wmnet with OS bookworm
  • 10:05 hashar@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.41.0-wmf.25 refs T343727
  • 10:03 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1134.eqiad.wmnet with OS bullseye
  • 09:58 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on puppetserver1002.eqiad.wmnet with reason: host reimage
  • 09:55 jbond@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on puppetserver1002.eqiad.wmnet with reason: host reimage
  • 09:54 hashar@deploy1002: Finished scap: Backport for RevisionReviewForm: allow setting `null` tag (T345804) (duration: 07m 54s)
  • 09:48 hashar@deploy1002: ladsgroup and hashar: Continuing with sync
  • 09:47 hashar@deploy1002: ladsgroup and hashar: Backport for RevisionReviewForm: allow setting `null` tag (T345804) synced to the testservers mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 09:46 hashar@deploy1002: Started scap: Backport for RevisionReviewForm: allow setting `null` tag (T345804)
  • 09:41 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1134.eqiad.wmnet with reason: host reimage
  • 09:39 jbond@cumin1001: START - Cookbook sre.hosts.reimage for host puppetserver1002.eqiad.wmnet with OS bookworm
  • 09:38 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1133.eqiad.wmnet with OS bullseye
  • 09:38 stevemunene@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1134.eqiad.wmnet with reason: host reimage
  • 09:37 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance
  • 09:37 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance
  • 09:37 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1213:3315 (T343198)', diff saved to https://phabricator.wikimedia.org/P52300 and previous config saved to /var/cache/conftool/dbconfig/20230907-093718-arnaudb.json
  • 09:24 stevemunene@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1134.eqiad.wmnet with OS bullseye
  • 09:22 moritzm: installing grub2 updates from Bullseye point release
  • 09:22 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1213:3315', diff saved to https://phabricator.wikimedia.org/P52299 and previous config saved to /var/cache/conftool/dbconfig/20230907-092212-arnaudb.json
  • 09:15 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1133.eqiad.wmnet with reason: host reimage
  • 09:14 taavi: foreachwikiindblist private extensions/OATHAuth/maintenance/UpdateForMultipleDevicesSupport.php | tee oathauth-multiple-private.log # T242031
  • 09:12 stevemunene@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1133.eqiad.wmnet with reason: host reimage
  • 09:07 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1213:3315', diff saved to https://phabricator.wikimedia.org/P52298 and previous config saved to /var/cache/conftool/dbconfig/20230907-090706-arnaudb.json
  • 08:59 stevemunene@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1133.eqiad.wmnet with OS bullseye
  • 08:52 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1213:3315 (T343198)', diff saved to https://phabricator.wikimedia.org/P52297 and previous config saved to /var/cache/conftool/dbconfig/20230907-085159-arnaudb.json
  • 08:51 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-cluster
  • 08:46 hashar@deploy1002: rebuilt and synchronized wikiversions files: Revert group2 wikis to 1.41.0-wmf.24 - T343727
  • 08:38 moritzm: installing librsvg security updates
  • 08:23 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on mc2040.codfw.wmnet with reason: T345802 - hw troubleshooting
  • 08:23 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on mc2040.codfw.wmnet with reason: T345802 - hw troubleshooting
  • 08:22 hashar@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.41.0-wmf.25 refs T343727
  • 07:57 moritzm: installing grub2 updates from Bullseye point release
  • 07:40 moritzm: installing file/libmagic security updates
  • 07:16 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netboxdb1002.eqiad.wmnet
  • 07:12 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netboxdb1002.eqiad.wmnet
  • 07:08 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netboxdb2002.codfw.wmnet
  • 07:04 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netboxdb2002.codfw.wmnet
  • 06:59 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host krb2002.codfw.wmnet
  • 06:52 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host krb2002.codfw.wmnet
  • 06:29 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1213:3315 (T343198)', diff saved to https://phabricator.wikimedia.org/P52296 and previous config saved to /var/cache/conftool/dbconfig/20230907-062900-arnaudb.json
  • 06:28 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1213.eqiad.wmnet with reason: Maintenance
  • 06:28 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on wdqs[1003-1004].eqiad.wmnet with reason: reboot
  • 06:28 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on wdqs[1003-1004].eqiad.wmnet with reason: reboot
  • 06:28 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1213.eqiad.wmnet with reason: Maintenance
  • 06:28 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1210 (T343198)', diff saved to https://phabricator.wikimedia.org/P52295 and previous config saved to /var/cache/conftool/dbconfig/20230907-062838-arnaudb.json
  • 06:13 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1210', diff saved to https://phabricator.wikimedia.org/P52294 and previous config saved to /var/cache/conftool/dbconfig/20230907-061332-arnaudb.json
  • 05:58 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1210', diff saved to https://phabricator.wikimedia.org/P52293 and previous config saved to /var/cache/conftool/dbconfig/20230907-055826-arnaudb.json
  • 05:43 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1210 (T343198)', diff saved to https://phabricator.wikimedia.org/P52292 and previous config saved to /var/cache/conftool/dbconfig/20230907-054320-arnaudb.json
  • 05:26 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 05:24 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 03:40 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 03:39 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 03:26 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 03:24 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 03:23 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1210 (T343198)', diff saved to https://phabricator.wikimedia.org/P52291 and previous config saved to /var/cache/conftool/dbconfig/20230907-032306-arnaudb.json
  • 03:23 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1210.eqiad.wmnet with reason: Maintenance
  • 03:22 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1210.eqiad.wmnet with reason: Maintenance
  • 03:22 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1200 (T343198)', diff saved to https://phabricator.wikimedia.org/P52290 and previous config saved to /var/cache/conftool/dbconfig/20230907-032245-arnaudb.json
  • 03:07 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1200', diff saved to https://phabricator.wikimedia.org/P52289 and previous config saved to /var/cache/conftool/dbconfig/20230907-030739-arnaudb.json
  • 02:52 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1200', diff saved to https://phabricator.wikimedia.org/P52288 and previous config saved to /var/cache/conftool/dbconfig/20230907-025233-arnaudb.json
  • 02:37 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1200 (T343198)', diff saved to https://phabricator.wikimedia.org/P52287 and previous config saved to /var/cache/conftool/dbconfig/20230907-023727-arnaudb.json
  • 01:10 tstarling@deploy1002: Synchronized php-1.41.0-wmf.25/extensions/Phonos/extension.json: fix breakage of Phonos on parser-cached pages T345414 (duration: 06m 59s)
  • 00:30 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1200 (T343198)', diff saved to https://phabricator.wikimedia.org/P52286 and previous config saved to /var/cache/conftool/dbconfig/20230907-003038-arnaudb.json
  • 00:30 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1200.eqiad.wmnet with reason: Maintenance
  • 00:30 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1200.eqiad.wmnet with reason: Maintenance
  • 00:30 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1185 (T343198)', diff saved to https://phabricator.wikimedia.org/P52285 and previous config saved to /var/cache/conftool/dbconfig/20230907-003017-arnaudb.json
  • 00:15 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1185', diff saved to https://phabricator.wikimedia.org/P52284 and previous config saved to /var/cache/conftool/dbconfig/20230907-001510-arnaudb.json
  • 00:00 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1185', diff saved to https://phabricator.wikimedia.org/P52283 and previous config saved to /var/cache/conftool/dbconfig/20230907-000004-arnaudb.json

2023-09-06

  • 23:45 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1185 (T343198)', diff saved to https://phabricator.wikimedia.org/P52282 and previous config saved to /var/cache/conftool/dbconfig/20230906-234458-arnaudb.json
  • 22:10 bking@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host flink-zk2003.codfw.wmnet
  • 22:10 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host flink-zk2003.codfw.wmnet with OS bookworm
  • 21:56 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on flink-zk2003.codfw.wmnet with reason: host reimage
  • 21:53 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on flink-zk2003.codfw.wmnet with reason: host reimage
  • 21:44 eevans@cumin1001: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies (exit_code=0) rolling restart_daemons on A:thanos-fe
  • 21:42 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1185 (T343198)', diff saved to https://phabricator.wikimedia.org/P52281 and previous config saved to /var/cache/conftool/dbconfig/20230906-214205-arnaudb.json
  • 21:42 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1185.eqiad.wmnet with reason: Maintenance
  • 21:41 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1185.eqiad.wmnet with reason: Maintenance
  • 21:41 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T343198)', diff saved to https://phabricator.wikimedia.org/P52280 and previous config saved to /var/cache/conftool/dbconfig/20230906-214145-arnaudb.json
  • 21:40 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs1007.eqiad.wmnet with OS bullseye
  • 21:39 eevans@cumin1001: START - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies rolling restart_daemons on A:thanos-fe
  • 21:38 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs1006.eqiad.wmnet with OS bullseye
  • 21:32 bking@cumin1001: START - Cookbook sre.hosts.reimage for host flink-zk2003.codfw.wmnet with OS bookworm
  • 21:26 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P52279 and previous config saved to /var/cache/conftool/dbconfig/20230906-212638-arnaudb.json
  • 21:23 bking@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM flink-zk2003.codfw.wmnet - bking@cumin1001"
  • 21:22 bking@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM flink-zk2003.codfw.wmnet - bking@cumin1001"
  • 21:22 bking@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) flink-zk2003.codfw.wmnet on all recursors
  • 21:22 bking@cumin1001: START - Cookbook sre.dns.wipe-cache flink-zk2003.codfw.wmnet on all recursors
  • 21:22 bking@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:22 bking@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM flink-zk2003.codfw.wmnet - bking@cumin1001"
  • 21:21 bking@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM flink-zk2003.codfw.wmnet - bking@cumin1001"
  • 21:20 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.reboot (exit_code=0)
  • 21:18 bking@cumin1001: START - Cookbook sre.dns.netbox
  • 21:18 bking@cumin1001: START - Cookbook sre.ganeti.makevm for new host flink-zk2003.codfw.wmnet
  • 21:15 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1007.eqiad.wmnet with reason: host reimage
  • 21:12 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1006.eqiad.wmnet with reason: host reimage
  • 21:11 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P52278 and previous config saved to /var/cache/conftool/dbconfig/20230906-211132-arnaudb.json
  • 21:10 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1007.eqiad.wmnet with reason: host reimage
  • 21:10 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1006.eqiad.wmnet with reason: host reimage
  • 20:58 bking@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs1007.eqiad.wmnet with OS bullseye
  • 20:58 bking@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs1006.eqiad.wmnet with OS bullseye
  • 20:56 bking@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host flink-zk2002.codfw.wmnet
  • 20:56 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host flink-zk2002.codfw.wmnet with OS bookworm
  • 20:56 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T343198)', diff saved to https://phabricator.wikimedia.org/P52277 and previous config saved to /var/cache/conftool/dbconfig/20230906-205626-arnaudb.json
  • 20:45 ryankemper@cumin1001: START - Cookbook sre.wdqs.reboot
  • 20:42 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on flink-zk2002.codfw.wmnet with reason: host reimage
  • 20:40 taavi@deploy1002: Finished scap: Backport for Article: Check permissions before showing link to view deleted revision (T264765), Article: Check permissions before showing link to view deleted revision (T264765), TopicSubscriptionsPager: Handle invalid titles (T345648), TopicSubscriptionsPager: Handle invalid titles (T345648) (duration: 09m 42s)
  • 20:39 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on flink-zk2002.codfw.wmnet with reason: host reimage
  • 20:34 taavi@deploy1002: matmarex and taavi: Continuing with sync
  • 20:32 taavi@deploy1002: matmarex and taavi: Backport for Article: Check permissions before showing link to view deleted revision (T264765), Article: Check permissions before showing link to view deleted revision (T264765), TopicSubscriptionsPager: Handle invalid titles (T345648), TopicSubscriptionsPager: Handle invalid titles (T345648) synced to the tes
  • 20:30 taavi@deploy1002: Started scap: Backport for Article: Check permissions before showing link to view deleted revision (T264765), Article: Check permissions before showing link to view deleted revision (T264765), TopicSubscriptionsPager: Handle invalid titles (T345648), TopicSubscriptionsPager: Handle invalid titles (T345648)
  • 20:30 taavi@deploy1002: Finished scap: Backport for Add wikispecies logo (T341252), Disable wordmark on Gothic Wikipedia (T341253), Wikimania logos and taglines (T341254) (duration: 14m 25s)
  • 20:24 taavi@deploy1002: jdlrobson and taavi: Continuing with sync
  • 20:17 taavi@deploy1002: jdlrobson and taavi: Backport for Add wikispecies logo (T341252), Disable wordmark on Gothic Wikipedia (T341253), Wikimania logos and taglines (T341254) synced to the testservers mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XW
  • 20:15 taavi@deploy1002: Started scap: Backport for Add wikispecies logo (T341252), Disable wordmark on Gothic Wikipedia (T341253), Wikimania logos and taglines (T341254)
  • 20:14 bking@cumin1001: START - Cookbook sre.hosts.reimage for host flink-zk2002.codfw.wmnet with OS bookworm
  • 20:14 bking@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM flink-zk2002.codfw.wmnet - bking@cumin1001"
  • 20:14 bking@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM flink-zk2002.codfw.wmnet - bking@cumin1001"
  • 20:13 bking@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) flink-zk2002.codfw.wmnet on all recursors
  • 20:13 bking@cumin1001: START - Cookbook sre.dns.wipe-cache flink-zk2002.codfw.wmnet on all recursors
  • 20:13 bking@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:13 bking@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM flink-zk2002.codfw.wmnet - bking@cumin1001"
  • 20:13 taavi@deploy1002: Finished scap: Backport for GrowthExperiments: enable add a link in 12 and 13th round of wikis (T308137 T308138) (duration: 10m 16s)
  • 20:12 bking@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM flink-zk2002.codfw.wmnet - bking@cumin1001"
  • 20:10 bking@cumin1001: START - Cookbook sre.dns.netbox
  • 20:10 bking@cumin1001: START - Cookbook sre.ganeti.makevm for new host flink-zk2002.codfw.wmnet
  • 20:07 taavi@deploy1002: taavi and sgimeno: Continuing with sync
  • 20:04 taavi@deploy1002: taavi and sgimeno: Backport for GrowthExperiments: enable add a link in 12 and 13th round of wikis (T308137 T308138) synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 20:03 taavi@deploy1002: Started scap: Backport for GrowthExperiments: enable add a link in 12 and 13th round of wikis (T308137 T308138)
  • 19:18 hmonroy@deploy1002: Finished scap: Backport for Delay loading ext.phonos module until user clicks (T345414) (duration: 07m 58s)
  • 19:12 hmonroy@deploy1002: hmonroy and musikanimal: Continuing with sync
  • 19:12 hmonroy@deploy1002: hmonroy and musikanimal: Backport for Delay loading ext.phonos module until user clicks (T345414) synced to the testservers mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 19:10 hmonroy@deploy1002: Started scap: Backport for Delay loading ext.phonos module until user clicks (T345414)
  • 18:16 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1161 (T343198)', diff saved to https://phabricator.wikimedia.org/P52276 and previous config saved to /var/cache/conftool/dbconfig/20230906-181602-arnaudb.json
  • 18:15 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 18:15 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 18:15 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1161.eqiad.wmnet with reason: Maintenance
  • 18:15 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1161.eqiad.wmnet with reason: Maintenance
  • 18:00 cmooney@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['restbase1030']
  • 18:00 cmooney@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['restbase1030']
  • 18:00 cmooney@cumin1001: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts ['restbase1030']
  • 18:00 cmooney@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['restbase1030']
  • 17:58 cmooney@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['restbase1030']
  • 17:58 cmooney@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['restbase1030']
  • 17:55 cmooney@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host restbase1030.eqiad.wmnet with OS bullseye
  • 17:48 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1132.eqiad.wmnet with OS bullseye
  • 17:25 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1132.eqiad.wmnet with reason: host reimage
  • 17:22 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1132.eqiad.wmnet with reason: host reimage
  • 17:05 brett: Upload libvmod-re2_1.5.3-5_amd64 to bookworm-wikimedia
  • 16:51 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host restbase1030.eqiad.wmnet with OS bullseye
  • 16:43 eevans@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host restbase1030.eqiad.wmnet with OS bullseye
  • 16:42 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:42 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove entries for cloudweb2002-dev - cmooney@cumin1001"
  • 16:40 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1132.eqiad.wmnet with OS bullseye
  • 16:25 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove entries for cloudweb2002-dev - cmooney@cumin1001"
  • 16:14 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host restbase1030.eqiad.wmnet with OS bullseye
  • 15:50 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 15:42 cgoubert@cumin1001: conftool action : set/pooled=yes; selector: dc=eqiad,cluster=appserver,service=nginx,name=mw1387.eqiad.wmnet
  • 15:42 cgoubert@cumin1001: conftool action : set/pooled=yes; selector: dc=eqiad,cluster=appserver,service=nginx,name=mw1385.eqiad.wmnet
  • 15:42 cgoubert@cumin1001: conftool action : set/pooled=yes; selector: dc=eqiad,cluster=appserver,service=nginx,name=mw1373.eqiad.wmnet
  • 15:42 cgoubert@cumin1001: conftool action : set/pooled=yes; selector: dc=eqiad,cluster=appserver,service=nginx,name=mw1364.eqiad.wmnet
  • 15:42 cgoubert@cumin1001: conftool action : set/pooled=yes; selector: dc=eqiad,cluster=appserver,service=nginx,name=mw1384.eqiad.wmnet
  • 15:41 cgoubert@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-cluster (exit_code=97)
  • 15:40 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 15:40 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 15:39 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 (T343198)', diff saved to https://phabricator.wikimedia.org/P52275 and previous config saved to /var/cache/conftool/dbconfig/20230906-153957-arnaudb.json
  • 15:39 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2049.codfw.wmnet
  • 15:38 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1049.eqiad.wmnet
  • 15:38 akosiaris: sudo ethtool -G eno1 rx 1000 on conf2005, conf2006 to test out the theory. T345738
  • 15:33 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2049.codfw.wmnet
  • 15:32 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1049.eqiad.wmnet
  • 15:24 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P52274 and previous config saved to /var/cache/conftool/dbconfig/20230906-152451-arnaudb.json
  • 15:09 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P52273 and previous config saved to /var/cache/conftool/dbconfig/20230906-150945-arnaudb.json
  • 15:07 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['moss-be2003']
  • 15:06 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['moss-be2003']
  • 15:05 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1048.eqiad.wmnet
  • 15:05 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2048.codfw.wmnet
  • 14:59 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2048.codfw.wmnet
  • 14:59 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1048.eqiad.wmnet
  • 14:54 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 (T343198)', diff saved to https://phabricator.wikimedia.org/P52272 and previous config saved to /var/cache/conftool/dbconfig/20230906-145439-arnaudb.json
  • 14:52 bking@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host flink-zk2001.codfw.wmnet
  • 14:52 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host flink-zk2001.codfw.wmnet with OS bookworm
  • 14:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host moss-be2003.mgmt.codfw.wmnet with reboot policy GRACEFUL
  • 14:43 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host moss-be2003.mgmt.codfw.wmnet with reboot policy GRACEFUL
  • 14:31 claime: Repooling mw1349.eqiad.wmnet - T345741
  • 14:22 claime: Leaving mw1349.eqiad.wmnet pooled=invalid until management interface investigation - T345741
  • 14:18 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-cluster
  • 14:18 claime: Restarting appserver reboots
  • 13:59 claime: repooling mw1351.eqiad.wmnet
  • 13:57 claime: powercycling mw1349.eqiad.wmnet
  • 13:54 claime: powercycling mw1351.eqiad.wmnet
  • 13:53 cgoubert@cumin1001: conftool action : set/pooled=inactive; selector: dc=eqiad,name=mw1351.eqiad.wmnet
  • 13:53 cgoubert@cumin1001: conftool action : set/pooled=inactive; selector: dc=eqiad,name=mw1349.eqiad.wmnet
  • 13:42 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1047.eqiad.wmnet
  • 13:40 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2047.codfw.wmnet
  • 13:38 akosiaris: sudo ethtool -G eno1 rx 1000 on conf2004 T345738
  • 13:38 cgoubert@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-cluster (exit_code=1)
  • 13:36 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2047.codfw.wmnet
  • 13:36 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1047.eqiad.wmnet
  • 13:33 bking@cumin1001: START - Cookbook sre.hosts.reimage for host flink-zk2001.codfw.wmnet with OS bookworm
  • 13:21 sukhe: homer "asw1-b*27-esams*" commit "add durum300[34]"
  • 13:21 taavi: taavi@mwmaint1002 ~ $ cat logos-to-purge.txt | mwscript purgeList.php --wiki enwiki # T345666
  • 13:21 taavi@deploy1002: Finished scap: Backport for bnwikisource: update legacy vector logo (T345666) (duration: 17m 35s)
  • 13:20 bking@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM flink-zk2001.codfw.wmnet - bking@cumin1001"
  • 13:20 bking@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM flink-zk2001.codfw.wmnet - bking@cumin1001"
  • 13:19 bking@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) flink-zk2001.codfw.wmnet on all recursors
  • 13:19 bking@cumin1001: START - Cookbook sre.dns.wipe-cache flink-zk2001.codfw.wmnet on all recursors
  • 13:19 bking@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:19 bking@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM flink-zk2001.codfw.wmnet - bking@cumin1001"
  • 13:18 bking@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM flink-zk2001.codfw.wmnet - bking@cumin1001"
  • 13:16 bking@cumin1001: START - Cookbook sre.dns.netbox
  • 13:16 bking@cumin1001: START - Cookbook sre.ganeti.makevm for new host flink-zk2001.codfw.wmnet
  • 13:14 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host moss-be2003.codfw.wmnet with OS bullseye
  • 13:14 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host moss-be2003.codfw.wmnet with OS bullseye
  • 13:07 taavi@deploy1002: taavi and anzx: Continuing with sync
  • 13:05 taavi@deploy1002: taavi and anzx: Backport for bnwikisource: update legacy vector logo (T345666) synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 13:03 taavi@deploy1002: Started scap: Backport for bnwikisource: update legacy vector logo (T345666)
  • 12:14 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-cluster
  • 12:12 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:11 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 12:11 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-cluster (exit_code=0)
  • 12:10 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2046.codfw.wmnet
  • 12:08 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1046.eqiad.wmnet
  • 12:04 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1144:3315 (T343198)', diff saved to https://phabricator.wikimedia.org/P52270 and previous config saved to /var/cache/conftool/dbconfig/20230906-120448-arnaudb.json
  • 12:04 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 12:04 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 12:04 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1130 (T343198)', diff saved to https://phabricator.wikimedia.org/P52269 and previous config saved to /var/cache/conftool/dbconfig/20230906-120427-arnaudb.json
  • 12:03 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1046.eqiad.wmnet
  • 12:03 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2046.codfw.wmnet
  • 11:49 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1130', diff saved to https://phabricator.wikimedia.org/P52268 and previous config saved to /var/cache/conftool/dbconfig/20230906-114921-arnaudb.json
  • 11:34 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1130', diff saved to https://phabricator.wikimedia.org/P52267 and previous config saved to /var/cache/conftool/dbconfig/20230906-113414-arnaudb.json
  • 11:33 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1045.eqiad.wmnet
  • 11:32 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2044.codfw.wmnet
  • 11:27 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2044.codfw.wmnet
  • 11:27 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1045.eqiad.wmnet
  • 11:19 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1130 (T343198)', diff saved to https://phabricator.wikimedia.org/P52266 and previous config saved to /var/cache/conftool/dbconfig/20230906-111908-arnaudb.json
  • 11:05 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2044.codfw.wmnet
  • 11:05 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1044.eqiad.wmnet
  • 10:58 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2044.codfw.wmnet
  • 10:58 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1044.eqiad.wmnet
  • 10:35 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2043.codfw.wmnet
  • 10:34 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1043.eqiad.wmnet
  • 10:28 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2043.codfw.wmnet
  • 10:28 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1043.eqiad.wmnet
  • 10:27 topranks: Resetting PIC 1/1 on cr2-codfw to enable et-1/1/5 at 100G (T345583)
  • 10:15 topranks: shut cr2-codfw xe-1/1/1:3 interface to cr1-codfw ahead of card 1/1 reset (T345583)
  • 10:08 topranks: Draining cr2-codfw transport circuits to eqdfw and eqiad prior to card 1/1 reset (T345583)
  • 09:59 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 09:59 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 09:57 topranks: de-activating peering sessions at DE-CIX Dallas on cr2-codfw prior to card 1/1 reset (T345583)
  • 09:57 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 09:57 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 09:51 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ganeti-test01.svc.eqiad.wmnet on all recursors
  • 09:51 ayounsi@cumin1001: START - Cookbook sre.dns.wipe-cache ganeti-test01.svc.eqiad.wmnet on all recursors
  • 09:49 topranks: Making cr1-codfw VRRP primary for connections to row C and D prior to card 1/1 reset (T345583)
  • 09:49 jbond: enable puppet post switch puppetdbs gerrit:954622
  • 09:28 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 09:27 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 09:26 jbond: disable puppet to switch puppetdbs gerrit:954622
  • 09:23 filippo@deploy1002: helmfile [codfw] DONE helmfile.d/services/opentelemetry-collector: apply
  • 09:23 filippo@deploy1002: helmfile [codfw] START helmfile.d/services/opentelemetry-collector: apply
  • 09:23 filippo@deploy1002: helmfile [eqiad] DONE helmfile.d/services/opentelemetry-collector: apply
  • 09:23 topranks: Resetting PIC 1/1 on cr1-codfw to enable port et-1/1/5 at 100G (T345583)
  • 09:23 filippo@deploy1002: helmfile [eqiad] START helmfile.d/services/opentelemetry-collector: apply
  • 09:15 topranks: Shutting cr1-codfw port xe-1/1/1:1 to cr2-codfw before card 1/1 reset (T345583)
  • 09:05 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1130 (T343198)', diff saved to https://phabricator.wikimedia.org/P52265 and previous config saved to /var/cache/conftool/dbconfig/20230906-090541-arnaudb.json
  • 09:05 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1130.eqiad.wmnet with reason: Maintenance
  • 09:05 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1130.eqiad.wmnet with reason: Maintenance
  • 09:05 topranks: Draining transport circuits landing on cr1-codfw card 1/1 prior to reset (T345583)
  • 08:58 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-cluster
  • 08:52 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host apt2001.wikimedia.org
  • 08:48 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host apt2001.wikimedia.org
  • 08:46 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host irc2001.wikimedia.org
  • 08:42 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host irc2001.wikimedia.org
  • 08:25 hashar@deploy1002: Synchronized php: group1 wikis to 1.41.0-wmf.25 refs T343727 (duration: 06m 31s)
  • 08:18 hashar@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.41.0-wmf.25 refs T343727
  • 07:51 filippo@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/aus-k8s-eqiad-services/jaeger: apply
  • 07:51 filippo@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/aus-k8s-eqiad-services/jaeger: apply
  • 07:21 kartik@deploy1002: Finished scap: Backport for Enable MinT translation service in more wikis - rollout #2 (T341445) (duration: 11m 05s)
  • 07:15 kartik@deploy1002: abi and kartik: Continuing with sync
  • 07:11 kartik@deploy1002: abi and kartik: Backport for Enable MinT translation service in more wikis - rollout #2 (T341445) synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 07:10 kartik@deploy1002: Started scap: Backport for Enable MinT translation service in more wikis - rollout #2 (T341445)
  • 05:28 tstarling@deploy1002: Synchronized php-1.41.0-wmf.25/extensions/Phonos: Fix UBN client-side error from malformed Phonos tags T345672 (duration: 06m 51s)
  • 04:07 eileen: civicrm upgraded from a6fd7d6b to 5a432b1e

2023-09-05

  • 23:44 bking@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts flink-zk2001.codfw.wmnet
  • 23:44 bking@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 23:44 bking@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: flink-zk2001.codfw.wmnet decommissioned, removing all IPs except the asset tag one - bking@cumin1001"
  • 23:37 bking@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: flink-zk2001.codfw.wmnet decommissioned, removing all IPs except the asset tag one - bking@cumin1001"
  • 23:34 bking@cumin1001: START - Cookbook sre.dns.netbox
  • 23:30 bking@cumin1001: START - Cookbook sre.hosts.decommission for hosts flink-zk2001.codfw.wmnet
  • 22:59 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 22:59 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update DNS entries for kubernetes2029 and 2030 - pt1979@cumin2002"
  • 22:57 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update DNS entries for kubernetes2029 and 2030 - pt1979@cumin2002"
  • 22:55 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 22:22 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host flink-zk2001.codfw.wmnet with OS bookworm
  • 22:11 urbanecm: mwmaint1002: `/usr/local/bin/foreachwikiindblist /srv/mediawiki/dblists/growthexperiments.dblist extensions/GrowthExperiments/maintenance/refreshUserImpactData.php --batch-size=20 --registeredWithin=1year --editedWithin=2week --hasEditsAtLeast=3 --ignoreIfUpdatedWithin=6hour --verbose --use-job-queue` (debugging T344428, lowered batch size [100 -> 20])
  • 21:38 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.41.0-wmf.25 refs T343727
  • 21:38 urbanecm: mwmaint1002: `/usr/local/bin/foreachwikiindblist /srv/mediawiki/dblists/growthexperiments.dblist extensions/GrowthExperiments/maintenance/refreshUserImpactData.php --registeredWithin=1year --editedWithin=2week --hasEditsAtLeast=3 --ignoreIfUpdatedWithin=6hour --verbose --use-job-queue` (trying to reproduce T344428)
  • 21:34 eevans@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host restbase1030.eqiad.wmnet with OS bullseye
  • 21:28 sbassett: Deployed updated security mitigation for T336027
  • 21:28 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host restbase1030.eqiad.wmnet with OS bullseye
  • 21:21 cjming@deploy1002: Finished scap: Backport for Fix unseen notifications icon (T345483) (duration: 13m 46s)
  • 21:16 eevans@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host restbase1030.eqiad.wmnet with OS bullseye
  • 21:16 cjming: end of UTC late backport window
  • 21:15 cjming@deploy1002: jdlrobson and cjming: Continuing with sync
  • 21:12 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host restbase1030.eqiad.wmnet with OS bullseye
  • 21:09 cjming@deploy1002: jdlrobson and cjming: Backport for Fix unseen notifications icon (T345483) synced to the testservers mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 21:09 eileen: config revision changed from c2f91f49 to e1c3b7fd
  • 21:08 bking@cumin1001: START - Cookbook sre.hosts.reimage for host flink-zk2001.codfw.wmnet with OS bookworm
  • 21:07 cjming@deploy1002: Started scap: Backport for Fix unseen notifications icon (T345483)
  • 20:49 cjming@deploy1002: Finished scap: Backport for Fix unseen notifications icon (T345483) (duration: 16m 45s)
  • 20:43 cjming@deploy1002: cjming and jdlrobson: Continuing with sync
  • 20:34 cjming@deploy1002: cjming and jdlrobson: Backport for Fix unseen notifications icon (T345483) synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 20:33 cjming@deploy1002: Started scap: Backport for Fix unseen notifications icon (T345483)
  • 20:32 cjming@deploy1002: Finished scap: Backport for Fix temp user popup appearing on every new page creation (T345569) (duration: 11m 37s)
  • 20:26 cjming@deploy1002: cjming and matmarex: Continuing with sync
  • 20:22 cjming@deploy1002: cjming and matmarex: Backport for Fix temp user popup appearing on every new page creation (T345569) synced to the testservers mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 20:20 cjming@deploy1002: Started scap: Backport for Fix temp user popup appearing on every new page creation (T345569)
  • 20:17 cjming@deploy1002: Finished scap: Backport for Deploy Campaigns Event Discovery survey (T345158) (duration: 10m 27s)
  • 20:11 cjming@deploy1002: cjming and dani: Continuing with sync
  • 20:09 fab@deploy1002: Finished deploy [airflow-dags/research@90f280e]: (no justification provided) (duration: 00m 17s)
  • 20:09 fab@deploy1002: Started deploy [airflow-dags/research@90f280e]: (no justification provided)
  • 20:08 cjming@deploy1002: cjming and dani: Backport for Deploy Campaigns Event Discovery survey (T345158) synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 20:07 cjming@deploy1002: Started scap: Backport for Deploy Campaigns Event Discovery survey (T345158)
  • 19:32 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host doh1001.wikimedia.org with OS bookworm
  • 19:29 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 19:27 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 19:18 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 19:18 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 19:05 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on doh1001.wikimedia.org with reason: host reimage
  • 19:01 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on doh1001.wikimedia.org with reason: host reimage
  • 18:59 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 18:59 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 18:52 brett@cumin2002: START - Cookbook sre.hosts.reimage for host doh1001.wikimedia.org with OS bookworm
  • 18:39 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 18:39 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 18:36 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 18:36 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 18:23 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kubernetes2029.mgmt.codfw.wmnet with reboot policy GRACEFUL
  • 18:18 topranks: Running authdns-update to add includes for newly assigned codfw subnets
  • 18:03 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host kubernetes2029.mgmt.codfw.wmnet with reboot policy GRACEFUL
  • 17:57 dcausse: T345545: triggered a manual dag run to import analytics_platform_eng.image_suggestions_search_index_full/snapshot=2023-08-21
  • 17:56 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes2039.codfw.wmnet with OS bullseye
  • 17:56 pt1979@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 17:55 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes2038.codfw.wmnet with OS bullseye
  • 17:55 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 17:52 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 17:50 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 17:47 dcausse@deploy1002: Finished deploy [airflow-dags/search@b3d43bb]: T345545: search: generalize image_suggestions_manual (duration: 00m 26s)
  • 17:47 dcausse@deploy1002: Started deploy [airflow-dags/search@b3d43bb]: T345545: search: generalize image_suggestions_manual
  • 17:44 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host doh1002.wikimedia.org with OS bookworm
  • 17:36 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2039.codfw.wmnet with reason: host reimage
  • 17:34 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2038.codfw.wmnet with reason: host reimage
  • 17:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P52263 and previous config saved to /var/cache/conftool/dbconfig/20230905-173132-ladsgroup.json
  • 17:31 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2039.codfw.wmnet with reason: host reimage
  • 17:31 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2038.codfw.wmnet with reason: host reimage
  • 17:21 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 17:21 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes2037.codfw.wmnet with OS bullseye
  • 17:21 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 17:18 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_eqiad: apply security updates - bking@cumin1001 - T344587
  • 17:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P52262 and previous config saved to /var/cache/conftool/dbconfig/20230905-171627-ladsgroup.json
  • 17:12 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 17:12 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 17:12 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:11 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 17:11 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 17:10 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 17:08 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2039.codfw.wmnet with OS bullseye
  • 17:08 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes2036.codfw.wmnet with OS bullseye
  • 17:08 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 17:08 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2038.codfw.wmnet with OS bullseye
  • 17:07 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on doh1002.wikimedia.org with reason: host reimage
  • 17:07 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 17:05 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2037.codfw.wmnet with reason: host reimage
  • 17:04 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes2038.codfw.wmnet with OS bullseye
  • 17:04 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2038.codfw.wmnet with OS bullseye
  • 17:04 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes2038.codfw.wmnet with OS bullseye
  • 17:04 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2038.codfw.wmnet with OS bullseye
  • 17:03 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes2035.codfw.wmnet with OS bullseye
  • 17:03 pt1979@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 17:02 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes2034.codfw.wmnet with OS bullseye
  • 17:02 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 17:02 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on doh1002.wikimedia.org with reason: host reimage
  • 17:02 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2037.codfw.wmnet with reason: host reimage
  • 17:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P52260 and previous config saved to /var/cache/conftool/dbconfig/20230905-170122-ladsgroup.json
  • 16:54 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host mc2042.codfw.wmnet
  • 16:52 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 16:50 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 16:49 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2036.codfw.wmnet with reason: host reimage
  • 16:49 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1042.eqiad.wmnet
  • 16:47 brett@cumin2002: START - Cookbook sre.hosts.reimage for host doh1002.wikimedia.org with OS bookworm
  • 16:46 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2036.codfw.wmnet with reason: host reimage
  • 16:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P52259 and previous config saved to /var/cache/conftool/dbconfig/20230905-164618-ladsgroup.json
  • 16:43 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1042.eqiad.wmnet
  • 16:43 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2042.codfw.wmnet
  • 16:39 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2037.codfw.wmnet with OS bullseye
  • 16:38 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes2033.codfw.wmnet with OS bullseye
  • 16:38 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 16:37 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 16:35 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2035.codfw.wmnet with reason: host reimage
  • 16:33 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2034.codfw.wmnet with reason: host reimage
  • 16:31 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1041.eqiad.wmnet
  • 16:31 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2041.codfw.wmnet
  • 16:30 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2035.codfw.wmnet with reason: host reimage
  • 16:30 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2034.codfw.wmnet with reason: host reimage
  • 16:29 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2036.codfw.wmnet with OS bullseye
  • 16:28 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes2030.codfw.wmnet with OS bullseye
  • 16:28 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 16:27 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 16:25 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2041.codfw.wmnet
  • 16:25 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1041.eqiad.wmnet
  • 16:21 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2033.codfw.wmnet with reason: host reimage
  • 16:18 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2033.codfw.wmnet with reason: host reimage
  • 16:11 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2030.codfw.wmnet with reason: host reimage
  • 16:08 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2035.codfw.wmnet with OS bullseye
  • 16:08 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2034.codfw.wmnet with OS bullseye
  • 16:07 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2030.codfw.wmnet with reason: host reimage
  • 16:07 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes2031.codfw.wmnet with OS bullseye
  • 16:07 pt1979@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 16:06 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes2032.codfw.wmnet with OS bullseye
  • 16:06 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 16:03 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 16:00 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 16:00 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 16:00 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 15:57 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2033.codfw.wmnet with OS bullseye
  • 15:49 claime: Repooled mw2448.eqiad.wmnet - T345597
  • 15:47 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2032.codfw.wmnet with reason: host reimage
  • 15:45 claime: Repooling mw2448.eqiad.wmnet
  • 15:44 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2031.codfw.wmnet with reason: host reimage
  • 15:42 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2032.codfw.wmnet with reason: host reimage
  • 15:41 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2031.codfw.wmnet with reason: host reimage
  • 15:36 kamila_: Datacenter switchover live test completed (T345588)
  • 15:35 kamila@deploy1002: Unlocked for deployment [ALL REPOSITORIES]: Datacenter Switchover Live Test - T345588 (duration: 30m 45s)
  • 15:34 kamila@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.09-run-puppet-on-db-masters (exit_code=0)
  • 15:28 kamila@cumin1001: START - Cookbook sre.switchdc.mediawiki.09-run-puppet-on-db-masters
  • 15:28 kamila@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.09-restore-ttl (exit_code=0)
  • 15:27 kamila@cumin1001: START - Cookbook sre.switchdc.mediawiki.09-restore-ttl
  • 15:27 kamila@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-start-maintenance (exit_code=0)
  • 15:25 kamila@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-start-maintenance
  • 15:25 kamila@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-restart-envoy-on-jobrunners (exit_code=0)
  • 15:25 kamila@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-restart-envoy-on-jobrunners
  • 15:25 kamila@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.07-set-readwrite (exit_code=0)
  • 15:25 kamila@cumin1001: [DRY-RUN] MediaWiki read-only period ends at: 2023-09-05 15:25:15.979250
  • 15:25 kamila@cumin1001: START - Cookbook sre.switchdc.mediawiki.07-set-readwrite
  • 15:24 kamila@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.06-set-db-readwrite (exit_code=0)
  • 15:24 kamila@cumin1001: START - Cookbook sre.switchdc.mediawiki.06-set-db-readwrite
  • 15:24 kamila@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.04-switch-mediawiki (exit_code=0)
  • 15:24 kamila@cumin1001: START - Cookbook sre.switchdc.mediawiki.04-switch-mediawiki
  • 15:21 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2032.codfw.wmnet with OS bullseye
  • 15:21 kamila@cumin1001: END (FAIL) - Cookbook sre.switchdc.mediawiki.03-set-db-readonly (exit_code=99)
  • 15:20 kamila@cumin1001: START - Cookbook sre.switchdc.mediawiki.03-set-db-readonly
  • 15:20 kamila@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.02-set-readonly (exit_code=0)
  • 15:19 kamila@cumin1001: [DRY-RUN] MediaWiki read-only period starts at: 2023-09-05 15:19:50.101327
  • 15:19 kamila@cumin1001: START - Cookbook sre.switchdc.mediawiki.02-set-readonly
  • 15:19 kamila@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.01-stop-maintenance (exit_code=0)
  • 15:19 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2031.codfw.wmnet with OS bullseye
  • 15:19 kamila@cumin1001: START - Cookbook sre.switchdc.mediawiki.01-stop-maintenance
  • 15:19 kamila@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-reduce-ttl (exit_code=0)
  • 15:18 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2030.codfw.wmnet with OS bullseye
  • 15:13 kamila@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-reduce-ttl
  • 15:13 kamila@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-downtime-db-readonly-checks (exit_code=0)
  • 15:13 kamila@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-downtime-db-readonly-checks
  • 15:13 kamila@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-disable-puppet (exit_code=0)
  • 15:13 kamila@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-disable-puppet
  • 15:10 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2040.codfw.wmnet
  • 15:04 kamila@deploy1002: Locking from deployment [ALL REPOSITORIES]: Datacenter Switchover Live Test - T345588
  • 14:58 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host durum3004.esams.wmnet with OS bookworm
  • 14:50 kamila@cumin1001: END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0) pool all active/active services in codfw: Datacenter Switchover Live test - T345588
  • 14:33 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes2032.codfw.wmnet with OS bullseye
  • 14:32 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_eqiad: apply security updates - bking@cumin1001 - T344587
  • 14:31 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host testreduce1002.eqiad.wmnet with OS bookworm
  • 14:29 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:29 ayounsi@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: eqiad ganeti-test vip - ayounsi@cumin1001"
  • 14:28 ayounsi@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: eqiad ganeti-test vip - ayounsi@cumin1001"
  • 14:26 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
  • 14:26 kamila@cumin1001: START - Cookbook sre.discovery.datacenter pool all active/active services in codfw: Datacenter Switchover Live test - T345588
  • 14:26 btullis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventstreams: apply
  • 14:25 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes2031.codfw.wmnet with OS bullseye
  • 14:25 kamila@cumin1001: END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0) depool all services in codfw: Datacenter Switchover Live test - T345588
  • 14:25 btullis@deploy1002: helmfile [eqiad] START helmfile.d/services/eventstreams: apply
  • 14:24 btullis@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventstreams: apply
  • 14:24 btullis@deploy1002: helmfile [codfw] START helmfile.d/services/eventstreams: apply
  • 14:23 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/eventstreams: apply
  • 14:23 btullis@deploy1002: helmfile [staging] START helmfile.d/services/eventstreams: apply
  • 14:21 btullis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventstreams-internal: apply
  • 14:21 btullis@deploy1002: helmfile [eqiad] START helmfile.d/services/eventstreams-internal: apply
  • 14:17 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes2030.codfw.wmnet with OS bullseye
  • 14:16 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1040.eqiad.wmnet
  • 14:16 btullis@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventstreams-internal: apply
  • 14:16 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on durum3004.esams.wmnet with reason: host reimage
  • 14:15 btullis@deploy1002: helmfile [codfw] START helmfile.d/services/eventstreams-internal: apply
  • 14:15 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/eventstreams-internal: apply
  • 14:13 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes2029.codfw.wmnet with OS bullseye
  • 14:13 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 14:13 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on durum3004.esams.wmnet with reason: host reimage
  • 14:13 btullis@deploy1002: helmfile [staging] START helmfile.d/services/eventstreams-internal: apply
  • 14:10 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2040.codfw.wmnet
  • 14:10 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1040.eqiad.wmnet
  • 14:09 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on testreduce1002.eqiad.wmnet with reason: host reimage
  • 14:06 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on testreduce1002.eqiad.wmnet with reason: host reimage
  • 14:06 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on durum3003.esams.wmnet with reason: host reimage
  • 14:02 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on durum3003.esams.wmnet with reason: host reimage
  • 14:01 kamila@cumin1001: START - Cookbook sre.discovery.datacenter depool all services in codfw: Datacenter Switchover Live test - T345588
  • 13:57 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 13:54 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host testreduce1002.eqiad.wmnet with OS bookworm
  • 13:52 ladsgroup@deploy1002: Finished scap: Backport for ores-extension: enable lift wing for most wikis (T342115) (duration: 18m 33s)
  • 13:46 ladsgroup@deploy1002: ladsgroup and isaranto: Continuing with sync
  • 13:45 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2032.codfw.wmnet with OS bullseye
  • 13:43 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2029.codfw.wmnet with reason: host reimage
  • 13:43 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host durum3004.esams.wmnet with OS bookworm
  • 13:42 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host durum3003.esams.wmnet with OS bookworm
  • 13:40 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2029.codfw.wmnet with reason: host reimage
  • 13:37 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2031.codfw.wmnet with OS bullseye
  • 13:36 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes2026.codfw.wmnet with OS bullseye
  • 13:36 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 13:35 ladsgroup@deploy1002: ladsgroup and isaranto: Backport for ores-extension: enable lift wing for most wikis (T342115) synced to the testservers mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 13:34 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 13:33 ladsgroup@deploy1002: Started scap: Backport for ores-extension: enable lift wing for most wikis (T342115)
  • 13:30 stevemunene@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1132.eqiad.wmnet with OS bullseye
  • 13:30 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111 (T343198)', diff saved to https://phabricator.wikimedia.org/P52258 and previous config saved to /var/cache/conftool/dbconfig/20230905-133046-arnaudb.json
  • 13:25 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes2028.codfw.wmnet with OS bullseye
  • 13:25 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 13:24 taavi@deploy1002: Finished scap: Backport for tlywiki: add metanamespace , timezone, sitename (T345316), tlywiki: Add logos (T345316) (duration: 10m 18s)
  • 13:24 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2030.codfw.wmnet with OS bullseye
  • 13:23 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 13:22 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes2027.codfw.wmnet with OS bullseye
  • 13:22 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 13:21 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 13:21 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2029.codfw.wmnet with OS bullseye
  • 13:20 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes2025.codfw.wmnet with OS bullseye
  • 13:20 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 13:18 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2026.codfw.wmnet with reason: host reimage
  • 13:18 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 13:18 taavi@deploy1002: taavi and anzx: Continuing with sync
  • 13:16 ayounsi@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti-test1002.eqiad.wmnet with OS bullseye
  • 13:16 ayounsi@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - ayounsi@cumin1001"
  • 13:16 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2026.codfw.wmnet with reason: host reimage
  • 13:15 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2028.codfw.wmnet with reason: host reimage
  • 13:15 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111', diff saved to https://phabricator.wikimedia.org/P52257 and previous config saved to /var/cache/conftool/dbconfig/20230905-131540-arnaudb.json
  • 13:15 taavi@deploy1002: taavi and anzx: Backport for tlywiki: add metanamespace, timezone, sitename (T345316), tlywiki: Add logos (T345316) synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 13:13 taavi@deploy1002: Started scap: Backport for tlywiki: add metanamespace, timezone, sitename (T345316), tlywiki: Add logos (T345316)
  • 13:13 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2028.codfw.wmnet with reason: host reimage
  • 13:12 taavi@deploy1002: Finished scap: Backport for Disable EchoMail and EchoInteraction instruments (T344167) (duration: 10m 14s)
  • 13:12 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2028.codfw.wmnet with OS bullseye
  • 13:11 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2027.codfw.wmnet with reason: host reimage
  • 13:09 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2025.codfw.wmnet with reason: host reimage
  • 13:09 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2027.codfw.wmnet with reason: host reimage
  • 13:08 ayounsi@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - ayounsi@cumin1001"
  • 13:07 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2027.codfw.wmnet with OS bullseye
  • 13:07 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1039.eqiad.wmnet
  • 13:07 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2039.codfw.wmnet
  • 13:07 taavi@deploy1002: taavi and phuedx: Continuing with sync
  • 13:06 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2025.codfw.wmnet with reason: host reimage
  • 13:06 filippo@deploy1002: helmfile [codfw] DONE helmfile.d/services/opentelemetry-collector: apply
  • 13:06 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2026.codfw.wmnet with OS bullseye
  • 13:06 filippo@deploy1002: helmfile [codfw] START helmfile.d/services/opentelemetry-collector: apply
  • 13:05 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2025.codfw.wmnet with OS bullseye
  • 13:04 taavi@deploy1002: taavi and phuedx: Backport for Disable EchoMail and EchoInteraction instruments (T344167) synced to the testservers mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 13:03 ayounsi@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti-test1001.eqiad.wmnet with OS bullseye
  • 13:03 ayounsi@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - ayounsi@cumin1001"
  • 13:02 taavi@deploy1002: Started scap: Backport for Disable EchoMail and EchoInteraction instruments (T344167)
  • 13:01 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1039.eqiad.wmnet
  • 13:01 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2039.codfw.wmnet
  • 13:00 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111', diff saved to https://phabricator.wikimedia.org/P52254 and previous config saved to /var/cache/conftool/dbconfig/20230905-130034-arnaudb.json
  • 12:55 filippo@deploy1002: helmfile [eqiad] DONE helmfile.d/services/opentelemetry-collector: apply
  • 12:55 filippo@deploy1002: helmfile [eqiad] START helmfile.d/services/opentelemetry-collector: apply
  • 12:55 filippo@deploy1002: helmfile [codfw] DONE helmfile.d/services/opentelemetry-collector: apply
  • 12:54 filippo@deploy1002: helmfile [codfw] START helmfile.d/services/opentelemetry-collector: apply
  • 12:47 ayounsi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti-test1002.eqiad.wmnet with reason: host reimage
  • 12:45 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111 (T343198)', diff saved to https://phabricator.wikimedia.org/P52252 and previous config saved to /var/cache/conftool/dbconfig/20230905-124528-arnaudb.json
  • 12:44 stevemunene@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1132.eqiad.wmnet with OS bullseye
  • 12:43 ayounsi@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti-test1002.eqiad.wmnet with reason: host reimage
  • 12:43 elukey@deploy1002: Finished scap: Backport for Add new OAuth Rate Limiter tier for Wiki Education (T345394) (duration: 07m 49s)
  • 12:37 elukey@deploy1002: elukey: Continuing with sync
  • 12:37 elukey@deploy1002: elukey: Backport for Add new OAuth Rate Limiter tier for Wiki Education (T345394) synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 12:37 ayounsi@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - ayounsi@cumin1001"
  • 12:35 elukey@deploy1002: Started scap: Backport for Add new OAuth Rate Limiter tier for Wiki Education (T345394)
  • 12:20 ayounsi@cumin1001: START - Cookbook sre.hosts.reimage for host ganeti-test1002.eqiad.wmnet with OS bullseye
  • 12:18 filippo@deploy1002: helmfile [codfw] DONE helmfile.d/services/opentelemetry-collector: apply
  • 12:18 filippo@deploy1002: helmfile [codfw] START helmfile.d/services/opentelemetry-collector: apply
  • 12:17 ayounsi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti-test1001.eqiad.wmnet with reason: host reimage
  • 12:17 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1038.eqiad.wmnet
  • 12:17 filippo@deploy1002: helmfile [eqiad] DONE helmfile.d/services/opentelemetry-collector: apply
  • 12:16 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2038.codfw.wmnet
  • 12:16 filippo@deploy1002: helmfile [eqiad] START helmfile.d/services/opentelemetry-collector: apply
  • 12:14 ayounsi@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti-test1001.eqiad.wmnet with reason: host reimage
  • 12:10 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1038.eqiad.wmnet
  • 12:10 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2038.codfw.wmnet
  • 11:52 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp1003.eqiad.wmnet
  • 11:52 ayounsi@cumin1001: START - Cookbook sre.hosts.reimage for host ganeti-test1001.eqiad.wmnet with OS bullseye
  • 11:51 ayounsi@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ganeti-test1001.eqiad.wmnet with OS bullseye
  • 11:47 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-cluster (exit_code=0)
  • 11:45 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc-gp1003.eqiad.wmnet
  • 11:45 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp1002.eqiad.wmnet
  • 11:40 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc-gp1002.eqiad.wmnet
  • 11:40 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp1001.eqiad.wmnet
  • 11:36 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp2003.codfw.wmnet
  • 11:33 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc-gp1001.eqiad.wmnet
  • 11:30 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc-gp2003.codfw.wmnet
  • 11:29 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp2002.codfw.wmnet
  • 11:24 jbond@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=puppetboard-next
  • 11:23 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc-gp2002.codfw.wmnet
  • 11:22 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp2001.codfw.wmnet
  • 11:18 ayounsi@cumin1001: START - Cookbook sre.hosts.reimage for host ganeti-test1001.eqiad.wmnet with OS bullseye
  • 11:15 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc-gp2001.codfw.wmnet
  • 11:09 kamila@cumin1001: END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0) status all services in all: None - None
  • 11:09 kamila@cumin1001: START - Cookbook sre.discovery.datacenter status all services in all: None - None
  • 11:08 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti3005.esams.wmnet
  • 11:08 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti3005.esams.wmnet
  • 11:00 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti3005.esams.wmnet
  • 10:56 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti3005.esams.wmnet
  • 10:41 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/device-analytics: apply
  • 10:41 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/device-analytics: apply
  • 10:36 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/device-analytics: apply
  • 10:36 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/device-analytics: apply
  • 10:34 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/device-analytics: apply
  • 10:33 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/device-analytics: apply
  • 09:52 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2111 (T343198)', diff saved to https://phabricator.wikimedia.org/P52247 and previous config saved to /var/cache/conftool/dbconfig/20230905-095254-arnaudb.json
  • 09:52 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2111.codfw.wmnet with reason: Maintenance
  • 09:52 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2111.codfw.wmnet with reason: Maintenance
  • 09:49 moritzm: failover ganeti master in esams/BY27 to ganeti3007
  • 09:43 ayounsi@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['ganeti-test1001']
  • 09:43 ayounsi@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti-test1001']
  • 09:41 ayounsi@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti-test1001.mgmt.eqiad.wmnet with reboot policy FORCED
  • 09:26 ayounsi@cumin1001: START - Cookbook sre.hosts.provision for host ganeti-test1001.mgmt.eqiad.wmnet with reboot policy FORCED
  • 09:25 ayounsi@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host ganeti-test1002
  • 09:21 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti-test1001
  • 09:20 ayounsi@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host ganeti-test1001
  • 09:17 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:17 ayounsi@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: eqiad ganeti-test - ayounsi@cumin1001"
  • 09:16 ayounsi@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: eqiad ganeti-test - ayounsi@cumin1001"
  • 09:14 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
  • 09:05 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti3006.esams.wmnet
  • 09:05 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti3006.esams.wmnet
  • 09:04 claime: powercycle mw1356.eqiad.wmnet
  • 08:57 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti3006.esams.wmnet
  • 08:51 jnuche@deploy1002: sync-world aborted: testwikis wikis to 1.41.0-wmf.25 refs T343727 (duration: 20m 37s)
  • 08:48 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti3006.esams.wmnet
  • 08:40 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-cluster
  • 08:31 jnuche@deploy1002: Started scap: testwikis wikis to 1.41.0-wmf.25 refs T343727
  • 08:12 kartik@deploy1002: Finished scap: Backport for Add "editautopatrolprotected" and "editpatrolprotected" protection levels on shwiki (T344306) (duration: 10m 47s)
  • 08:06 kartik@deploy1002: aleksandar and kartik: Continuing with sync
  • 08:03 kartik@deploy1002: aleksandar and kartik: Backport for Add "editautopatrolprotected" and "editpatrolprotected" protection levels on shwiki (T344306) synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 08:01 kartik@deploy1002: Started scap: Backport for Add "editautopatrolprotected" and "editpatrolprotected" protection levels on shwiki (T344306)
  • 07:56 kartik@deploy1002: Finished scap: Backport for Enable AbuseFilter blocks on shwiki (T345513) (duration: 19m 29s)
  • 07:46 moritzm: depool mw2448 (unreachable)
  • 07:45 stevemunene@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1132.eqiad.wmnet with OS bullseye
  • 07:42 kartik@deploy1002: kartik and aleksandar: Continuing with sync
  • 07:38 kartik@deploy1002: kartik and aleksandar: Backport for Enable AbuseFilter blocks on shwiki (T345513) synced to the testservers mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 07:36 kartik@deploy1002: Started scap: Backport for Enable AbuseFilter blocks on shwiki (T345513)
  • 07:32 kartik@deploy1002: Finished scap: Backport for Enable Section and Content Translation in 7 Wikipedias (T343211) (duration: 15m 45s)
  • 07:23 moritzm: failover ganeti masters in esams to ganeti3007/ganeti3008
  • 07:22 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti3008.esams.wmnet
  • 07:22 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti3008.esams.wmnet
  • 07:20 kartik@deploy1002: kartik: Continuing with sync
  • 07:18 kartik@deploy1002: kartik: Backport for Enable Section and Content Translation in 7 Wikipedias (T343211) synced to the testservers mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 07:16 kartik@deploy1002: Started scap: Backport for Enable Section and Content Translation in 7 Wikipedias (T343211)
  • 07:14 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti3008.esams.wmnet
  • 07:08 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1131.eqiad.wmnet with OS bullseye
  • 07:07 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti3008.esams.wmnet
  • 07:03 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti3007.esams.wmnet
  • 07:03 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti3007.esams.wmnet
  • 06:59 stevemunene@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1132.eqiad.wmnet with OS bullseye
  • 06:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti3007.esams.wmnet
  • 06:51 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1130.eqiad.wmnet with OS bullseye
  • 06:49 tstarling@deploy1002: Synchronized wmf-config/CommonSettings-labs.php: Labs only change, just avoiding undeployed changes (duration: 09m 25s)
  • 06:46 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1131.eqiad.wmnet with reason: host reimage
  • 06:45 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti3007.esams.wmnet
  • 06:43 stevemunene@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1131.eqiad.wmnet with reason: host reimage
  • 06:29 stevemunene@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1131.eqiad.wmnet with OS bullseye
  • 06:26 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1130.eqiad.wmnet with reason: host reimage
  • 06:24 stevemunene@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1130.eqiad.wmnet with reason: host reimage
  • 06:10 stevemunene@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1130.eqiad.wmnet with OS bullseye
  • 06:06 kart_: Updated cxserver to 2023-08-29-191442-production (T345170, T343450)
  • 06:04 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
  • 06:04 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
  • 06:01 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
  • 06:00 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply
  • 05:58 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
  • 05:57 kartik@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
  • 05:55 kart_: Updated MinT to 2023-09-04-051105-production (T336683)
  • 05:46 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply
  • 05:41 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply
  • 05:36 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply
  • 05:30 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply
  • 05:25 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
  • 05:22 kartik@deploy1002: helmfile [staging] START helmfile.d/services/machinetranslation: apply
  • 03:59 mwpresync@deploy1002: Finished scap: testwikis wikis to 1.41.0-wmf.25 refs T343727 (duration: 56m 29s)
  • 03:02 mwpresync@deploy1002: Started scap: testwikis wikis to 1.41.0-wmf.25 refs T343727

2023-09-04

  • 16:14 aikochou@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
  • 16:07 topranks: setting port 1/1/5 to speed 100G on cr2-codfw
  • 16:06 topranks: setting port 1/1/5 to speed 100G on cr1-codfw
  • 16:05 aikochou@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
  • 15:46 jdrewniak@deploy1002: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 06m 14s)
  • 15:40 jdrewniak@deploy1002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 07m 01s)
  • 14:57 moritzm: installing json-c security updates
  • 14:53 ayounsi@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox
  • 14:47 ayounsi@cumin1001: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
  • 14:45 ayounsi@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary
  • 14:44 ayounsi@cumin1001: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary
  • 14:31 godog: bounce prometheus@k8s-aux
  • 14:29 aikochou@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
  • 13:58 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 13:58 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 13:58 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 13:56 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 13:55 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 13:54 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 13:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host irc1001.wikimedia.org
  • 13:52 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 13:50 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:50 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS entries for codfw cr<-> ssw links. - cmooney@cumin1001"
  • 13:50 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 13:49 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 13:49 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS entries for codfw cr<-> ssw links. - cmooney@cumin1001"
  • 13:49 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host irc1001.wikimedia.org
  • 13:48 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 13:48 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 13:48 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 13:47 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 13:46 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 13:45 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:45 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS entries for codfw cr<-> ssw links. - cmooney@cumin1001"
  • 13:41 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 13:24 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS entries for codfw cr<-> ssw links. - cmooney@cumin1001"
  • 13:18 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 13:15 cmooney@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 13:12 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 12:52 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-presto1002.eqiad.wmnet with OS bullseye
  • 12:46 hnowlan: staggered restarting restbase service on A:restbase
  • 12:30 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 149665
  • 12:30 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 149665
  • 12:29 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 138884
  • 12:28 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 138884
  • 12:26 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 136065
  • 12:23 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 136065
  • 12:23 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 27381
  • 12:18 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host netbox1002.eqiad.wmnet
  • 12:08 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netbox1002.eqiad.wmnet
  • 12:06 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netbox2002.codfw.wmnet
  • 12:03 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 27381
  • 12:02 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netbox2002.codfw.wmnet
  • 11:59 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netbox-dev2002.codfw.wmnet
  • 11:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netbox-dev2002.codfw.wmnet
  • 11:53 hnowlan@deploy1002: Finished deploy [restbase/deploy@26bc1a5]: Add new wikis T343543 T343549 T345171 (duration: 14m 32s)
  • 11:51 moritzm: installing grub2 updates from Bullseye point release
  • 11:51 moritzm: installing grub2 updates from Bullseye point release
  • 11:49 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1002.eqiad.wmnet with reason: host reimage
  • 11:46 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1002.eqiad.wmnet with reason: host reimage
  • 11:38 hnowlan@deploy1002: Started deploy [restbase/deploy@26bc1a5]: Add new wikis T343543 T343549 T345171
  • 11:35 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host an-presto1002.eqiad.wmnet with OS bullseye
  • 11:08 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: " - jbond@cumin1001 - T342534"
  • 11:08 jbond@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: " - jbond@cumin1001 - T342534"
  • 11:05 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti6004.drmrs.wmnet
  • 11:05 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti6004.drmrs.wmnet
  • 10:59 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti6004.drmrs.wmnet
  • 10:51 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti6004.drmrs.wmnet
  • 10:29 jbond: enable-puppet fleet wide post "deploy confd change gerrit:954007"
  • 10:25 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti6003.drmrs.wmnet
  • 10:24 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti6003.drmrs.wmnet
  • 10:22 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti6003.drmrs.wmnet
  • 10:13 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti6003.drmrs.wmnet
  • 09:50 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/mathoid: apply
  • 09:49 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/mathoid: apply
  • 09:49 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mathoid: apply
  • 09:49 akosiaris: T345290. Update mathoid to 2023-05-13-192519-production
  • 09:48 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/mathoid: apply
  • 09:48 aborrero@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:48 aborrero@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudservices1006 - aborrero@cumin1001"
  • 09:48 akosiaris@deploy1002: helmfile [staging] DONE helmfile.d/services/mathoid: apply
  • 09:48 akosiaris@deploy1002: helmfile [staging] START helmfile.d/services/mathoid: apply
  • 09:47 jbond: disable-puppet fleet wide "deploy confd change gerrit:954007"
  • 09:47 aborrero@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudservices1006 - aborrero@cumin1001"
  • 09:45 ladsgroup@deploy1002: Synchronized private/PrivateSettings.php: Add CP secret (duration: 15m 47s)
  • 09:44 aborrero@cumin1001: START - Cookbook sre.dns.netbox
  • 09:43 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/mathoid: apply
  • 09:43 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/mathoid: apply
  • 09:42 stevemunene@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1129.eqiad.wmnet with OS bullseye
  • 09:42 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mathoid: apply
  • 09:42 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/mathoid: apply
  • 09:39 ladsgroup@deploy1002: ladsgroup: Continuing with sync
  • 09:38 ladsgroup@deploy1002: ladsgroup: Add CP secret synced to the testservers mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 09:34 akosiaris@deploy1002: helmfile [staging] DONE helmfile.d/services/mathoid: apply
  • 09:34 jayme@cumin1001: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on A:wikikube-staging-worker-eqiad
  • 09:30 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/media-analytics: apply
  • 09:29 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/media-analytics: apply
  • 09:29 stevemunene@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on an-worker1129.eqiad.wmnet with reason: host reimage
  • 09:29 akosiaris@deploy1002: helmfile [staging] START helmfile.d/services/mathoid: apply
  • 09:28 akosiaris: deploying mathoid to bump service mesh envoy version to 1.23.10-2-s2. No changes to the app.
  • 09:27 stevemunene@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1129.eqiad.wmnet with reason: host reimage
  • 09:27 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/geo-analytics: apply
  • 09:27 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/geo-analytics: apply
  • 09:14 stevemunene@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1129.eqiad.wmnet with OS bullseye
  • 09:13 jayme@cumin1001: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:wikikube-staging-worker-eqiad
  • 09:10 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM orespoolcounter1003.eqiad.wmnet
  • 09:09 elukey: rename "ens5" to "ens13" on orespoolcounter1003's /etc/network/interfaces after a VM reboot
  • 09:04 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM orespoolcounter1003.eqiad.wmnet
  • 09:04 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM orespoolcounter1004.eqiad.wmnet
  • 08:57 elukey: rename "ens5" to "ens13" on orespoolcounter1004's /etc/network/interfaces after a VM reboot
  • 08:56 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host registry2004.codfw.wmnet
  • 08:51 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host registry2004.codfw.wmnet
  • 08:51 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host registry2003.codfw.wmnet
  • 08:46 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM orespoolcounter1004.eqiad.wmnet
  • 08:46 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host registry2003.codfw.wmnet
  • 08:45 jayme@cumin1001: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on A:wikikube-staging-worker-codfw
  • 08:45 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts bast6002.wikimedia.org
  • 08:45 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:45 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: bast6002.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 08:44 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: bast6002.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 08:43 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host registry1004.eqiad.wmnet
  • 08:41 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM orespoolcounter2003.codfw.wmnet
  • 08:41 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 08:39 jayme@cumin1001: END (FAIL) - Cookbook sre.k8s.reboot-nodes (exit_code=1) rolling reboot on A:wikikube-staging-worker-eqiad
  • 08:38 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host registry1004.eqiad.wmnet
  • 08:38 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host registry1003.eqiad.wmnet
  • 08:37 jayme@cumin1001: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:wikikube-staging-worker-eqiad
  • 08:36 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts bast6002.wikimedia.org
  • 08:34 elukey: rename "ens5" to "ens13" on orespoolcounter2003's /etc/network/interfaces after a VM reboot
  • 08:33 jayme@cumin1001: END (FAIL) - Cookbook sre.k8s.reboot-nodes (exit_code=1) rolling reboot on A:wikikube-staging-worker-eqiad
  • 08:33 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host registry1003.eqiad.wmnet
  • 08:31 jayme@cumin1001: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:wikikube-staging-worker-eqiad
  • 08:31 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts bast5003.wikimedia.org
  • 08:31 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:31 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: bast5003.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 08:28 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host kubestagemaster2002.codfw.wmnet
  • 08:27 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: bast5003.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 08:25 jayme@cumin1001: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:wikikube-staging-worker-codfw
  • 08:23 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dragonfly-supernode2001.codfw.wmnet
  • 08:23 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host kubestagemaster1002.eqiad.wmnet
  • 08:22 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 08:19 jayme@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=helm-charts,name=eqiad
  • 08:18 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host chartmuseum1001.eqiad.wmnet
  • 08:18 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts bast5003.wikimedia.org
  • 08:18 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host dragonfly-supernode2001.codfw.wmnet
  • 08:17 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dragonfly-supernode1001.eqiad.wmnet
  • 08:17 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts bast4004.wikimedia.org
  • 08:17 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:17 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: bast4004.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 08:15 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubestagemaster2002.codfw.wmnet
  • 08:15 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: bast4004.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 08:15 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubestagemaster2001.codfw.wmnet
  • 08:14 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM orespoolcounter2003.codfw.wmnet
  • 08:14 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host chartmuseum1001.eqiad.wmnet
  • 08:14 jayme@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=helm-charts,name=eqiad
  • 08:14 jayme@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=helm-charts,name=codfw
  • 08:14 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host chartmuseum2001.codfw.wmnet
  • 08:13 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host dragonfly-supernode1001.eqiad.wmnet
  • 08:13 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 08:11 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubestagemaster1002.eqiad.wmnet
  • 08:10 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubestagemaster1001.eqiad.wmnet
  • 08:10 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host chartmuseum2001.codfw.wmnet
  • 08:09 jayme@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=helm-charts,name=codfw
  • 08:08 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts bast4004.wikimedia.org
  • 08:03 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubestagemaster1001.eqiad.wmnet
  • 08:03 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubestagemaster2001.codfw.wmnet
  • 08:00 elukey: restart kubelet on ml-serve1002 to check if stale prometheus metrics are the cause of the stop_container alert
  • 08:00 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM orespoolcounter2004.codfw.wmnet
  • 07:59 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM orespoolcounter2004.codfw.wmnet
  • 07:35 Emperor: restart tcpircbot-logmsgbot on alert1001
  • 07:22 moritzm: failover ganeti masters in drmrs to ganeti6001/ganeti6002
  • 06:12 XioNoX: push new pfw policies - T345288

2023-09-02

  • 15:51 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db1128.eqiad.wmnet with reason: depooled after replica lag page, two days
  • 15:50 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db1128.eqiad.wmnet with reason: depooled after replica lag page, two days
  • 15:49 sukhe@cumin2002: dbctl commit (dc=all): 'Depool db1128', diff saved to https://phabricator.wikimedia.org/P52244 and previous config saved to /var/cache/conftool/dbconfig/20230902-154903-sukhe.json
  • 05:45 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dumpsdata1003.eqiad.wmnet
  • 05:39 ariel@cumin1001: START - Cookbook sre.hosts.reboot-single for host dumpsdata1003.eqiad.wmnet
  • 05:38 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host snapshot1008.eqiad.wmnet
  • 05:32 ariel@cumin1001: START - Cookbook sre.hosts.reboot-single for host snapshot1008.eqiad.wmnet
  • 00:06 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 00:06 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS entries for codfw spine switch overlay IRBs. - cmooney@cumin1001"
  • 00:05 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS entries for codfw spine switch overlay IRBs. - cmooney@cumin1001"
  • 00:02 cmooney@cumin1001: START - Cookbook sre.dns.netbox

2023-09-01

  • 23:55 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 23:55 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS entries for codfw spine switch overlay loopbacks. - cmooney@cumin1001"
  • 23:54 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS entries for codfw spine switch overlay loopbacks. - cmooney@cumin1001"
  • 23:52 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 23:01 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.reboot (exit_code=0)
  • 22:46 cmooney@cumin1001: END (FAIL) - Cookbook sre.network.provision (exit_code=93) for device lsw1-a4-codfw.mgmt.codfw.wmnet
  • 22:45 cmooney@cumin1001: END (FAIL) - Cookbook sre.network.provision (exit_code=93) for device lsw1-a5-codfw.mgmt.codfw.wmnet
  • 22:45 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes2027.codfw.wmnet with OS bullseye
  • 22:45 cmooney@cumin1001: END (FAIL) - Cookbook sre.network.provision (exit_code=93) for device lsw1-a8-codfw.mgmt.codfw.wmnet
  • 22:45 cmooney@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 22:45 cmooney@cumin1001: END (FAIL) - Cookbook sre.network.provision (exit_code=93) for device lsw1-b4-codfw.mgmt.codfw.wmnet
  • 22:45 cmooney@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 22:45 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on kubernetes2027.codfw.wmnet with reason: host reimage
  • 22:45 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2027.codfw.wmnet with reason: host reimage
  • 22:44 cmooney@cumin1001: END (FAIL) - Cookbook sre.network.provision (exit_code=93) for device lsw1-b5-codfw.mgmt.codfw.wmnet
  • 22:44 cmooney@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 22:44 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2027.codfw.wmnet with OS bullseye
  • 22:22 brett@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for doh4002.wikimedia.org
  • 22:22 brett@cumin2002: START - Cookbook sre.hosts.remove-downtime for doh4002.wikimedia.org
  • 22:02 brett@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host doh4002.wikimedia.org with OS bookworm
  • 21:57 cmooney@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b8-codfw
  • 21:57 cmooney@cumin1001: START - Cookbook sre.network.tls for network device lsw1-b8-codfw
  • 21:57 cmooney@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b7-codfw
  • 21:56 cmooney@cumin1001: START - Cookbook sre.network.tls for network device lsw1-b7-codfw
  • 21:56 cmooney@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b6-codfw
  • 21:56 cmooney@cumin1001: START - Cookbook sre.network.tls for network device lsw1-b6-codfw
  • 21:56 cmooney@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b5-codfw
  • 21:56 cmooney@cumin1001: START - Cookbook sre.network.tls for network device lsw1-b5-codfw
  • 21:56 cmooney@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b4-codfw
  • 21:56 cmooney@cumin1001: START - Cookbook sre.network.tls for network device lsw1-b4-codfw
  • 21:56 cmooney@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b3-codfw
  • 21:56 cmooney@cumin1001: START - Cookbook sre.network.tls for network device lsw1-b3-codfw
  • 21:56 cmooney@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b2-codfw
  • 21:55 cmooney@cumin1001: START - Cookbook sre.network.tls for network device lsw1-b2-codfw
  • 21:55 cmooney@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a8-codfw
  • 21:55 cmooney@cumin1001: START - Cookbook sre.network.tls for network device lsw1-a8-codfw
  • 21:55 cmooney@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a7-codfw
  • 21:55 cmooney@cumin1001: START - Cookbook sre.network.tls for network device lsw1-a7-codfw
  • 21:55 cmooney@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a6-codfw
  • 21:55 cmooney@cumin1001: START - Cookbook sre.network.tls for network device lsw1-a6-codfw
  • 21:55 cmooney@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a5-codfw
  • 21:55 cmooney@cumin1001: START - Cookbook sre.network.tls for network device lsw1-a5-codfw
  • 21:55 cmooney@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a4-codfw
  • 21:55 cmooney@cumin1001: START - Cookbook sre.network.tls for network device lsw1-a4-codfw
  • 21:54 cmooney@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a3-codfw
  • 21:54 cmooney@cumin1001: START - Cookbook sre.network.tls for network device lsw1-a3-codfw
  • 21:54 cmooney@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a2-codfw
  • 21:54 cmooney@cumin1001: START - Cookbook sre.network.tls for network device lsw1-a2-codfw
  • 21:54 cmooney@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a1-codfw
  • 21:54 cmooney@cumin1001: START - Cookbook sre.network.tls for network device lsw1-a1-codfw
  • 21:52 brett@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on doh4002.wikimedia.org with reason: host reimage
  • 21:52 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on doh4002.wikimedia.org with reason: host reimage
  • 21:40 cmooney@cumin1001: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device lsw1-b8-codfw.mgmt.codfw.wmnet
  • 21:36 cmooney@cumin1001: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device lsw1-b7-codfw.mgmt.codfw.wmnet
  • 21:32 cmooney@cumin1001: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device lsw1-b6-codfw.mgmt.codfw.wmnet
  • 21:32 brett@cumin2002: START - Cookbook sre.hosts.reimage for host doh4002.wikimedia.org with OS bookworm
  • 21:29 cmooney@cumin1001: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device lsw1-b5-codfw.mgmt.codfw.wmnet
  • 21:21 ryankemper@cumin1001: START - Cookbook sre.wdqs.reboot
  • 21:18 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.reboot (exit_code=0)
  • 21:11 ryankemper@cumin1001: START - Cookbook sre.wdqs.reboot
  • 21:10 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.reboot (exit_code=0)
  • 21:08 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:08 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for lsw1-b8-codfw - cmooney@cumin1001"
  • 21:08 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for lsw1-b8-codfw - cmooney@cumin1001"
  • 21:05 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 21:05 cmooney@cumin1001: START - Cookbook sre.network.provision for device lsw1-b8-codfw.mgmt.codfw.wmnet
  • 21:04 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:04 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for lsw1-b7-codfw - cmooney@cumin1001"
  • 21:04 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for lsw1-b7-codfw - cmooney@cumin1001"
  • 21:01 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 21:01 cmooney@cumin1001: START - Cookbook sre.network.provision for device lsw1-b7-codfw.mgmt.codfw.wmnet
  • 21:01 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:01 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for lsw1-b6-codfw - cmooney@cumin1001"
  • 21:00 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for lsw1-b6-codfw - cmooney@cumin1001"
  • 20:58 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 20:58 cmooney@cumin1001: START - Cookbook sre.network.provision for device lsw1-b6-codfw.mgmt.codfw.wmnet
  • 20:57 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:57 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for lsw1-b5-codfw - cmooney@cumin1001"
  • 20:56 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for lsw1-b5-codfw - cmooney@cumin1001"
  • 20:49 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes2025.codfw.wmnet with OS bullseye
  • 20:26 robh@cumin1001: END (PASS) - Cookbook sre.network.debug (exit_code=0) for Netbox circuit ID 173
  • 20:25 robh@cumin1001: START - Cookbook sre.network.debug for Netbox circuit ID 173
  • 20:11 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for doh3003.wikimedia.org
  • 20:11 sukhe@cumin2002: START - Cookbook sre.hosts.remove-downtime for doh3003.wikimedia.org
  • 20:04 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 20:04 cmooney@cumin1001: START - Cookbook sre.network.provision for device lsw1-b5-codfw.mgmt.codfw.wmnet
  • 20:03 cmooney@cumin1001: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device lsw1-b4-codfw.mgmt.codfw.wmnet
  • 20:00 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:00 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add entries for new switch links codfw - cmooney@cumin1001"
  • 19:59 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add entries for new switch links codfw - cmooney@cumin1001"
  • 19:56 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 19:56 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:56 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add entries for new switch links codfw - cmooney@cumin1001"
  • 19:51 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add entries for new switch links codfw - cmooney@cumin1001"
  • 19:48 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 19:47 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2025.codfw.wmnet with OS bullseye
  • 19:32 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:32 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for lsw1-b4-codfw - cmooney@cumin1001"
  • 19:30 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on doh3003.wikimedia.org with reason: host reimage
  • 19:30 bblack@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on doh3003.wikimedia.org with reason: host reimage
  • 19:28 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for lsw1-b4-codfw - cmooney@cumin1001"
  • 19:26 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_codfw: apply security updates - bking@cumin1001 - T344587
  • 19:23 brett@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host doh3003.wikimedia.org with OS bookworm
  • 19:23 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes2026.codfw.wmnet with OS bullseye
  • 19:20 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes2025.codfw.wmnet with OS bullseye
  • 19:12 brett@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on doh3003.wikimedia.org with reason: host reimage
  • 19:12 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on doh3003.wikimedia.org with reason: host reimage
  • 19:11 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes2027.codfw.wmnet with OS bullseye
  • 19:10 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on kubernetes2027.codfw.wmnet with reason: host reimage
  • 19:10 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2027.codfw.wmnet with reason: host reimage
  • 19:09 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2027.codfw.wmnet with OS bullseye
  • 18:57 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes2029.codfw.wmnet with OS bullseye
  • 18:53 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes2028.codfw.wmnet with OS bullseye
  • 18:51 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on kubernetes2029.codfw.wmnet with reason: host reimage
  • 18:51 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2029.codfw.wmnet with reason: host reimage
  • 18:48 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes2027.codfw.wmnet with OS bullseye
  • 18:48 brett@cumin2002: START - Cookbook sre.hosts.reimage for host doh3003.wikimedia.org with OS bookworm
  • 18:46 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on kubernetes2028.codfw.wmnet with reason: host reimage
  • 18:46 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2028.codfw.wmnet with reason: host reimage
  • 18:42 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on kubernetes2027.codfw.wmnet with reason: host reimage
  • 18:42 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2027.codfw.wmnet with reason: host reimage
  • 18:39 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 18:39 cmooney@cumin1001: START - Cookbook sre.network.provision for device lsw1-b4-codfw.mgmt.codfw.wmnet
  • 18:35 aokoth@cumin1001: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1004.wikimedia.org with reason: Security Release
  • 18:31 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2029.codfw.wmnet with OS bullseye
  • 18:30 ryankemper@cumin1001: START - Cookbook sre.wdqs.reboot
  • 18:23 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2028.codfw.wmnet with OS bullseye
  • 18:22 pt1979@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudservices1006.eqiad.wmnet with OS bullseye
  • 18:22 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cloudservices1006.eqiad.wmnet with OS bullseye
  • 18:21 cmooney@cumin1001: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device lsw1-b3-codfw.mgmt.codfw.wmnet
  • 18:19 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2027.codfw.wmnet with OS bullseye
  • 18:16 brett@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for doh3004.wikimedia.org
  • 18:16 brett@cumin2002: START - Cookbook sre.hosts.remove-downtime for doh3004.wikimedia.org
  • 18:05 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2026.codfw.wmnet with OS bullseye
  • 18:04 brett@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host doh3004.wikimedia.org with OS bookworm
  • 17:58 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2025.codfw.wmnet with OS bullseye
  • 17:53 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on doh3004.wikimedia.org with reason: host reimage
  • 17:53 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on doh3004.wikimedia.org with reason: host reimage
  • 17:53 brett@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on doh3004.wikimedia.org with reason: host reimage
  • 17:53 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on doh3004.wikimedia.org with reason: host reimage
  • 17:49 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:49 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for lsw1-b3-codfw - cmooney@cumin1001"
  • 17:48 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for lsw1-b3-codfw - cmooney@cumin1001"
  • 17:46 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 17:46 cmooney@cumin1001: START - Cookbook sre.network.provision for device lsw1-b3-codfw.mgmt.codfw.wmnet
  • 17:31 brett@cumin2002: START - Cookbook sre.hosts.reimage for host doh3004.wikimedia.org with OS bookworm
  • 17:30 cmooney@cumin1001: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device lsw1-b2-codfw.mgmt.codfw.wmnet
  • 17:19 brett@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for doh5001.wikimedia.org
  • 17:19 brett@cumin2002: START - Cookbook sre.hosts.remove-downtime for doh5001.wikimedia.org
  • 17:18 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['kubernetes2036']
  • 17:18 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes2036']
  • 17:18 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['kubernetes2036']
  • 17:18 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes2036']
  • 17:13 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:13 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add entries for new spine links. - cmooney@cumin1001"
  • 17:11 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add entries for new spine links. - cmooney@cumin1001"
  • 17:11 aokoth@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: Security Release
  • 17:07 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 17:06 brett@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host doh5001.wikimedia.org with OS bookworm
  • 16:59 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on doh5001.wikimedia.org with reason: host reimage
  • 16:59 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on doh5001.wikimedia.org with reason: host reimage
  • 16:58 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:58 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for lsw1-b2-codfw - cmooney@cumin1001"
  • 16:58 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for lsw1-b2-codfw - cmooney@cumin1001"
  • 16:55 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 16:55 cmooney@cumin1001: START - Cookbook sre.network.provision for device lsw1-b2-codfw.mgmt.codfw.wmnet
  • 16:54 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_codfw: apply security updates - bking@cumin1001 - T344587
  • 16:53 brett@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on doh5001.wikimedia.org with reason: host reimage
  • 16:53 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on doh5001.wikimedia.org with reason: host reimage
  • 16:50 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['restbase1030.eqiad.wmnet']
  • 16:50 brett@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on doh5001.wikimedia.org with reason: host reimage
  • 16:50 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on doh5001.wikimedia.org with reason: host reimage
  • 16:49 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['restbase1030.eqiad.wmnet']
  • 16:23 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes2036.codfw.wmnet with OS bullseye
  • 16:22 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on kubernetes2036.codfw.wmnet with reason: host reimage
  • 16:22 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2036.codfw.wmnet with reason: host reimage
  • 16:21 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2036.codfw.wmnet with OS bullseye
  • 16:21 aokoth@cumin1001: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab2002.wikimedia.org with reason: Security Release
  • 16:19 pmiazga: T343983 Ran mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=amwiki --logwiki=metawiki Jean-Mahmood User92259453
  • 16:18 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2037.codfw.wmnet with OS bullseye
  • 15:57 bking@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host flink-zk2001.codfw.wmnet
  • 15:57 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host flink-zk2001.codfw.wmnet with OS bookworm
  • 15:55 brett@cumin2002: START - Cookbook sre.hosts.reimage for host doh5001.wikimedia.org with OS bookworm
  • 15:43 cmooney@cumin1001: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device lsw1-a8-codfw.mgmt.codfw.wmnet
  • 15:23 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be2004.codfw.wmnet
  • 15:15 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host thanos-be2004.codfw.wmnet
  • 15:15 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be2003.codfw.wmnet
  • 15:11 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:11 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for lsw1-a8-codfw - cmooney@cumin1001"
  • 15:08 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host thanos-be2003.codfw.wmnet
  • 15:05 aokoth@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: Security Release
  • 14:59 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.REBOOT (1 nodes at a time) for ElasticSearch cluster cloudelastic: apply security updates - bking@cumin1001 - T344587
  • 14:52 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for lsw1-a8-codfw - cmooney@cumin1001"
  • 14:49 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 14:49 cmooney@cumin1001: START - Cookbook sre.network.provision for device lsw1-a8-codfw.mgmt.codfw.wmnet
  • 14:42 bking@cumin1001: START - Cookbook sre.hosts.reimage for host flink-zk2001.codfw.wmnet with OS bookworm
  • 14:40 bking@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM flink-zk2001.codfw.wmnet - bking@cumin1001"
  • 14:39 bking@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM flink-zk2001.codfw.wmnet - bking@cumin1001"
  • 14:39 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes2036.codfw.wmnet with OS bullseye
  • 14:39 bking@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) flink-zk2001.codfw.wmnet on all recursors
  • 14:38 bking@cumin1001: START - Cookbook sre.dns.wipe-cache flink-zk2001.codfw.wmnet on all recursors
  • 14:38 bking@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:38 bking@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM flink-zk2001.codfw.wmnet - bking@cumin1001"
  • 14:34 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 14:33 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 14:32 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on kubernetes2036.codfw.wmnet with reason: host reimage
  • 14:32 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2036.codfw.wmnet with reason: host reimage
  • 14:31 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 14:30 bking@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM flink-zk2001.codfw.wmnet - bking@cumin1001"
  • 14:30 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 14:29 jayme@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/aus-k8s-eqiad-services/jaeger: sync
  • 14:29 jayme@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/aus-k8s-eqiad-services/jaeger: sync
  • 14:28 bking@cumin1001: START - Cookbook sre.dns.netbox
  • 14:28 bking@cumin1001: START - Cookbook sre.ganeti.makevm for new host flink-zk2001.codfw.wmnet
  • 14:25 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be2002.codfw.wmnet
  • 14:24 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 14:23 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 14:23 jayme@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/aus-k8s-eqiad-services/jaeger: sync
  • 14:23 jayme@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/aus-k8s-eqiad-services/jaeger: sync
  • 14:23 lsobanski@cumin1001: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1003.wikimedia.org with reason: Security release
  • 14:21 elu