Server Admin Log
2023-09-27
- 07:39 Emperor: repool ms-fe2009
- 06:57 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on ldap-rw[1001,2001].wikimedia.org with reason: WIP
- 06:57 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on ldap-rw[1001,2001].wikimedia.org with reason: WIP
- 06:50 gmodena@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
- 06:50 gmodena@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
- 06:50 gmodena@deploy2002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
- 06:41 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2097.codfw.wmnet with reason: Maintenance
- 06:40 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2097.codfw.wmnet with reason: Maintenance
- 05:54 isaranto@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
- 05:53 isaranto@deploy2002: helmfile [ml-serve-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
- 05:53 isaranto@deploy2002: helmfile [ml-serve-eqiad] 'sync' command on namespace 'ores-legacy' for release 'main' .
- 04:13 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
- 04:13 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
- 02:51 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1241.eqiad.wmnet with OS bullseye
- 02:51 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 02:50 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1240.eqiad.wmnet with OS bullseye
- 02:50 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 02:50 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 02:49 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 02:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1239.eqiad.wmnet with OS bullseye
- 02:47 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 02:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1236.eqiad.wmnet with OS bullseye
- 02:46 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 02:46 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 02:45 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 02:44 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1238.eqiad.wmnet with OS bullseye
- 02:44 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 02:43 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 02:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1237.eqiad.wmnet with OS bullseye
- 02:42 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 02:41 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 02:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1235.eqiad.wmnet with OS bullseye
- 02:40 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 02:39 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 02:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1234.eqiad.wmnet with OS bullseye
- 02:38 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 02:37 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 02:33 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db1240.eqiad.wmnet with reason: host reimage
- 02:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1241.eqiad.wmnet with reason: host reimage
- 02:32 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db1239.eqiad.wmnet with reason: host reimage
- 02:30 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db1236.eqiad.wmnet with reason: host reimage
- 02:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1238.eqiad.wmnet with reason: host reimage
- 02:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1237.eqiad.wmnet with reason: host reimage
- 02:24 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1241.eqiad.wmnet with reason: host reimage
- 02:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1235.eqiad.wmnet with reason: host reimage
- 02:23 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1240.eqiad.wmnet with reason: host reimage
- 02:22 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1239.eqiad.wmnet with reason: host reimage
- 02:21 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1238.eqiad.wmnet with reason: host reimage
- 02:21 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1234.eqiad.wmnet with reason: host reimage
- 02:21 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1237.eqiad.wmnet with reason: host reimage
- 02:20 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1236.eqiad.wmnet with reason: host reimage
- 02:19 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1235.eqiad.wmnet with reason: host reimage
- 02:18 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1234.eqiad.wmnet with reason: host reimage
- 02:11 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host db1241.eqiad.wmnet with OS bullseye
- 02:11 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for restbase2025.codfw.wmnet
- 02:11 eevans@cumin1001: START - Cookbook sre.hosts.remove-downtime for restbase2025.codfw.wmnet
- 02:10 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host db1240.eqiad.wmnet with OS bullseye
- 02:10 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase2025.codfw.wmnet with OS bullseye
- 02:09 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host db1239.eqiad.wmnet with OS bullseye
- 02:09 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host db1238.eqiad.wmnet with OS bullseye
- 02:08 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host db1237.eqiad.wmnet with OS bullseye
- 02:07 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host db1236.eqiad.wmnet with OS bullseye
- 02:06 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host db1235.eqiad.wmnet with OS bullseye
- 02:05 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host db1234.eqiad.wmnet with OS bullseye
- 02:00 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1225.eqiad.wmnet with reason: Maintenance
- 02:00 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1225.eqiad.wmnet with reason: Maintenance
- 02:00 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1224 (T343198)', diff saved to https://phabricator.wikimedia.org/P52682 and previous config saved to /var/cache/conftool/dbconfig/20230927-020034-arnaudb.json
- 01:45 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on restbase2025.codfw.wmnet with reason: host reimage
- 01:45 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1224', diff saved to https://phabricator.wikimedia.org/P52681 and previous config saved to /var/cache/conftool/dbconfig/20230927-014527-arnaudb.json
- 01:43 eevans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on restbase2025.codfw.wmnet with reason: host reimage
- 01:30 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1224', diff saved to https://phabricator.wikimedia.org/P52680 and previous config saved to /var/cache/conftool/dbconfig/20230927-013020-arnaudb.json
- 01:26 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2025.codfw.wmnet with OS bullseye
- 01:25 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for restbase2022.codfw.wmnet
- 01:25 eevans@cumin1001: START - Cookbook sre.hosts.remove-downtime for restbase2022.codfw.wmnet
- 01:25 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase2022.codfw.wmnet with OS bullseye
- 01:15 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1224 (T343198)', diff saved to https://phabricator.wikimedia.org/P52679 and previous config saved to /var/cache/conftool/dbconfig/20230927-011514-arnaudb.json
- 01:02 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on restbase2022.codfw.wmnet with reason: host reimage
- 00:59 eevans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on restbase2022.codfw.wmnet with reason: host reimage
- 00:42 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2022.codfw.wmnet with OS bullseye
- 00:41 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1224 (T343198)', diff saved to https://phabricator.wikimedia.org/P52678 and previous config saved to /var/cache/conftool/dbconfig/20230927-004144-arnaudb.json
- 00:41 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1224.eqiad.wmnet with reason: Maintenance
- 00:41 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1224.eqiad.wmnet with reason: Maintenance
- 00:41 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1213:3316 (T343198)', diff saved to https://phabricator.wikimedia.org/P52677 and previous config saved to /var/cache/conftool/dbconfig/20230927-004122-arnaudb.json
- 00:30 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase2020.codfw.wmnet with OS bullseye
- 00:26 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1213:3316', diff saved to https://phabricator.wikimedia.org/P52676 and previous config saved to /var/cache/conftool/dbconfig/20230927-002616-arnaudb.json
- 00:11 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1213:3316', diff saved to https://phabricator.wikimedia.org/P52675 and previous config saved to /var/cache/conftool/dbconfig/20230927-001109-arnaudb.json
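The 02:05 to 02:51 entries interleave eight sre.hosts.reimage runs (db1234 to db1241) with their downtime and netbox-hiera sub-cookbooks. The sketch below pairs each reimage START with its END to get per-host wall-clock time. It makes several assumptions. The line shape is inferred from the entries above and is not a documented format. Timestamps are same-day HH:MM only. `ENTRY` and `reimage_minutes` are hypothetical names.

```python
import re
from datetime import datetime

# Entries copied verbatim from the 2023-09-27 log above; newest first, as on the page.
SAMPLE = [
    "- 02:51 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1241.eqiad.wmnet with OS bullseye",
    "- 02:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1236.eqiad.wmnet with OS bullseye",
    "- 02:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1234.eqiad.wmnet with OS bullseye",
    "- 02:30 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db1236.eqiad.wmnet with reason: host reimage",
    "- 02:11 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host db1241.eqiad.wmnet with OS bullseye",
    "- 02:07 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host db1236.eqiad.wmnet with OS bullseye",
    "- 02:05 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host db1234.eqiad.wmnet with OS bullseye",
]

# Assumed shape of single-host cookbook lines (inferred, not documented).
# Downtime lines ("for 2:00:00 on ...") lack "for host" and do not match.
ENTRY = re.compile(
    r"^- (?P<time>\d{2}:\d{2}) \S+: "
    r"(?P<phase>START|END \([A-Z]+\)) - Cookbook (?P<cookbook>\S+)"
    r"(?: \(exit_code=\d+\))? for host (?P<host>\S+)"
)


def reimage_minutes(lines):
    """Pair each reimage START with the next END for the same host.

    Assumes every pair falls on the same day (the log shows HH:MM, no date).
    """
    started = {}
    durations = {}
    for line in reversed(lines):  # the page lists newest entries first
        m = ENTRY.match(line)
        if m is None or m["cookbook"] != "sre.hosts.reimage":
            continue
        when = datetime.strptime(m["time"], "%H:%M")
        host = m["host"].split(".")[0]  # short hostname, e.g. "db1234"
        if m["phase"] == "START":
            started[host] = when
        elif host in started:
            elapsed = when - started.pop(host)
            durations[host] = int(elapsed.total_seconds() // 60)
    return durations


print(reimage_minutes(SAMPLE))
# {'db1234': 33, 'db1236': 39, 'db1241': 40}
```

The db1236 downtime line (END (FAIL), exit_code=99) is skipped because it does not name a host with "for host". The reimage for that host still ended PASS at 02:46.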
2023-09-26
- 23:56 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1213:3316 (T343198)', diff saved to https://phabricator.wikimedia.org/P52674 and previous config saved to /var/cache/conftool/dbconfig/20230926-235602-arnaudb.json
- 23:55 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on restbase2020.codfw.wmnet with reason: host reimage
- 23:52 eevans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on restbase2020.codfw.wmnet with reason: host reimage
- 23:50 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2169:3317 (T343198)', diff saved to https://phabricator.wikimedia.org/P52673 and previous config saved to /var/cache/conftool/dbconfig/20230926-235026-arnaudb.json
- 23:50 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2169.codfw.wmnet with reason: Maintenance
- 23:50 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2169.codfw.wmnet with reason: Maintenance
- 23:50 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317 (T343198)', diff saved to https://phabricator.wikimedia.org/P52672 and previous config saved to /var/cache/conftool/dbconfig/20230926-235005-arnaudb.json
- 23:46 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host lists1004.eqiad.wmnet with OS bullseye
- 23:41 eevans@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts restbase2022.codfw.wmnet
- 23:41 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts restbase2022.codfw.wmnet
- 23:41 eevans@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts restbase2022.codfw.wmnet
- 23:41 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts restbase2022.codfw.wmnet
- 23:36 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2020.codfw.wmnet with OS bullseye
- 23:34 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317', diff saved to https://phabricator.wikimedia.org/P52671 and previous config saved to /var/cache/conftool/dbconfig/20230926-233458-arnaudb.json
- 23:19 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317', diff saved to https://phabricator.wikimedia.org/P52670 and previous config saved to /var/cache/conftool/dbconfig/20230926-231951-arnaudb.json
- 23:04 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317 (T343198)', diff saved to https://phabricator.wikimedia.org/P52669 and previous config saved to /var/cache/conftool/dbconfig/20230926-230445-arnaudb.json
- 22:49 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host lists1004.eqiad.wmnet with OS bullseye
- 22:47 eevans@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['restbase2020.codfw.wmnet']
- 22:47 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['restbase2020.codfw.wmnet']
- 22:45 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host lists1004.mgmt.eqiad.wmnet with reboot policy FORCED
- 22:44 jclark@cumin1001: START - Cookbook sre.hosts.provision for host lists1004.mgmt.eqiad.wmnet with reboot policy FORCED
- 22:41 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host lists1004.mgmt.eqiad.wmnet with reboot policy FORCED
- 22:36 jclark@cumin1001: START - Cookbook sre.hosts.provision for host lists1004.mgmt.eqiad.wmnet with reboot policy FORCED
- 22:35 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase2016.codfw.wmnet with OS bullseye
- 22:33 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host lists1004.mgmt.eqiad.wmnet with reboot policy FORCED
- 22:30 jclark@cumin1001: START - Cookbook sre.hosts.provision for host lists1004.mgmt.eqiad.wmnet with reboot policy FORCED
- 22:27 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['lists1004']
- 22:26 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['lists1004']
- 22:26 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['lists1004']
- 22:25 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['lists1004']
- 22:25 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['lists1004']
- 22:24 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['lists1004']
- 22:22 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host lists1004.eqiad.wmnet with OS bullseye
- 22:08 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1213:3316 (T343198)', diff saved to https://phabricator.wikimedia.org/P52668 and previous config saved to /var/cache/conftool/dbconfig/20230926-220812-arnaudb.json
- 22:08 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1213.eqiad.wmnet with reason: Maintenance
- 22:08 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1213.eqiad.wmnet with reason: Maintenance
- 22:08 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1201 (T343198)', diff saved to https://phabricator.wikimedia.org/P52667 and previous config saved to /var/cache/conftool/dbconfig/20230926-220801-arnaudb.json
- 21:59 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on restbase2016.codfw.wmnet with reason: host reimage
- 21:56 eevans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on restbase2016.codfw.wmnet with reason: host reimage
- 21:52 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1201', diff saved to https://phabricator.wikimedia.org/P52666 and previous config saved to /var/cache/conftool/dbconfig/20230926-215254-arnaudb.json
- 21:37 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2016.codfw.wmnet with OS bullseye
- 21:37 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1201', diff saved to https://phabricator.wikimedia.org/P52665 and previous config saved to /var/cache/conftool/dbconfig/20230926-213747-arnaudb.json
- 21:37 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host lists1004.eqiad.wmnet with OS bullseye
- 21:23 eevans@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['restbase2016.codfw.wmnet']
- 21:23 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['restbase2016.codfw.wmnet']
- 21:23 eevans@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['restbase2016.codfw.wmnet']
- 21:23 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['restbase2016.codfw.wmnet']
- 21:23 eevans@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['restbase2016.codfw.wmnet']
- 21:22 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1201 (T343198)', diff saved to https://phabricator.wikimedia.org/P52664 and previous config saved to /var/cache/conftool/dbconfig/20230926-212240-arnaudb.json
- 21:22 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['restbase2016.codfw.wmnet']
- 21:13 eevans@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['restbase2016.codfw.wmnet']
- 21:13 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['restbase2016.codfw.wmnet']
- 21:13 eevans@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['restbase2016.codfw.wmnet']
- 21:09 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['restbase2016.codfw.wmnet']
- 21:08 eevans@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['restbase2016.codfw.wmnet']
- 21:00 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['restbase2016.codfw.wmnet']
- 20:59 eevans@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['restbase2016.codfw.wmnet']
- 20:59 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['restbase2016.codfw.wmnet']
- 20:50 eevans@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['restbase2016.codfw.wmnet']
- 20:50 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['restbase2016.codfw.wmnet']
- 20:50 eevans@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['restbase2016.codfw.wmnet']
- 20:49 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['restbase2016.codfw.wmnet']
- 20:48 bking@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudelastic1007.eqiad.wmnet with OS bullseye
- 20:48 taavi@deploy2002: Finished scap: Backport for Set WRITE_NEW for Wikitech on OATHAuth multiple devices migration (T242031) (duration: 07m 38s)
- 20:43 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1201 (T343198)', diff saved to https://phabricator.wikimedia.org/P52663 and previous config saved to /var/cache/conftool/dbconfig/20230926-204331-arnaudb.json
- 20:43 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1201.eqiad.wmnet with reason: Maintenance
- 20:43 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1201.eqiad.wmnet with reason: Maintenance
- 20:43 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187 (T343198)', diff saved to https://phabricator.wikimedia.org/P52662 and previous config saved to /var/cache/conftool/dbconfig/20230926-204309-arnaudb.json
- 20:42 taavi@deploy2002: taavi: Continuing with sync
- 20:42 taavi@deploy2002: taavi: Backport for Set WRITE_NEW for Wikitech on OATHAuth multiple devices migration (T242031) synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
- 20:42 eevans@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host restbase2016.codfw.wmnet with OS bullseye
- 20:40 taavi@deploy2002: Started scap: Backport for Set WRITE_NEW for Wikitech on OATHAuth multiple devices migration (T242031)
- 20:39 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host lists1004.eqiad.wmnet with OS bullseye
- 20:38 taavi@deploy2002: Finished scap: Backport for Do not set $wgPasswordResetRoutes['domain'] (T345226), Do not set $wgPasswordResetRoutes['domain'] (T345226) (duration: 08m 35s)
- 20:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1016.eqiad.wmnet with OS bullseye
- 20:31 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 20:31 taavi@deploy2002: taavi: Continuing with sync
- 20:31 taavi@deploy2002: taavi: Backport for Do not set $wgPasswordResetRoutes['domain'] (T345226), Do not set $wgPasswordResetRoutes['domain'] (T345226) synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
- 20:29 taavi@deploy2002: Started scap: Backport for Do not set $wgPasswordResetRoutes['domain'] (T345226), Do not set $wgPasswordResetRoutes['domain'] (T345226)
- 20:28 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P52661 and previous config saved to /var/cache/conftool/dbconfig/20230926-202803-arnaudb.json
- 20:26 taavi@deploy2002: Finished scap: Backport for Add $wgExternalLinksDomainGaps (T341000) (duration: 09m 44s)
- 20:25 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 20:19 taavi@deploy2002: taavi and lucaswerkmeister: Continuing with sync
- 20:18 taavi@deploy2002: taavi and lucaswerkmeister: Backport for Add $wgExternalLinksDomainGaps (T341000) synced to the testservers mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
- 20:17 bking@cumin1001: START - Cookbook sre.hosts.reimage for host cloudelastic1007.eqiad.wmnet with OS bullseye
- 20:16 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1015.eqiad.wmnet with OS bullseye
- 20:16 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 20:16 taavi@deploy2002: Started scap: Backport for Add $wgExternalLinksDomainGaps (T341000)
- 20:16 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudelastic1007.eqiad.wmnet with OS bullseye
- 20:16 bking@cumin1001: START - Cookbook sre.hosts.reimage for host cloudelastic1007.eqiad.wmnet with OS bullseye
- 20:15 taavi@deploy2002: Finished scap: Backport for Wordmarks for Wikinews projects (T341258), Update wikiquote wordmarks (T341260), Update README clarifying the use of local images. (duration: 10m 04s)
- 20:15 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 20:15 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudelastic1007.eqiad.wmnet with OS bullseye
- 20:15 bking@cumin1001: START - Cookbook sre.hosts.reimage for host cloudelastic1007.eqiad.wmnet with OS bullseye
- 20:14 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2016.codfw.wmnet with OS bullseye
- 20:14 eevans@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host restbase2016.codfw.wmnet with OS bullseye
- 20:12 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P52660 and previous config saved to /var/cache/conftool/dbconfig/20230926-201256-arnaudb.json
- 20:10 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1016.eqiad.wmnet with reason: host reimage
- 20:09 taavi@deploy2002: taavi and jdlrobson: Continuing with sync
- 20:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1016.eqiad.wmnet with reason: host reimage
- 20:06 taavi@deploy2002: taavi and jdlrobson: Backport for Wordmarks for Wikinews projects (T341258), Update wikiquote wordmarks (T341260), Update README clarifying the use of local images. synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-exp
- 20:06 joal@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventgate-logging-external: apply
- 20:06 joal@deploy2002: helmfile [codfw] START helmfile.d/services/eventgate-logging-external: apply
- 20:05 taavi@deploy2002: Started scap: Backport for Wordmarks for Wikinews projects (T341258), Update wikiquote wordmarks (T341260), Update README clarifying the use of local images.
- 20:04 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
- 20:04 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
- 20:04 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
- 20:04 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
- 20:04 joal@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-logging-external: apply
- 20:04 joal@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-logging-external: apply
- 20:02 joal@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply
- 20:02 joal@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply
- 20:02 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1015.eqiad.wmnet with reason: host reimage
- 20:02 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
- 20:01 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
- 20:01 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
- 20:00 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
- 20:00 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
- 20:00 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
- 19:59 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
- 19:59 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
- 19:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1015.eqiad.wmnet with reason: host reimage
- 19:57 joal@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: apply
- 19:57 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187 (T343198)', diff saved to https://phabricator.wikimedia.org/P52659 and previous config saved to /var/cache/conftool/dbconfig/20230926-195750-arnaudb.json
- 19:57 joal@deploy2002: helmfile [codfw] START helmfile.d/services/eventgate-analytics-external: apply
- 19:55 joal@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: apply
- 19:54 joal@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: apply
- 19:53 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host lists1004.eqiad.wmnet with OS bullseye
- 19:53 joal@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics-external: apply
- 19:52 joal@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-analytics-external: apply
- 19:48 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2016.codfw.wmnet with OS bullseye
- 19:47 joal@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics: apply
- 19:47 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for restbase2015.codfw.wmnet
- 19:47 eevans@cumin1001: START - Cookbook sre.hosts.remove-downtime for restbase2015.codfw.wmnet
- 19:46 joal@deploy2002: helmfile [codfw] START helmfile.d/services/eventgate-analytics: apply
- 19:46 joal@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics: apply
- 19:46 joal@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics: apply
- 19:45 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase2015.codfw.wmnet with OS bullseye
- 19:45 joal@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics: apply
- 19:42 joal@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-analytics: apply
- 19:40 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs1023.eqiad.wmnet with OS bullseye
- 19:40 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
- 19:38 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
- 19:37 joal@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventgate-main: apply
- 19:37 joal@deploy2002: helmfile [codfw] START helmfile.d/services/eventgate-main: apply
- 19:33 joal@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: apply
- 19:33 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host pc1015.eqiad.wmnet with OS bullseye
- 19:33 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host pc1016.eqiad.wmnet with OS bullseye
- 19:33 joal@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-main: apply
- 19:32 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudelastic1007.eqiad.wmnet with OS bullseye
- 19:32 bking@cumin1001: START - Cookbook sre.hosts.reimage for host cloudelastic1007.eqiad.wmnet with OS bullseye
- 19:31 joal@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-main: apply
- 19:30 joal@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-main: apply
- 19:27 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
- 19:27 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
- 19:21 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1023.eqiad.wmnet with reason: host reimage
- 19:19 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1187 (T343198)', diff saved to https://phabricator.wikimedia.org/P52657 and previous config saved to /var/cache/conftool/dbconfig/20230926-191904-arnaudb.json
- 19:19 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1187.eqiad.wmnet with reason: Maintenance
- 19:18 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1187.eqiad.wmnet with reason: Maintenance
- 19:18 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T343198)', diff saved to https://phabricator.wikimedia.org/P52656 and previous config saved to /var/cache/conftool/dbconfig/20230926-191843-arnaudb.json
- 19:18 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1023.eqiad.wmnet with reason: host reimage
- 19:17 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
- 19:16 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
- 19:16 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
- 19:16 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
- 19:05 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on restbase2015.codfw.wmnet with reason: host reimage
- 19:03 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P52655 and previous config saved to /var/cache/conftool/dbconfig/20230926-190336-arnaudb.json
- 19:02 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudelastic1007.eqiad.wmnet with OS bullseye
- 19:02 bking@cumin1001: START - Cookbook sre.hosts.reimage for host cloudelastic1007.eqiad.wmnet with OS bullseye
- 19:02 eevans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on restbase2015.codfw.wmnet with reason: host reimage
- 18:59 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs1023.eqiad.wmnet with OS bullseye
- 18:58 bking@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudelastic1007.eqiad.wmnet with OS bullseye
- 18:54 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs1023']
- 18:50 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs1020.eqiad.wmnet with OS bullseye
- 18:50 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
- 18:48 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs1023']
- 18:48 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P52654 and previous config saved to /var/cache/conftool/dbconfig/20230926-184830-arnaudb.json
- 18:48 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
- 18:47 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2015.codfw.wmnet with OS bullseye
- 18:47 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for restbase2024.codfw.wmnet
- 18:46 eevans@cumin1001: START - Cookbook sre.hosts.remove-downtime for restbase2024.codfw.wmnet
- 18:46 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs1017.eqiad.wmnet with OS bullseye
- 18:46 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
- 18:45 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
- 18:41 gmodena@deploy2002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
- 18:40 gmodena@deploy2002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
- 18:33 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T343198)', diff saved to https://phabricator.wikimedia.org/P52653 and previous config saved to /var/cache/conftool/dbconfig/20230926-183323-arnaudb.json
- 18:32 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1020.eqiad.wmnet with reason: host reimage
- 18:30 brennen@deploy2002: rebuilt and synchronized wikiversions files: group0 wikis to 1.41.0-wmf.28 refs T345889
- 18:29 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1017.eqiad.wmnet with reason: host reimage
- 18:28 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1020.eqiad.wmnet with reason: host reimage
- 18:26 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1017.eqiad.wmnet with reason: host reimage
- 18:24 bking@cumin1001: START - Cookbook sre.hosts.reimage for host cloudelastic1007.eqiad.wmnet with OS bullseye
- 18:18 brennen: train 1.41.0-wmf.28 (T345889): no current blockers, rolling to group0
- 18:09 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase2024.codfw.wmnet with OS bullseye
- 18:08 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs1020.eqiad.wmnet with OS bullseye
- 18:08 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs1020']
- 18:04 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs1017.eqiad.wmnet with OS bullseye
- 18:03 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs1017']
- 18:01 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs1020']
- 18:01 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs1017']
- 18:01 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs1021.eqiad.wmnet with OS bullseye
- 18:01 jclark@cumin1001: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
- 17:58 gmodena@deploy2002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
- 17:58 gmodena@deploy2002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
- 17:52 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@94ac23e]: tune parallelism of process_sparql_query_hourly (duration: 00m 27s)
- 17:52 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1180 (T343198)', diff saved to https://phabricator.wikimedia.org/P52652 and previous config saved to /var/cache/conftool/dbconfig/20230926-175222-arnaudb.json
- 17:52 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1180.eqiad.wmnet with reason: Maintenance
- 17:52 ebernhardson@deploy2002: Started deploy [airflow-dags/search@94ac23e]: tune parallelism of process_sparql_query_hourly
- 17:52 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1180.eqiad.wmnet with reason: Maintenance
- 17:52 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T343198)', diff saved to https://phabricator.wikimedia.org/P52651 and previous config saved to /var/cache/conftool/dbconfig/20230926-175201-arnaudb.json
- 17:45 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on restbase2024.codfw.wmnet with reason: host reimage
- 17:43 eevans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on restbase2024.codfw.wmnet with reason: host reimage
- 17:36 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P52650 and previous config saved to /var/cache/conftool/dbconfig/20230926-173653-arnaudb.json
- 17:27 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2024.codfw.wmnet with OS bullseye
- 17:21 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P52649 and previous config saved to /var/cache/conftool/dbconfig/20230926-172146-arnaudb.json
- 17:15 herron@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 17:15 herron@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding pyrra.svc records - herron@cumin1001"
- 17:14 herron@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding pyrra.svc records - herron@cumin1001"
- 17:12 herron@cumin1001: START - Cookbook sre.dns.netbox
- 17:06 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T343198)', diff saved to https://phabricator.wikimedia.org/P52648 and previous config saved to /var/cache/conftool/dbconfig/20230926-170639-arnaudb.json
- 17:01 bblack: A:swift-fe-codfw: manually rolling systemctl restart of swift-proxy and nginx
- 16:59 bblack@cumin1001: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling restart_daemons on A:swift-fe-codfw
- 16:53 bblack@cumin1001: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on A:swift-fe-codfw
- 16:52 bblack: ms-fe2009 - restart swift_dispersion_stats + swift_dispersion_stats_lowlatency services (failing in systemctl)
- 16:51 bblack@cumin1001: END (FAIL) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=1) rolling restart_daemons on A:swift-fe-codfw
- 16:45 bblack@cumin1001: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on A:swift-fe-codfw
- 16:44 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs1020.eqiad.wmnet with OS bullseye
- 16:28 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
- 16:27 aokoth@cumin1001: END (PASS) - Cookbook sre.gitlab.failover (exit_code=0) Failover of gitlab from gitlab2002.wikimedia.org to gitlab1003.wikimedia.org
- 16:26 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1168 (T343198)', diff saved to https://phabricator.wikimedia.org/P52647 and previous config saved to /var/cache/conftool/dbconfig/20230926-162609-arnaudb.json
- 16:26 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1168.eqiad.wmnet with reason: Maintenance
- 16:25 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1168.eqiad.wmnet with reason: Maintenance
- 16:25 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T343198)', diff saved to https://phabricator.wikimedia.org/P52646 and previous config saved to /var/cache/conftool/dbconfig/20230926-162547-arnaudb.json
- 16:23 aokoth@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) https://gitlab-replica.wikimedia.org/ https://gitlab-replica-old.wikimedia.org/ on all recursors
- 16:23 aokoth@cumin1001: START - Cookbook sre.dns.wipe-cache https://gitlab-replica.wikimedia.org/ https://gitlab-replica-old.wikimedia.org/ on all recursors
- 16:17 aokoth@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) https://gitlab-replica.wikimedia.org/ https://gitlab-replica-old.wikimedia.org/ on all recursors
- 16:17 aokoth@cumin1001: START - Cookbook sre.dns.wipe-cache https://gitlab-replica.wikimedia.org/ https://gitlab-replica-old.wikimedia.org/ on all recursors
- 16:15 aokoth@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) https://gitlab-replica.wikimedia.org/ https://gitlab-replica-old.wikimedia.org/ on all recursors
- 16:15 aokoth@cumin1001: START - Cookbook sre.dns.wipe-cache https://gitlab-replica.wikimedia.org/ https://gitlab-replica-old.wikimedia.org/ on all recursors
- 16:12 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1021.eqiad.wmnet with reason: host reimage
- 16:10 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P52645 and previous config saved to /var/cache/conftool/dbconfig/20230926-161041-arnaudb.json
- 16:09 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1021.eqiad.wmnet with reason: host reimage
- 16:08 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host wdqs1021.eqiad.wmnet with OS bullseye
- 15:59 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs1017.eqiad.wmnet with OS bullseye
- 15:55 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P52644 and previous config saved to /var/cache/conftool/dbconfig/20230926-155534-arnaudb.json
- 15:47 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs1021.eqiad.wmnet with OS bullseye
- 15:47 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs1021']
- 15:47 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs1021']
- 15:46 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['wdqs1021']
- 15:46 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs1021']
- 15:40 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wdqs1017.mgmt.eqiad.wmnet with reboot policy FORCED
- 15:40 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T343198)', diff saved to https://phabricator.wikimedia.org/P52643 and previous config saved to /var/cache/conftool/dbconfig/20230926-154027-arnaudb.json
- 15:39 jclark@cumin1001: START - Cookbook sre.hosts.provision for host wdqs1017.mgmt.eqiad.wmnet with reboot policy FORCED
- 15:37 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wdqs1021.mgmt.eqiad.wmnet with reboot policy FORCED
- 15:30 jclark@cumin1001: START - Cookbook sre.hosts.provision for host wdqs1021.mgmt.eqiad.wmnet with reboot policy FORCED
- 15:28 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs1017.eqiad.wmnet with OS bullseye
- 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Adding ganeti-test server to codfw - jhancock@cumin2002"
- 15:27 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Adding ganeti-test server to codfw - jhancock@cumin2002"
- 15:25 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wdqs1021.mgmt.eqiad.wmnet with reboot policy FORCED
- 15:25 jhancock@cumin2002: START - Cookbook sre.dns.netbox
- 15:25 jclark@cumin1001: START - Cookbook sre.hosts.provision for host wdqs1021.mgmt.eqiad.wmnet with reboot policy FORCED
- 15:24 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs1020.eqiad.wmnet with OS bullseye
- 15:24 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wdqs1020.mgmt.eqiad.wmnet with reboot policy FORCED
- 15:24 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for restbase2021.codfw.wmnet
- 15:24 eevans@cumin1001: START - Cookbook sre.hosts.remove-downtime for restbase2021.codfw.wmnet
- 15:11 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wdqs1020.mgmt.eqiad.wmnet with reboot policy FORCED
- 15:11 jclark@cumin1001: START - Cookbook sre.hosts.provision for host wdqs1020.mgmt.eqiad.wmnet with reboot policy FORCED
- 15:11 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 15:09 jclark@cumin1001: START - Cookbook sre.dns.netbox
- 15:09 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wdqs1021.mgmt.eqiad.wmnet with reboot policy FORCED
- 15:07 brennen@deploy2002: Finished deploy [phabricator/deployment@d895dde]: deploy to phab1004 for weekly updates (duration: 00m 44s)
- 15:06 brennen@deploy2002: Started deploy [phabricator/deployment@d895dde]: deploy to phab1004 for weekly updates
- 15:06 brennen@deploy2002: Finished deploy [phabricator/deployment@d895dde]: test deploy to phab2002 (duration: 00m 35s)
- 15:05 brennen@deploy2002: Started deploy [phabricator/deployment@d895dde]: test deploy to phab2002
- 15:04 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti-test2004.mgmt.codfw.wmnet with reboot policy FORCED
- 15:04 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ganeti-test2004.mgmt.codfw.wmnet with reboot policy FORCED
- 15:04 ejegg: re-enabled recurring donations charge job
- 15:03 brennen: beginning routine phabricator update shortly
- 15:03 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on phab1004.eqiad.wmnet with reason: Phabricator maintenance
- 15:03 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on phab1004.eqiad.wmnet with reason: Phabricator maintenance
- 15:02 jclark@cumin1001: START - Cookbook sre.hosts.provision for host wdqs1020.mgmt.eqiad.wmnet with reboot policy FORCED
- 15:01 jclark@cumin1001: START - Cookbook sre.hosts.provision for host wdqs1021.mgmt.eqiad.wmnet with reboot policy FORCED
- 15:01 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti-test2004.mgmt.codfw.wmnet with reboot policy FORCED
- 15:01 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ganeti-test2004.mgmt.codfw.wmnet with reboot policy FORCED
- 15:01 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 15:01 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt 20 - jclark@cumin1001"
- 15:01 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1165 (T343198)', diff saved to https://phabricator.wikimedia.org/P52642 and previous config saved to /var/cache/conftool/dbconfig/20230926-150056-arnaudb.json
- 15:01 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
- 15:01 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
- 15:00 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1165.eqiad.wmnet with reason: Maintenance
- 15:00 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1165.eqiad.wmnet with reason: Maintenance
- 15:00 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131 (T343198)', diff saved to https://phabricator.wikimedia.org/P52641 and previous config saved to /var/cache/conftool/dbconfig/20230926-150028-arnaudb.json
- 15:00 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt 20 - jclark@cumin1001"
- 14:58 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wdqs1021.mgmt.eqiad.wmnet with reboot policy FORCED
- 14:58 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 14:58 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Adding ganeti-test server to codfw - jhancock@cumin2002"
- 14:57 jclark@cumin1001: START - Cookbook sre.hosts.provision for host wdqs1021.mgmt.eqiad.wmnet with reboot policy FORCED
- 14:57 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Adding ganeti-test server to codfw - jhancock@cumin2002"
- 14:57 jclark@cumin1001: START - Cookbook sre.dns.netbox
- 14:54 jhancock@cumin2002: START - Cookbook sre.dns.netbox
- 14:53 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 14:53 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt wdqs1017-20 - jclark@cumin1001"
- 14:52 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt wdqs1017-20 - jclark@cumin1001"
- 14:50 jclark@cumin1001: START - Cookbook sre.dns.netbox
- 14:49 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wdqs1021.mgmt.eqiad.wmnet with reboot policy FORCED
- 14:47 moritzm: installing lldpd security updates
- 14:46 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase2021.codfw.wmnet with OS bullseye
- 14:45 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131', diff saved to https://phabricator.wikimedia.org/P52640 and previous config saved to /var/cache/conftool/dbconfig/20230926-144521-arnaudb.json
- 14:38 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: sync
- 14:38 effie: Ramp up traffic to mw-on-k8s to 6.5% - T346422
- 14:37 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/wikifunctions: sync
- 14:36 ejegg: fundraising civicrm upgraded from 9efea665 to 41a4c2cf
- 14:35 jclark@cumin1001: START - Cookbook sre.hosts.provision for host wdqs1021.mgmt.eqiad.wmnet with reboot policy FORCED
- 14:34 jbond@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "puppetserver2002.codfw.wmnet - jbond@cumin2002"
- 14:33 ejegg: disabled recurring donations charge job for civi deploy
- 14:30 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131', diff saved to https://phabricator.wikimedia.org/P52639 and previous config saved to /var/cache/conftool/dbconfig/20230926-143015-arnaudb.json
- 14:27 jbond@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "puppetserver2002.codfw.wmnet - jbond@cumin2002"
- 14:25 jbond@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host puppetserver2002.codfw.wmnet with OS bookworm
- 14:25 jbond@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jbond@cumin2002"
- 14:24 jbond@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jbond@cumin2002"
- 14:23 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host puppetserver1003.eqiad.wmnet with OS bookworm
- 14:23 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jbond@cumin1001"
- 14:21 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs1021.eqiad.wmnet with OS bullseye
- 14:21 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs1021.eqiad.wmnet with OS bullseye
- 14:17 moritzm: prune obsolete nginx packages from durum hosts after migration to new library scheme T329529
- 14:16 jbond@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jbond@cumin1001"
- 14:15 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131 (T343198)', diff saved to https://phabricator.wikimedia.org/P52638 and previous config saved to /var/cache/conftool/dbconfig/20230926-141508-arnaudb.json
- 14:13 jbond@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on puppetserver2002.codfw.wmnet with reason: host reimage
- 14:10 jbond@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on puppetserver2002.codfw.wmnet with reason: host reimage
- 14:08 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs1017.eqiad.wmnet with OS bullseye
- 14:08 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs1021.eqiad.wmnet with OS bullseye
- 14:05 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on puppetserver1003.eqiad.wmnet with reason: host reimage
- 14:02 Lucas_WMDE: UTC afternoon backport+config window done
- 14:02 lucaswerkmeister-wmde@deploy2002: Finished scap: Backport for Enable Minerva site notice for wikifunctions wiki (T345463) (duration: 09m 51s)
- 14:02 jbond@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on puppetserver1003.eqiad.wmnet with reason: host reimage
- 14:01 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on restbase2021.codfw.wmnet with reason: host reimage
- 13:57 eevans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on restbase2021.codfw.wmnet with reason: host reimage
- 13:55 lucaswerkmeister-wmde@deploy2002: ammarpad and lucaswerkmeister-wmde: Continuing with sync
- 13:54 lucaswerkmeister-wmde@deploy2002: ammarpad and lucaswerkmeister-wmde: Backport for Enable Minerva site notice for wikifunctions wiki (T345463) synced to the testservers mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
- 13:52 lucaswerkmeister-wmde@deploy2002: Started scap: Backport for Enable Minerva site notice for wikifunctions wiki (T345463)
- 13:51 lucaswerkmeister-wmde@deploy2002: Finished scap: Backport for arwikisource: Increase autoconfirm edit count to 10 (T347264) (duration: 11m 27s)
- 13:47 Lucas_WMDE: lucaswerkmeister-wmde@deploy2002 ammarpad and lucaswerkmeister-wmde: Continuing with sync [originally 13:44 UTC]
- 13:43 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2021.codfw.wmnet with OS bullseye
- 13:43 lucaswerkmeister-wmde@deploy2002: ammarpad and lucaswerkmeister-wmde: Backport for arwikisource: Increase autoconfirm edit count to 10 (T347264) synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
- 13:43 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for restbase2019.codfw.wmnet
- 13:43 eevans@cumin1001: START - Cookbook sre.hosts.remove-downtime for restbase2019.codfw.wmnet
- 13:39 lucaswerkmeister-wmde@deploy2002: Started scap: Backport for arwikisource: Increase autoconfirm edit count to 10 (T347264)
- 13:37 lucaswerkmeister-wmde@deploy2002: Finished scap: Backport for add search update pipeline streams (update + fetch_error) (T317609) (duration: 11m 54s)
- 13:36 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2023.codfw.wmnet
- 13:36 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2023.codfw.wmnet
- 13:36 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2023.codfw.wmnet
- 13:35 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: sync
- 13:35 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2023.codfw.wmnet
- 13:34 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/wikifunctions: sync
- 13:31 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase2019.codfw.wmnet with OS bullseye
- 13:31 lucaswerkmeister-wmde@deploy2002: pfischer and lucaswerkmeister-wmde: Continuing with sync
- 13:29 jbond@cumin1001: START - Cookbook sre.hosts.reimage for host puppetserver1003.eqiad.wmnet with OS bookworm
- 13:27 lucaswerkmeister-wmde@deploy2002: pfischer and lucaswerkmeister-wmde: Backport for add search update pipeline streams (update + fetch_error) (T317609) synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
- 13:25 lucaswerkmeister-wmde@deploy2002: Started scap: Backport for add search update pipeline streams (update + fetch_error) (T317609)
- 13:25 jbond@cumin2002: START - Cookbook sre.hosts.reimage for host puppetserver2002.codfw.wmnet with OS bookworm
- 13:25 jbond@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) puppetserver2002.codfw.wmnet on all recursors
- 13:25 jbond@cumin1001: START - Cookbook sre.dns.wipe-cache puppetserver2002.codfw.wmnet on all recursors
- 13:25 jbond@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 13:25 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: rename puppetmaster[12]004 - jbond@cumin1001"
- 13:24 jbond@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: rename puppetmaster[12]004 - jbond@cumin1001"
- 13:24 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1131 (T343198)', diff saved to https://phabricator.wikimedia.org/P52637 and previous config saved to /var/cache/conftool/dbconfig/20230926-132357-arnaudb.json
- 13:23 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1131.eqiad.wmnet with reason: Maintenance
- 13:23 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1131.eqiad.wmnet with reason: Maintenance
- 13:22 jbond@cumin1001: START - Cookbook sre.dns.netbox
- 13:21 lucaswerkmeister-wmde@deploy2002: Finished scap: Backport for Make wikifunctionswiki a multilingual Wikidata client (T342857) (duration: 09m 44s)
- 13:18 aokoth@cumin1001: START - Cookbook sre.gitlab.failover Failover of gitlab from gitlab2002.wikimedia.org to gitlab1003.wikimedia.org
- 13:15 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: sync
- 13:15 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Continuing with sync
- 13:14 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifunctions: sync
- 13:13 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Backport for Make wikifunctionswiki a multilingual Wikidata client (T342857) synced to the testservers mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
- 13:11 lucaswerkmeister-wmde@deploy2002: Started scap: Backport for Make wikifunctionswiki a multilingual Wikidata client (T342857)
- 13:07 jbond@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host puppetserver2002
- 13:07 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on restbase2019.codfw.wmnet with reason: host reimage
- 13:06 jbond@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host puppetserver2002
- 13:04 oblivian@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifeeds: apply
- 13:04 eevans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on restbase2019.codfw.wmnet with reason: host reimage
- 13:04 oblivian@deploy2002: helmfile [codfw] START helmfile.d/services/wikifeeds: apply
- 13:02 oblivian@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifeeds: apply
- 13:02 oblivian@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifeeds: apply
- 13:01 oblivian@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifeeds: apply
- 13:01 oblivian@deploy2002: helmfile [staging] START helmfile.d/services/wikifeeds: apply
- 13:01 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: sync
- 13:00 jayme@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: sync
- 13:00 aokoth@cumin1001: END (FAIL) - Cookbook sre.gitlab.failover (exit_code=93) Failover of gitlab from gitlab2002.wikimedia.org to gitlab1003.wikimedia.org
- 13:00 jbond@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host puppetserver2002
- 12:57 jbond@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host puppetserver2002
- 12:55 jbond@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host puppetserver1003
- 12:54 aokoth@cumin1001: START - Cookbook sre.gitlab.failover Failover of gitlab from gitlab2002.wikimedia.org to gitlab1003.wikimedia.org
- 12:53 jbond@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host puppetserver1003
- 12:53 jbond@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host puppetserver2002
- 12:53 jbond@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host puppetserver2002
- 12:52 jbond@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 12:52 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: rename puppetmaster[12]004 - jbond@cumin1001"
- 12:52 jbond@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host puppetserver2002
- 12:52 jbond@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host puppetserver2002
- 12:52 jbond@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: rename puppetmaster[12]004 - jbond@cumin1001"
- 12:49 jbond@cumin1001: START - Cookbook sre.dns.netbox
- 12:48 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2019.codfw.wmnet with OS bullseye
- 12:36 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2023.codfw.wmnet
- 12:35 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2023.codfw.wmnet
- 12:35 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2023.codfw.wmnet
- 12:34 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2023.codfw.wmnet
- 12:34 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2023.codfw.wmnet
- 12:33 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2023.codfw.wmnet
- 12:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2023.codfw.wmnet
- 12:28 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2023.codfw.wmnet
- 12:16 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM puppetdb2002.codfw.wmnet
- 12:16 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts puppetmaster1004.eqiad.wmnet
- 12:16 jbond@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 12:16 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: puppetmaster1004.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jbond@cumin1001"
- 12:15 jbond@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: puppetmaster1004.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jbond@cumin1001"
- 12:12 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM puppetdb2002.codfw.wmnet
- 12:12 jbond@cumin1001: START - Cookbook sre.dns.netbox
- 12:10 taavi: deploy https://gerrit.wikimedia.org/r/961054 via homer
- 12:10 jbond@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts puppetmaster2004.codfw.wmnet
- 12:10 jbond@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
- 12:08 jbond@cumin2002: START - Cookbook sre.dns.netbox
- 12:05 jbond@cumin1001: START - Cookbook sre.hosts.decommission for hosts puppetmaster1004.eqiad.wmnet
- 12:04 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2168:3317 (T343198)', diff saved to https://phabricator.wikimedia.org/P52635 and previous config saved to /var/cache/conftool/dbconfig/20230926-120417-arnaudb.json
- 12:04 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2168.codfw.wmnet with reason: Maintenance
- 12:04 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2168.codfw.wmnet with reason: Maintenance
- 12:03 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159 (T343198)', diff saved to https://phabricator.wikimedia.org/P52634 and previous config saved to /var/cache/conftool/dbconfig/20230926-120355-arnaudb.json
- 12:00 jbond@cumin2002: START - Cookbook sre.hosts.decommission for hosts puppetmaster2004.codfw.wmnet
- 11:48 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159', diff saved to https://phabricator.wikimedia.org/P52633 and previous config saved to /var/cache/conftool/dbconfig/20230926-114848-arnaudb.json
- 11:33 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159', diff saved to https://phabricator.wikimedia.org/P52632 and previous config saved to /var/cache/conftool/dbconfig/20230926-113340-arnaudb.json
- 11:29 taavi@deploy2002: Finished scap: Backport for wikitech: $wgPasswordResetRoutes takes an empty array, not false (duration: 07m 28s)
- 11:23 taavi@deploy2002: taavi: Continuing with sync
- 11:23 taavi@deploy2002: taavi: Backport for wikitech: $wgPasswordResetRoutes takes an empty array, not false synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
- 11:21 taavi@deploy2002: Started scap: Backport for wikitech: $wgPasswordResetRoutes takes an empty array, not false
- 11:18 taavi@deploy2002: Finished scap: Backport for wikitech: Properly disable password resets (T345226) (duration: 08m 00s)
- 11:18 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159 (T343198)', diff saved to https://phabricator.wikimedia.org/P52631 and previous config saved to /var/cache/conftool/dbconfig/20230926-111834-arnaudb.json
- 11:12 taavi@deploy2002: taavi: Continuing with sync
- 11:12 taavi@deploy2002: taavi: Backport for wikitech: Properly disable password resets (T345226) synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
- 11:10 taavi@deploy2002: Started scap: Backport for wikitech: Properly disable password resets (T345226)
- 11:07 joal@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-main: sync
- 11:07 joal@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-main: sync
- 10:55 oblivian@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifeeds: apply
- 10:55 oblivian@deploy2002: helmfile [codfw] START helmfile.d/services/wikifeeds: apply
- 10:54 oblivian@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifeeds: apply
- 10:53 oblivian@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifeeds: apply
- 10:51 oblivian@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifeeds: apply
- 10:51 oblivian@deploy2002: helmfile [staging] START helmfile.d/services/wikifeeds: apply
- 10:46 oblivian@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifeeds: apply
- 10:46 oblivian@deploy2002: helmfile [staging] START helmfile.d/services/wikifeeds: apply
- 10:46 oblivian@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifeeds: apply
- 10:46 oblivian@deploy2002: helmfile [staging] START helmfile.d/services/wikifeeds: apply
- 10:41 jiji@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
- 10:41 jiji@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
- 10:40 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
- 10:40 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
- 10:39 jiji@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
- 10:38 jiji@deploy2002: helmfile [codfw] START helmfile.d/services/mw-web: apply
- 10:38 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
- 10:38 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
- 10:37 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
- 10:37 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
- 10:37 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on an-worker1086.eqiad.wmnet with reason: Downtiming host for RAID controller battery replacement
- 10:37 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on an-worker1086.eqiad.wmnet with reason: Downtiming host for RAID controller battery replacement
- 10:36 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
- 10:35 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
- 10:05 oblivian@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventstreams: apply
- 10:05 oblivian@deploy2002: helmfile [codfw] START helmfile.d/services/eventstreams: apply
- 10:04 oblivian@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventstreams: apply
- 10:04 oblivian@deploy2002: helmfile [eqiad] START helmfile.d/services/eventstreams: apply
- 10:04 oblivian@deploy2002: helmfile [staging] DONE helmfile.d/services/eventstreams: apply
- 10:03 oblivian@deploy2002: helmfile [staging] START helmfile.d/services/eventstreams: apply
- 10:03 taavi: update CR firewall policy to permit wiki replica account creation in the new cloud-private network setup, https://gerrit.wikimedia.org/r/961055 T347381
- 10:03 oblivian@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventstreams-internal: apply
- 10:02 oblivian@deploy2002: helmfile [codfw] START helmfile.d/services/eventstreams-internal: apply
- 10:01 oblivian@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventstreams-internal: apply
- 10:00 oblivian@deploy2002: helmfile [eqiad] START helmfile.d/services/eventstreams-internal: apply
- 10:00 oblivian@deploy2002: helmfile [staging] DONE helmfile.d/services/eventstreams-internal: apply
- 10:00 oblivian@deploy2002: helmfile [staging] START helmfile.d/services/eventstreams-internal: apply
- 09:54 klausman@deploy2002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: apply
- 09:53 klausman@deploy2002: helmfile [eqiad] START helmfile.d/services/api-gateway: apply
- 09:52 klausman@deploy2002: helmfile [codfw] DONE helmfile.d/services/api-gateway: apply
- 09:52 klausman@deploy2002: helmfile [codfw] START helmfile.d/services/api-gateway: apply
- 09:48 godog: remove per-host restbase healthchecks, replaced by service-level swagger-exporter checks - T314118
- 09:47 klausman@deploy2002: helmfile [staging] DONE helmfile.d/services/api-gateway: sync
- 09:47 klausman@deploy2002: helmfile [staging] START helmfile.d/services/api-gateway: sync
- 09:38 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
- 09:38 oblivian@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventgate-main: apply
- 09:37 oblivian@deploy2002: helmfile [codfw] START helmfile.d/services/eventgate-main: apply
- 09:36 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/mw-web: apply
- 09:36 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
- 09:35 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
- 09:35 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
- 09:35 claime: Raised replicas to 20 for mw-api-ext and mw-web - T346422
- 09:35 oblivian@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: apply
- 09:34 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
- 09:34 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
- 09:34 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
- 09:34 oblivian@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-main: apply
- 09:33 oblivian@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-main: apply
- 09:33 oblivian@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-main: apply
- 09:30 oblivian@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics: apply
- 09:29 oblivian@deploy2002: helmfile [codfw] START helmfile.d/services/eventgate-analytics: apply
- 09:29 oblivian@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics: apply
- 09:28 oblivian@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics: apply
- 09:27 oblivian@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics: apply
- 09:26 oblivian@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-analytics: apply
- 09:25 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mathoid: apply
- 09:23 isaranto@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
- 09:23 oblivian@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventgate-logging-external: apply
- 09:22 isaranto@deploy2002: helmfile [ml-serve-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
- 09:22 oblivian@deploy2002: helmfile [codfw] START helmfile.d/services/eventgate-logging-external: apply
- 09:22 isaranto@deploy2002: helmfile [ml-serve-eqiad] 'sync' command on namespace 'ores-legacy' for release 'main' .
- 09:21 oblivian@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-logging-external: apply
- 09:20 oblivian@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-logging-external: apply
- 09:20 oblivian@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply
- 09:19 oblivian@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply
- 09:19 oblivian@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: apply
- 09:18 oblivian@deploy2002: helmfile [codfw] START helmfile.d/services/eventgate-analytics-external: apply
- 09:17 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/mathoid: apply
- 09:17 oblivian@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: apply
- 09:16 jiji@deploy2002: helmfile [codfw] DONE helmfile.d/services/mathoid: apply
- 09:16 jiji@deploy2002: helmfile [codfw] START helmfile.d/services/mathoid: apply
- 09:15 oblivian@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: apply
- 09:15 jiji@deploy2002: helmfile [staging] DONE helmfile.d/services/mathoid: apply
- 09:15 jiji@deploy2002: helmfile [staging] START helmfile.d/services/mathoid: apply
- 09:15 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mathoid: apply
- 09:15 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/mathoid: apply
- 09:14 jiji@deploy2002: helmfile [staging] DONE helmfile.d/services/mathoid: apply
- 09:14 jiji@deploy2002: helmfile [staging] START helmfile.d/services/mathoid: apply
- 09:13 gmodena@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-main: sync
- 09:13 oblivian@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics-external: apply
- 09:13 gmodena@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-main: sync
- 09:12 oblivian@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-analytics-external: apply
- 09:09 gmodena@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-main: apply
- 09:08 gmodena@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-main: apply
- 08:52 taavi@deploy2002: taavi: Continuing with sync
- 08:52 taavi@deploy2002: taavi: Backport for wikitech: Disable password resets (T345226), wikitech: Block account creation by sysops too (T345226) synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
- 08:51 taavi@deploy2002: Started scap: Backport for wikitech: Disable password resets (T345226), wikitech: Block account creation by sysops too (T345226)
- 08:03 taavi@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcontrol1007.eqiad.wmnet
- 07:56 taavi@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudcontrol1007.eqiad.wmnet
- 07:55 taavi@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcontrol1007.eqiad.wmnet with OS bullseye
- 07:55 taavi@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - taavi@cumin1001"
- 07:54 taavi@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - taavi@cumin1001"
- 07:45 taavi@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "cloudcontrol1007 - taavi@cumin1001"
- 07:44 taavi@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "cloudcontrol1007 - taavi@cumin1001"
- 07:25 taavi@deploy2002: Finished scap: Backport for Enable wgMinervaEnableSiteNotice for knwiki (T346582), guwikisource: add audiobook namespace (T347189), add throttle rule for UIUC Wikipedia edit-a-thon October 13, 2023 (T346043) (duration: 11m 41s)
- 07:18 taavi@deploy2002: anzx and taavi: Continuing with sync
- 07:15 taavi@deploy2002: anzx and taavi: Backport for Enable wgMinervaEnableSiteNotice for knwiki (T346582), guwikisource: add audiobook namespace (T347189), add throttle rule for UIUC Wikipedia edit-a-thon October 13, 2023 (T346043) synced to the testservers mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug k
- 07:13 taavi@deploy2002: Started scap: Backport for Enable wgMinervaEnableSiteNotice for knwiki (T346582), guwikisource: add audiobook namespace (T347189), add throttle rule for UIUC Wikipedia edit-a-thon October 13, 2023 (T346043)
- 07:08 taavi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcontrol1007.eqiad.wmnet with reason: host reimage
- 07:05 taavi@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcontrol1007.eqiad.wmnet with reason: host reimage
- 06:58 isaranto@deploy2002: helmfile [ml-serve-eqiad] 'sync' command on namespace 'ores-legacy' for release 'main' .
- 06:57 isaranto@deploy2002: helmfile [ml-serve-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
- 06:56 isaranto@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
- 06:42 taavi@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcontrol1007.eqiad.wmnet with OS bullseye
- 04:09 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1229.eqiad.wmnet with OS bullseye
- 03:54 mwpresync@deploy2002: Pruned MediaWiki: 1.41.0-wmf.26 (duration: 02m 13s)
- 03:52 mwpresync@deploy2002: Finished scap: testwikis wikis to 1.41.0-wmf.28 refs T345889 (duration: 49m 31s)
- 03:07 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host db1229.eqiad.wmnet with OS bullseye
- 03:02 mwpresync@deploy2002: Started scap: testwikis wikis to 1.41.0-wmf.28 refs T345889
- 02:44 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1232.eqiad.wmnet with OS bullseye
- 02:43 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 02:42 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 02:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1231.eqiad.wmnet with OS bullseye
- 02:42 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 02:41 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 02:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1233.eqiad.wmnet with OS bullseye
- 02:39 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 02:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1230.eqiad.wmnet with OS bullseye
- 02:39 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 02:37 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 02:37 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 02:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1226.eqiad.wmnet with OS bullseye
- 02:30 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 02:29 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 02:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1232.eqiad.wmnet with reason: host reimage
- 02:25 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1231.eqiad.wmnet with reason: host reimage
- 02:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1233.eqiad.wmnet with reason: host reimage
- 02:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1230.eqiad.wmnet with reason: host reimage
- 02:20 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1233.eqiad.wmnet with reason: host reimage
- 02:19 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1232.eqiad.wmnet with reason: host reimage
- 02:18 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1231.eqiad.wmnet with reason: host reimage
- 02:17 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1230.eqiad.wmnet with reason: host reimage
- 02:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1226.eqiad.wmnet with reason: host reimage
- 02:11 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1226.eqiad.wmnet with reason: host reimage
- 02:07 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host db1233.eqiad.wmnet with OS bullseye
- 02:06 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host db1232.eqiad.wmnet with OS bullseye
- 02:05 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host db1231.eqiad.wmnet with OS bullseye
- 02:04 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host db1230.eqiad.wmnet with OS bullseye
- 02:03 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host db1229.eqiad.wmnet with OS bullseye
- 02:02 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host db1228.eqiad.wmnet with OS bullseye
- 02:01 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host db1227.eqiad.wmnet with OS bullseye
- 01:59 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host db1226.eqiad.wmnet with OS bullseye
- food: payments-wiki upgraded from 5596c7fd to 358e616e
- 01:21 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs1021.eqiad.wmnet with OS bullseye
- 01:20 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs1021.eqiad.wmnet with OS bullseye
- 01:20 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs1017.eqiad.wmnet with OS bullseye
- 01:19 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs1017.eqiad.wmnet with OS bullseye
- 01:19 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs1021.eqiad.wmnet with OS bullseye
- 01:19 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs1021.eqiad.wmnet with OS bullseye
- 01:18 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs1017.eqiad.wmnet with OS bullseye
- 01:18 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs1017.eqiad.wmnet with OS bullseye
- 01:17 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2159 (T343198)', diff saved to https://phabricator.wikimedia.org/P52628 and previous config saved to /var/cache/conftool/dbconfig/20230926-011707-arnaudb.json
- 01:17 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
- 01:16 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
- 01:16 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2159.codfw.wmnet with reason: Maintenance
- 01:16 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2159.codfw.wmnet with reason: Maintenance
- 01:16 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150 (T343198)', diff saved to https://phabricator.wikimedia.org/P52627 and previous config saved to /var/cache/conftool/dbconfig/20230926-011629-arnaudb.json
- 01:01 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150', diff saved to https://phabricator.wikimedia.org/P52626 and previous config saved to /var/cache/conftool/dbconfig/20230926-010123-arnaudb.json
- 00:46 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150', diff saved to https://phabricator.wikimedia.org/P52625 and previous config saved to /var/cache/conftool/dbconfig/20230926-004616-arnaudb.json
- 00:40 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs1017.eqiad.wmnet with OS bullseye
- 00:40 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs1017.eqiad.wmnet with OS bullseye
- 00:35 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150 (T343198)', diff saved to https://phabricator.wikimedia.org/P52624 and previous config saved to /var/cache/conftool/dbconfig/20230926-003109-arnaudb.json
- 00:35 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs1024.eqiad.wmnet with OS bullseye
- 00:30 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
- 00:29 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
- 00:28 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs1022.eqiad.wmnet with OS bullseye
- 00:28 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
- 00:27 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
- 00:26 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs1019.eqiad.wmnet with OS bullseye
- 00:26 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
- 00:25 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
- 00:24 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs1018.eqiad.wmnet with OS bullseye
- 00:24 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
- 00:23 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
- 00:09 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1019.eqiad.wmnet with reason: host reimage
- 00:09 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1022.eqiad.wmnet with reason: host reimage
- 00:07 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1018.eqiad.wmnet with reason: host reimage
- 00:04 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1024.eqiad.wmnet with reason: host reimage
- 00:04 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1019.eqiad.wmnet with reason: host reimage
- 00:04 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1018.eqiad.wmnet with reason: host reimage
2023-09-25
- 23:55 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wdqs1023.mgmt.eqiad.wmnet with reboot policy FORCED
- 23:51 jclark@cumin1001: START - Cookbook sre.hosts.provision for host wdqs1023.mgmt.eqiad.wmnet with reboot policy FORCED
- 23:50 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs1023']
- 23:50 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs1023']
- 23:49 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs1022.eqiad.wmnet with OS bullseye
- 23:48 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs1022']
- 23:45 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs1023']
- 23:45 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs1023']
- 23:45 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs1021.eqiad.wmnet with OS bullseye
- 23:45 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs1024.eqiad.wmnet with OS bullseye
- 23:44 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs1019.eqiad.wmnet with OS bullseye
- 23:44 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs1018.eqiad.wmnet with OS bullseye
- 23:44 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs1019']
- 23:44 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs1018']
- 23:43 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs1017.eqiad.wmnet with OS bullseye
- 23:43 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs1023']
- 23:43 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs1023']
- 23:43 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs1022']
- 23:43 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs1023']
- 23:43 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs1024']
- 23:42 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs1023']
- 23:42 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs1023']
- 23:42 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs1023']
- 23:42 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs1021']
- 23:40 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs1017']
- 23:40 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs1023']
- 23:40 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs1023']
- 23:40 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs1023']
- 23:40 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs1023']
- 23:37 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs1023']
- 23:37 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs1023']
- 23:37 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs1023']
- 23:37 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs1019']
- 23:36 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs1018']
- 23:36 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs1023']
- 23:36 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs1023']
- 23:36 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs1023']
- 23:36 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs1023']
- 23:35 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs1023']
- 23:35 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['wdqs1019']
- 23:35 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['wdqs1018']
- 23:35 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs1024']
- 23:35 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs1023']
- 23:35 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs1023']
- 23:35 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs1021']
- 23:34 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs1019']
- 23:34 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs1018']
- 23:34 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs1024.eqiad.wmnet with OS bullseye
- 23:34 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs1023.eqiad.wmnet with OS bullseye
- 23:34 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs1022.eqiad.wmnet with OS bullseye
- 23:34 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs1021.eqiad.wmnet with OS bullseye
- 23:33 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs1017']
- 23:33 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['wdqs1017']
- 23:33 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs1017']
- 23:32 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs1019.eqiad.wmnet with OS bullseye
- 23:31 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs1018.eqiad.wmnet with OS bullseye
- 23:29 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs1017.eqiad.wmnet with OS bullseye
- 22:14 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs1024.eqiad.wmnet with OS bullseye
- 22:14 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs1023.eqiad.wmnet with OS bullseye
- 22:14 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs1022.eqiad.wmnet with OS bullseye
- 22:13 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs1021.eqiad.wmnet with OS bullseye
- 22:11 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs1019.eqiad.wmnet with OS bullseye
- 22:11 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs1018.eqiad.wmnet with OS bullseye
- 22:11 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs1019.eqiad.wmnet']
- 22:09 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs1023.eqiad.wmnet']
- 22:08 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs1017.eqiad.wmnet with OS bullseye
- 22:07 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs1017.eqiad.wmnet']
- 22:07 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs1018.eqiad.wmnet']
- 22:07 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs1021.eqiad.wmnet']
- 22:07 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs1024.eqiad.wmnet']
- 22:07 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['wdqs1022']
- 22:07 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs1022']
- 22:05 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['wdqs1022']
- 22:05 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs1022']
- 22:04 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['wdqs1022']
- 22:04 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs1022']
- 22:03 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs1019.eqiad.wmnet']
- 22:01 dancy@deploy2002: Finished scap: final test sync (duration: 15m 00s)
- 21:59 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['wdqs1022.eqiad.wmnet']
- 21:59 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['wdqs1022.eqiad.wmnet']
- 21:59 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs1017.eqiad.wmnet']
- 21:59 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs1018.eqiad.wmnet']
- 21:59 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs1023.eqiad.wmnet']
- 21:58 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs1022.eqiad.wmnet']
- 21:58 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs1024.eqiad.wmnet']
- 21:58 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs1022.eqiad.wmnet']
- 21:58 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs1021.eqiad.wmnet']
- 21:52 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wdqs1024.mgmt.eqiad.wmnet with reboot policy FORCED
- 21:52 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wdqs1022.mgmt.eqiad.wmnet with reboot policy FORCED
- 21:51 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wdqs1017.mgmt.eqiad.wmnet with reboot policy FORCED
- 21:51 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wdqs1019.mgmt.eqiad.wmnet with reboot policy FORCED
- 21:47 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wdqs1021.mgmt.eqiad.wmnet with reboot policy FORCED
- 21:47 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wdqs1023.mgmt.eqiad.wmnet with reboot policy FORCED
- 21:46 dancy@deploy2002: Started scap: final test sync
- 21:46 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wdqs1018.mgmt.eqiad.wmnet with reboot policy FORCED
- 21:45 dancy@deploy2002: Started scap: testing scap mods
- 21:38 dancy@deploy2002: Started scap: testing scap mods
- 21:37 dancy@deploy2002: Installation of scap version "4.62.0" completed for 598 hosts
- 21:36 dancy@deploy2002: Installing scap version "4.62.0" for 598 hosts
- 21:32 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wdqs1024.mgmt.eqiad.wmnet with reboot policy FORCED
- 21:32 jclark@cumin1001: START - Cookbook sre.hosts.provision for host wdqs1024.mgmt.eqiad.wmnet with reboot policy FORCED
- 21:31 jclark@cumin1001: START - Cookbook sre.hosts.provision for host wdqs1021.mgmt.eqiad.wmnet with reboot policy FORCED
- 21:31 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wdqs1024.mgmt.eqiad.wmnet with reboot policy FORCED
- 21:31 jclark@cumin1001: START - Cookbook sre.hosts.provision for host wdqs1024.mgmt.eqiad.wmnet with reboot policy FORCED
- 21:30 jclark@cumin1001: START - Cookbook sre.hosts.provision for host wdqs1019.mgmt.eqiad.wmnet with reboot policy FORCED
- 21:30 jclark@cumin1001: START - Cookbook sre.hosts.provision for host wdqs1018.mgmt.eqiad.wmnet with reboot policy FORCED
- 21:30 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 21:30 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt wdqs1017-20 - jclark@cumin1001"
- 21:30 jclark@cumin1001: START - Cookbook sre.hosts.provision for host wdqs1024.mgmt.eqiad.wmnet with reboot policy FORCED
- 21:30 jclark@cumin1001: START - Cookbook sre.hosts.provision for host wdqs1023.mgmt.eqiad.wmnet with reboot policy FORCED
- 21:29 jclark@cumin1001: START - Cookbook sre.hosts.provision for host wdqs1017.mgmt.eqiad.wmnet with reboot policy FORCED
- 21:29 jclark@cumin1001: START - Cookbook sre.hosts.provision for host wdqs1022.mgmt.eqiad.wmnet with reboot policy FORCED
- 21:29 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt wdqs1017-20 - jclark@cumin1001"
- 21:27 jclark@cumin1001: START - Cookbook sre.dns.netbox
- 21:22 dancy@deploy2002: Started scap: testing scap mods
- 21:20 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 21:20 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt wdqs1017-20 - jclark@cumin1001"
- 21:19 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt wdqs1017-20 - jclark@cumin1001"
- 21:17 jclark@cumin1001: START - Cookbook sre.dns.netbox
- 21:12 cjming: end of UTC late backport window
- 21:02 cjming@deploy2002: Finished scap: Backport for Provide wordmarks/taglines for Wikibooks projects (T341251), Fix white background for Wikibooks wordmarks (T341251), Icons for special projects (T341242) (duration: 23m 50s)
- 20:53 cjming@deploy2002: pikne and cjming and jdlrobson: Continuing with sync
- 20:51 cjming@deploy2002: pikne and cjming and jdlrobson: Backport for Provide wordmarks/taglines for Wikibooks projects (T341251), Fix white background for Wikibooks wordmarks (T341251), Icons for special projects (T341242) synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes
- 20:39 cjming@deploy2002: Started scap: Backport for Provide wordmarks/taglines for Wikibooks projects (T341251), Fix white background for Wikibooks wordmarks (T341251), Icons for special projects (T341242)
- 20:25 cjming@deploy2002: Finished scap: Backport for Deploy Reader Demographics 2 pilot survey (T345951) (duration: 21m 18s)
- 20:16 cjming@deploy2002: cjming and dani: Continuing with sync
- 20:15 cjming@deploy2002: cjming and dani: Backport for Deploy Reader Demographics 2 pilot survey (T345951) synced to the testservers mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
- 20:03 cjming@deploy2002: Started scap: Backport for Deploy Reader Demographics 2 pilot survey (T345951)
- 18:45 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db2109.codfw.wmnet with reason: Host crashed
- 18:45 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on db2109.codfw.wmnet with reason: Host crashed
- 18:36 ejegg: Standalone (payments listener) SmashPig upgraded from 0703ce60 to a78a91d9
- 16:54 jayme@cumin1001: conftool action : set/pooled=yes; selector: name=kubernetes2010.codfw.wmnet
- 16:51 jayme: uncordon kubernetes2010.codfw.wmnet - T347267
- 16:11 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
- 16:11 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
- 16:09 sukhe@cumin2002: dbctl commit (dc=all): 'Depool db2109', diff saved to https://phabricator.wikimedia.org/P52622 and previous config saved to /var/cache/conftool/dbconfig/20230925-160904-sukhe.json
- 16:01 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
- 15:57 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
- 15:55 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
- 15:54 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
- 15:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1034.eqiad.wmnet with reason: Maintenance
- 15:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1034.eqiad.wmnet with reason: Maintenance
- 15:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1033.eqiad.wmnet with reason: Maintenance
- 15:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1033.eqiad.wmnet with reason: Maintenance
- 15:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1032.eqiad.wmnet with reason: Maintenance
- 15:37 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1032.eqiad.wmnet with reason: Maintenance
- 15:30 ejegg: Standalone (payments listener) SmashPig upgraded from 2412df22 to 0703ce60
- 15:23 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 15:23 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add new records for cloudcontrol1007 - cmooney@cumin1001"
- 15:23 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add new records for cloudcontrol1007 - cmooney@cumin1001"
- 15:22 taavi@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcontrol1007
- 15:21 taavi@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cloudcontrol1007
- 15:21 herron: alert[12]001 -- rm /etc/apache2/sites-available/50-dispatch-wikimedia-org.conf && apachectl graceful T344937
- 15:20 cmooney@cumin1001: START - Cookbook sre.dns.netbox
- 15:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1025 (T344589)', diff saved to https://phabricator.wikimedia.org/P52621 and previous config saved to /var/cache/conftool/dbconfig/20230925-152043-ladsgroup.json
- 15:19 herron: alert[12]001 -- apt remove docker.io T344937
- 15:17 taavi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 15:17 taavi@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: assign new IPs to cloudcontrol1007 - taavi@cumin1001"
- 15:16 taavi@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: assign new IPs to cloudcontrol1007 - taavi@cumin1001"
- 15:14 taavi@cumin1001: START - Cookbook sre.dns.netbox
- 15:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1025', diff saved to https://phabricator.wikimedia.org/P52620 and previous config saved to /var/cache/conftool/dbconfig/20230925-150536-ladsgroup.json
- 15:00 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
- 14:57 moritzm: installing python3.7 security updates
- 14:53 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb2007.codfw.wmnet
- 14:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1025', diff saved to https://phabricator.wikimedia.org/P52619 and previous config saved to /var/cache/conftool/dbconfig/20230925-145029-ladsgroup.json
- 14:46 filippo@deploy2002: helmfile [codfw] DONE helmfile.d/services/opentelemetry-collector: apply
- 14:46 filippo@deploy2002: helmfile [codfw] START helmfile.d/services/opentelemetry-collector: apply
- 14:45 filippo@deploy2002: helmfile [eqiad] DONE helmfile.d/services/opentelemetry-collector: apply
- 14:45 filippo@deploy2002: helmfile [eqiad] START helmfile.d/services/opentelemetry-collector: apply
- 14:44 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-single for host rdb2007.codfw.wmnet
- 14:43 herron@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts dispatch-be2001.codfw.wmnet,dispatch-be1001.eqiad.wmnet
- 14:43 herron@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 14:43 herron@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: dispatch-be2001.codfw.wmnet,dispatch-be1001.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - herron@cumin1001"
- 14:39 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
- 14:39 jforrester@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
- 14:38 jforrester@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
- 14:38 jforrester@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
- 14:37 jforrester@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
- 14:36 jayme@deploy2002: Finished scap: (no justification provided) (duration: 03m 09s)
- 14:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1025 (T344589)', diff saved to https://phabricator.wikimedia.org/P52618 and previous config saved to /var/cache/conftool/dbconfig/20230925-143523-ladsgroup.json
- 14:35 jforrester@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
- 14:34 jforrester@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
- 14:34 jforrester@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
- 14:33 jayme@deploy2002: Started scap: (no justification provided)
- 14:33 jforrester@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
- 14:32 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
- 14:32 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
- 14:31 herron@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: dispatch-be2001.codfw.wmnet,dispatch-be1001.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - herron@cumin1001"
- 14:31 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
- 14:31 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
- 14:31 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
- 14:30 jayme@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
- 14:29 herron@cumin1001: START - Cookbook sre.dns.netbox
- 14:29 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
- 14:28 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
- 14:26 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
- 14:25 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
- 14:25 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
- 14:24 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
- 14:24 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
- 14:24 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
- 14:24 herron@cumin1001: START - Cookbook sre.hosts.decommission for hosts dispatch-be2001.codfw.wmnet,dispatch-be1001.eqiad.wmnet
- 14:22 jforrester@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
- 14:22 jforrester@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
- 14:22 jforrester@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
- 14:21 jforrester@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
- 14:20 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
- 14:20 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
- 14:19 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
- 14:19 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
- 14:18 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
- 14:18 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
- 14:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling es1025 (T344589)', diff saved to https://phabricator.wikimedia.org/P52615 and previous config saved to /var/cache/conftool/dbconfig/20230925-141313-ladsgroup.json
- 14:13 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1025.eqiad.wmnet with reason: Maintenance
- 14:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1025.eqiad.wmnet with reason: Maintenance
- 14:12 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2150 (T343198)', diff saved to https://phabricator.wikimedia.org/P52614 and previous config saved to /var/cache/conftool/dbconfig/20230925-141252-arnaudb.json
- 14:12 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2150.codfw.wmnet with reason: Maintenance
- 14:12 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2150.codfw.wmnet with reason: Maintenance
- 14:12 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122 (T343198)', diff saved to https://phabricator.wikimedia.org/P52613 and previous config saved to /var/cache/conftool/dbconfig/20230925-141230-arnaudb.json
- 14:07 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb2008.codfw.wmnet
- 14:04 urbanecm@deploy2002: Finished scap: Backport for listTaskCounts: Do not expect tasks key to be present (T347120), AddImageFeedbackHandler: Add missing parameters (T346277), Enable Parsoid support for Kartographer on enwikivoyage (T342871) (duration: 38m 35s)
- 14:00 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-single for host rdb2008.codfw.wmnet
- 13:59 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 59278
- 13:58 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb2009.codfw.wmnet
- 13:57 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122', diff saved to https://phabricator.wikimedia.org/P52612 and previous config saved to /var/cache/conftool/dbconfig/20230925-135724-arnaudb.json
- 13:51 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-single for host rdb2009.codfw.wmnet
- 13:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1024 (T344589)', diff saved to https://phabricator.wikimedia.org/P52611 and previous config saved to /var/cache/conftool/dbconfig/20230925-135004-ladsgroup.json
- 13:43 urbanecm@deploy2002: urbanecm and ihurbain: Continuing with sync
- 13:42 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122', diff saved to https://phabricator.wikimedia.org/P52610 and previous config saved to /var/cache/conftool/dbconfig/20230925-134217-arnaudb.json
- 13:42 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb2010.codfw.wmnet
- 13:38 urbanecm@deploy2002: urbanecm and ihurbain: Backport for listTaskCounts: Do not expect tasks key to be present (T347120), AddImageFeedbackHandler: Add missing parameters (T346277), Enable Parsoid support for Kartographer on enwikivoyage (T342871) synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.w
- 13:36 jayme@cumin1001: conftool action : set/pooled=no; selector: name=kubernetes2010.codfw.wmnet
- 13:36 jayme@cumin1001: conftool action : set/pooled=yes; selector: dc=codfw,cluster=kubernetes,name=kubernetes.*
- 13:35 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-single for host rdb2010.codfw.wmnet
- 13:35 jayme@cumin1001: conftool action : set/pooled=yes; selector: dc=eqiad,cluster=kubernetes,name=kubernetes.*
- 13:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1024', diff saved to https://phabricator.wikimedia.org/P52607 and previous config saved to /var/cache/conftool/dbconfig/20230925-133457-ladsgroup.json
- 13:28 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb1009.eqiad.wmnet
- 13:27 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122 (T343198)', diff saved to https://phabricator.wikimedia.org/P52606 and previous config saved to /var/cache/conftool/dbconfig/20230925-132711-arnaudb.json
- 13:26 urbanecm@deploy2002: Started scap: Backport for listTaskCounts: Do not expect tasks key to be present (T347120), AddImageFeedbackHandler: Add missing parameters (T346277), Enable Parsoid support for Kartographer on enwikivoyage (T342871)
- 13:25 urbanecm@deploy2002: Finished scap: Backport for GrowthExperiments: enable AddLink backend 14th round of wikis (T308139) (duration: 23m 28s)
- 13:22 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-single for host rdb1009.eqiad.wmnet
- 13:22 jayme@cumin1001: conftool action : set/weight=10; selector: service=kubesvc,cluster=kubernetes,dc=codfw
- 13:21 jayme@cumin1001: conftool action : set/weight=10; selector: service=kubesvc,cluster=kubernetes,dc=eqiad
- 13:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1024', diff saved to https://phabricator.wikimedia.org/P52605 and previous config saved to /var/cache/conftool/dbconfig/20230925-131951-ladsgroup.json
- 13:17 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb1010.eqiad.wmnet
- 13:15 urbanecm@deploy2002: urbanecm and sgimeno: Continuing with sync
- 13:14 jayme: ran homer "lsw1-*eqiad*" commit - T346714
- 13:14 urbanecm@deploy2002: urbanecm and sgimeno: Backport for GrowthExperiments: enable AddLink backend 14th round of wikis (T308139) synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
- 13:13 jayme: uncordoned kubernetes10[27-56]
- 13:11 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-single for host rdb1010.eqiad.wmnet
- 13:09 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mwmaint1002.eqiad.wmnet
- 13:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1024 (T344589)', diff saved to https://phabricator.wikimedia.org/P52604 and previous config saved to /var/cache/conftool/dbconfig/20230925-130444-ladsgroup.json
- 13:04 moritzm: installing openjdk-11 security updates on buster
- 13:03 jayme: cordoned kubernetes10[27-56]
- 13:03 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 59278
- 13:01 urbanecm@deploy2002: Started scap: Backport for GrowthExperiments: enable AddLink backend 14th round of wikis (T308139)
- 13:01 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-single for host mwmaint1002.eqiad.wmnet
- 12:56 kamila_: put codfw before eqiad in geoDNS defaults
- 12:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling es1024 (T344589)', diff saved to https://phabricator.wikimedia.org/P52603 and previous config saved to /var/cache/conftool/dbconfig/20230925-125212-ladsgroup.json
- 12:52 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1024.eqiad.wmnet with reason: Maintenance
- 12:52 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1024.eqiad.wmnet with reason: Maintenance
- 12:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on es[1024-1025].eqiad.wmnet with reason: Maintenance
- 12:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on es[1024-1025].eqiad.wmnet with reason: Maintenance
- 12:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1023.eqiad.wmnet with reason: Maintenance
- 12:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1023.eqiad.wmnet with reason: Maintenance
- 12:26 jayme@deploy2002: Finished scap: (no justification provided) (duration: 10m 08s)
- 12:17 jayme: bumping k8s deployments mw-web and mw-api-ext to 16 replicas each in both DCs
- 12:16 jayme@deploy2002: Started scap: (no justification provided)
- 11:43 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb1011.eqiad.wmnet
- 11:43 jayme: running puppet on lvs in eqiad - T346714 (TYPO from above, did not run in codfw)
- 11:42 jayme: running puppet on lvs in codfw - T346714
- 11:37 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-single for host rdb1011.eqiad.wmnet
- 11:20 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1054.eqiad.wmnet with OS bullseye
- 11:20 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-single for host rdb1013.eqiad.wmnet
- 11:19 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb1014.eqiad.wmnet
- 11:16 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1036.eqiad.wmnet with OS bullseye
- 11:16 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1053.eqiad.wmnet with OS bullseye
- 11:16 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1047.eqiad.wmnet with OS bullseye
- 11:13 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-single for host rdb1014.eqiad.wmnet
- 11:13 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1038.eqiad.wmnet with OS bullseye
- 11:12 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1050.eqiad.wmnet with OS bullseye
- 11:09 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1046.eqiad.wmnet with OS bullseye
- 11:08 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1052.eqiad.wmnet with OS bullseye
- 11:08 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1049.eqiad.wmnet with OS bullseye
- 11:07 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1044.eqiad.wmnet with OS bullseye
- 11:06 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1048.eqiad.wmnet with OS bullseye
- 11:05 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host poolcounter1005.eqiad.wmnet
- 11:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1022 (T344589)', diff saved to https://phabricator.wikimedia.org/P52602 and previous config saved to /var/cache/conftool/dbconfig/20230925-110343-ladsgroup.json
- 11:03 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes1056.eqiad.wmnet with OS bullseye
- 11:00 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-single for host poolcounter1005.eqiad.wmnet
- 10:58 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1043.eqiad.wmnet with OS bullseye
- 10:58 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host poolcounter1004.eqiad.wmnet
- 10:58 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1045.eqiad.wmnet with OS bullseye
- 10:57 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes1055.eqiad.wmnet with OS bullseye
- 10:57 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1036.eqiad.wmnet with reason: host reimage
- 10:57 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes1051.eqiad.wmnet with OS bullseye
- 10:54 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-single for host poolcounter1004.eqiad.wmnet
- 10:53 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on kubernetes1056.eqiad.wmnet with reason: host reimage
- 10:53 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on kubernetes1054.eqiad.wmnet with reason: host reimage
- 10:52 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1038.eqiad.wmnet with reason: host reimage
- 10:52 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on kubernetes1047.eqiad.wmnet with reason: host reimage
- 10:50 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on kubernetes1053.eqiad.wmnet with reason: host reimage
- 10:50 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on kubernetes1048.eqiad.wmnet with reason: host reimage
- 10:49 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host deploy1002.eqiad.wmnet
- 10:49 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on kubernetes1046.eqiad.wmnet with reason: host reimage
- 10:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1022', diff saved to https://phabricator.wikimedia.org/P52601 and previous config saved to /var/cache/conftool/dbconfig/20230925-104837-ladsgroup.json
- 10:48 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1034.eqiad.wmnet with OS bullseye
- 10:48 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1038.eqiad.wmnet with reason: host reimage
- 10:48 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on kubernetes1044.eqiad.wmnet with reason: host reimage
- 10:48 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1055.eqiad.wmnet with reason: host reimage
- 10:47 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1036.eqiad.wmnet with reason: host reimage
- 10:47 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on kubernetes1051.eqiad.wmnet with reason: host reimage
- 10:47 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on kubernetes1050.eqiad.wmnet with reason: host reimage
- 10:46 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1037.eqiad.wmnet with OS bullseye
- 10:45 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1047.eqiad.wmnet with reason: host reimage
- 10:45 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1052.eqiad.wmnet with reason: host reimage
- 10:45 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on kubernetes1049.eqiad.wmnet with reason: host reimage
- 10:43 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1056.eqiad.wmnet with reason: host reimage
- 10:43 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1054.eqiad.wmnet with reason: host reimage
- 10:43 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1055.eqiad.wmnet with reason: host reimage
- 10:43 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1039.eqiad.wmnet with OS bullseye
- 10:43 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1045.eqiad.wmnet with reason: host reimage
- 10:42 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1053.eqiad.wmnet with reason: host reimage
- 10:42 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1051.eqiad.wmnet with reason: host reimage
- 10:41 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1033.eqiad.wmnet with OS bullseye
- 10:40 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-single for host deploy1002.eqiad.wmnet
- 10:40 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1035.eqiad.wmnet with OS bullseye
- 10:40 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1043.eqiad.wmnet with reason: host reimage
- 10:40 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1050.eqiad.wmnet with reason: host reimage
- 10:40 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1049.eqiad.wmnet with reason: host reimage
- 10:40 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1048.eqiad.wmnet with reason: host reimage
- 10:40 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1052.eqiad.wmnet with reason: host reimage
- 10:39 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1041.eqiad.wmnet with OS bullseye
- 10:39 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1046.eqiad.wmnet with reason: host reimage
- 10:38 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1045.eqiad.wmnet with reason: host reimage
- 10:38 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1044.eqiad.wmnet with reason: host reimage
- 10:37 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1043.eqiad.wmnet with reason: host reimage
- 10:37 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1040.eqiad.wmnet with OS bullseye
- 10:36 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes1042.eqiad.wmnet with OS bullseye
- 10:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1128', diff saved to https://phabricator.wikimedia.org/P52600 and previous config saved to /var/cache/conftool/dbconfig/20230925-103454-root.json
- 10:34 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1038.eqiad.wmnet with OS bullseye
- 10:34 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1036.eqiad.wmnet with OS bullseye
- 10:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1022', diff saved to https://phabricator.wikimedia.org/P52599 and previous config saved to /var/cache/conftool/dbconfig/20230925-103330-ladsgroup.json
- 10:31 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1047.eqiad.wmnet with OS bullseye
- 10:27 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on kubernetes1034.eqiad.wmnet with reason: host reimage
- 10:27 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1042.eqiad.wmnet with reason: host reimage
- 10:27 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1055.eqiad.wmnet with OS bullseye
- 10:27 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on kubernetes1037.eqiad.wmnet with reason: host reimage
- 10:27 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1056.eqiad.wmnet with OS bullseye
- 10:27 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1054.eqiad.wmnet with OS bullseye
- 10:27 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1053.eqiad.wmnet with OS bullseye
- 10:26 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1051.eqiad.wmnet with OS bullseye
- 10:26 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1050.eqiad.wmnet with OS bullseye
- 10:25 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1049.eqiad.wmnet with OS bullseye
- 10:25 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes1047.eqiad.wmnet with OS bullseye
- 10:25 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1048.eqiad.wmnet with OS bullseye
- 10:25 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1047.eqiad.wmnet with OS bullseye
- 10:25 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1046.eqiad.wmnet with OS bullseye
- 10:25 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1045.eqiad.wmnet with OS bullseye
- 10:25 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1039.eqiad.wmnet with reason: host reimage
- 10:24 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1044.eqiad.wmnet with OS bullseye
- 10:23 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1052.eqiad.wmnet with OS bullseye
- 10:23 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1043.eqiad.wmnet with OS bullseye
- 10:23 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1041.eqiad.wmnet with reason: host reimage
- 10:22 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on kubernetes1035.eqiad.wmnet with reason: host reimage
- 10:20 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1033.eqiad.wmnet with reason: host reimage
- 10:20 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on kubernetes1040.eqiad.wmnet with reason: host reimage
- 10:19 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1042.eqiad.wmnet with reason: host reimage
- 10:19 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1041.eqiad.wmnet with reason: host reimage
- 10:18 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1039.eqiad.wmnet with reason: host reimage
- 10:18 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1040.eqiad.wmnet with reason: host reimage
- 10:18 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1037.eqiad.wmnet with reason: host reimage
- 10:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1022 (T344589)', diff saved to https://phabricator.wikimedia.org/P52597 and previous config saved to /var/cache/conftool/dbconfig/20230925-101824-ladsgroup.json
- 10:17 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1035.eqiad.wmnet with reason: host reimage
- 10:17 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1034.eqiad.wmnet with reason: host reimage
- 10:17 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1033.eqiad.wmnet with reason: host reimage
- 10:09 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes1038.eqiad.wmnet with OS bullseye
- 10:09 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1038.eqiad.wmnet with OS bullseye
- 10:08 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes1036.eqiad.wmnet with OS bullseye
- 10:08 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1036.eqiad.wmnet with OS bullseye
- 10:05 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1042.eqiad.wmnet with OS bullseye
- 10:05 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1041.eqiad.wmnet with OS bullseye
- 10:04 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes1038.eqiad.wmnet with OS bullseye
- 10:04 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1040.eqiad.wmnet with OS bullseye
- 10:04 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1039.eqiad.wmnet with OS bullseye
- 10:04 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes1036.eqiad.wmnet with OS bullseye
- 10:04 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1038.eqiad.wmnet with OS bullseye
- 10:04 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1037.eqiad.wmnet with OS bullseye
- 10:04 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1036.eqiad.wmnet with OS bullseye
- 10:03 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1035.eqiad.wmnet with OS bullseye
- 10:03 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1034.eqiad.wmnet with OS bullseye
- 10:03 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1033.eqiad.wmnet with OS bullseye
- 09:59 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1031.eqiad.wmnet with OS bullseye
- 09:56 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1032.eqiad.wmnet with OS bullseye
- 09:53 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1030.eqiad.wmnet with OS bullseye
- 09:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling es1022 (T344589)', diff saved to https://phabricator.wikimedia.org/P52596 and previous config saved to /var/cache/conftool/dbconfig/20230925-095235-ladsgroup.json
- 09:52 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1022.eqiad.wmnet with reason: Maintenance
- 09:52 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1022.eqiad.wmnet with reason: Maintenance
- 09:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on es[1021-1022].eqiad.wmnet with reason: Maintenance
- 09:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on es[1021-1022].eqiad.wmnet with reason: Maintenance
- 09:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1020.eqiad.wmnet with reason: Maintenance
- 09:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1020.eqiad.wmnet with reason: Maintenance
- 09:43 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on puppetdb2002.codfw.wmnet with reason: Disable puppetdb/postgres/nginx on old nodes to ensure nothing hits them anyway
- 09:43 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 5 days, 0:00:00 on puppetdb2002.codfw.wmnet with reason: Disable puppetdb/postgres/nginx on old nodes to ensure nothing hits them anyway
- 09:43 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on puppetdb1002.eqiad.wmnet with reason: Disable puppetdb/postgres/nginx on old nodes to ensure nothing hits them anyway
- 09:43 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 5 days, 0:00:00 on puppetdb1002.eqiad.wmnet with reason: Disable puppetdb/postgres/nginx on old nodes to ensure nothing hits them anyway
- 09:41 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1031.eqiad.wmnet with reason: host reimage
- 09:38 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1032.eqiad.wmnet with reason: host reimage
- 09:38 jelto: switch people.wikimedia.org to codfw - T345618
- 09:36 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1030.eqiad.wmnet with reason: host reimage
- 09:34 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1031.eqiad.wmnet with reason: host reimage
- 09:34 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1032.eqiad.wmnet with reason: host reimage
- 09:33 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1030.eqiad.wmnet with reason: host reimage
- 09:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db[1137,1216,1220,1225].eqiad.wmnet,dbstore1005.eqiad.wmnet with reason: Maintenance
- 09:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db[1137,1216,1220,1225].eqiad.wmnet,dbstore1005.eqiad.wmnet with reason: Maintenance
- 09:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1179.eqiad.wmnet with reason: Maintenance
- 09:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1179.eqiad.wmnet with reason: Maintenance
- 09:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on 17 hosts with reason: Maintenance
- 09:24 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on 17 hosts with reason: Maintenance
- 09:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1126.eqiad.wmnet with reason: Maintenance
- 09:24 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1126.eqiad.wmnet with reason: Maintenance
- 09:20 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1032.eqiad.wmnet with OS bullseye
- 09:19 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1031.eqiad.wmnet with OS bullseye
- 09:19 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1030.eqiad.wmnet with OS bullseye
- 09:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on 14 hosts with reason: Maintenance
- 09:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on 14 hosts with reason: Maintenance
- 09:18 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1181.eqiad.wmnet with reason: Maintenance
- 09:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on 14 hosts with reason: Maintenance
- 09:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on 14 hosts with reason: Maintenance
- 09:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1183.eqiad.wmnet with reason: Maintenance
- 09:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1183.eqiad.wmnet with reason: Maintenance
- 09:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on 13 hosts with reason: Maintenance
- 09:06 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on 13 hosts with reason: Maintenance
- 09:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1157.eqiad.wmnet with reason: Maintenance
- 09:06 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1157.eqiad.wmnet with reason: Maintenance
- 08:59 Amir1: by the power vested in me by Chris Albon and ML team, I now pronounce ORES dead.
- 08:58 elukey: migrate ores.wikimedia.org's ATS backend to ores-legacy.discovery.wmnet (k8s app) - This will drain traffic to ORES bare metal nodes - T341696
- 08:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on 15 hosts with reason: Maintenance
- 08:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on 15 hosts with reason: Maintenance
- 08:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1162.eqiad.wmnet with reason: Maintenance
- 08:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1162.eqiad.wmnet with reason: Maintenance
- 08:56 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on 16 hosts with reason: Schema change
- 08:56 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on 16 hosts with reason: Schema change
- 08:43 jayme: jayme@cumin1001 conftool action : set/pooled=no; selector: name=kubernetes2010.* - T347267
- 08:43 jayme@cumin1001: conftool action : set/pooled=no; selector: name=kubernetes2010.*
- 08:39 jayme@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on kubernetes2010.codfw.wmnet with reason: host is down
- 08:39 jayme@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on kubernetes2010.codfw.wmnet with reason: host is down
- 08:27 jayme: draining kubernetes2010.codfw.wmnet - T347267
- 08:01 jayme: cordoning kubernetes2010
- 07:49 taavi: drop cloudmetrics exceptions from cr firewall ACLs https://gerrit.wikimedia.org/r/c/operations/homer/public/+/960027 T326266
- 07:47 taavi@deploy2002: Finished scap: Backport for Make sure different key values are handled while submitting (T345496) (duration: 30m 55s)
- 07:38 taavi@deploy2002: taavi and soda: Continuing with sync
- 07:37 XioNoX: update eqsin-ulsfo transport link ospf metrics to match the new latency of 175ms
- 07:29 taavi@deploy2002: taavi and soda: Backport for Make sure different key values are handled while submitting (T345496) synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
- 07:22 kevinbazira@deploy2002: helmfile [ml-serve-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
- 07:20 kevinbazira@deploy2002: helmfile [ml-serve-eqiad] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
- 07:16 taavi@deploy2002: Started scap: Backport for Make sure different key values are handled while submitting (T345496)
- 07:06 XioNoX: roll out "Block inbound RAs on the routers" - T334916
- 06:14 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 35008
- 06:14 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 35008
- 05:27 kart_: Updated cxserver to 2023-09-13-074325-production (T346045)
- 05:22 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
- 05:22 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/cxserver: apply
- 05:13 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
- 05:12 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
- 05:08 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
- 05:08 kartik@deploy2002: helmfile [staging] START helmfile.d/services/cxserver: apply
2023-09-24
- 23:05 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2122 (T343198)', diff saved to https://phabricator.wikimedia.org/P52595 and previous config saved to /var/cache/conftool/dbconfig/20230924-230515-arnaudb.json
- 23:05 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2122.codfw.wmnet with reason: Maintenance
- 23:04 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2122.codfw.wmnet with reason: Maintenance
- 23:04 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2121 (T343198)', diff saved to https://phabricator.wikimedia.org/P52594 and previous config saved to /var/cache/conftool/dbconfig/20230924-230443-arnaudb.json
- 22:49 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2121', diff saved to https://phabricator.wikimedia.org/P52593 and previous config saved to /var/cache/conftool/dbconfig/20230924-224936-arnaudb.json
- 22:34 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2121', diff saved to https://phabricator.wikimedia.org/P52592 and previous config saved to /var/cache/conftool/dbconfig/20230924-223430-arnaudb.json
- 22:19 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2121 (T343198)', diff saved to https://phabricator.wikimedia.org/P52591 and previous config saved to /var/cache/conftool/dbconfig/20230924-221923-arnaudb.json
- 10:28 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2121 (T343198)', diff saved to https://phabricator.wikimedia.org/P52590 and previous config saved to /var/cache/conftool/dbconfig/20230924-102809-arnaudb.json
- 10:28 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2121.codfw.wmnet with reason: Maintenance
- 10:27 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2121.codfw.wmnet with reason: Maintenance
- 10:27 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120 (T343198)', diff saved to https://phabricator.wikimedia.org/P52589 and previous config saved to /var/cache/conftool/dbconfig/20230924-102747-arnaudb.json
- 10:12 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120', diff saved to https://phabricator.wikimedia.org/P52588 and previous config saved to /var/cache/conftool/dbconfig/20230924-101241-arnaudb.json
- 09:57 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120', diff saved to https://phabricator.wikimedia.org/P52587 and previous config saved to /var/cache/conftool/dbconfig/20230924-095734-arnaudb.json
- 09:42 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120 (T343198)', diff saved to https://phabricator.wikimedia.org/P52586 and previous config saved to /var/cache/conftool/dbconfig/20230924-094227-arnaudb.json
2023-09-23
- 22:27 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2120 (T343198)', diff saved to https://phabricator.wikimedia.org/P52585 and previous config saved to /var/cache/conftool/dbconfig/20230923-222721-arnaudb.json
- 22:27 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2120.codfw.wmnet with reason: Maintenance
- 22:27 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2120.codfw.wmnet with reason: Maintenance
- 22:27 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108 (T343198)', diff saved to https://phabricator.wikimedia.org/P52584 and previous config saved to /var/cache/conftool/dbconfig/20230923-222659-arnaudb.json
- 22:11 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108', diff saved to https://phabricator.wikimedia.org/P52583 and previous config saved to /var/cache/conftool/dbconfig/20230923-221152-arnaudb.json
- 21:56 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108', diff saved to https://phabricator.wikimedia.org/P52582 and previous config saved to /var/cache/conftool/dbconfig/20230923-215646-arnaudb.json
- 21:41 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108 (T343198)', diff saved to https://phabricator.wikimedia.org/P52581 and previous config saved to /var/cache/conftool/dbconfig/20230923-214139-arnaudb.json
- 10:14 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2108 (T343198)', diff saved to https://phabricator.wikimedia.org/P52580 and previous config saved to /var/cache/conftool/dbconfig/20230923-101423-arnaudb.json
- 10:14 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2108.codfw.wmnet with reason: Maintenance
- 10:14 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2108.codfw.wmnet with reason: Maintenance
2023-09-22
- 22:56 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2100.codfw.wmnet with reason: Maintenance
- 22:56 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2100.codfw.wmnet with reason: Maintenance
- 17:32 bearloga@deploy2002: Finished deploy [airflow-dags/analytics_product@a30e944]: (no justification provided) (duration: 00m 09s)
- 17:32 bearloga@deploy2002: Started deploy [airflow-dags/analytics_product@a30e944]: (no justification provided)
- 15:46 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ganeti-test2004.mgmt.codfw.wmnet with reboot policy FORCED
- 15:45 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
- 15:45 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
- 15:44 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 15:44 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Adding ganeti-test server to codfw - jhancock@cumin2002"
- 15:43 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Adding ganeti-test server to codfw - jhancock@cumin2002"
- 15:41 jhancock@cumin2002: START - Cookbook sre.dns.netbox
- 15:31 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
- 15:30 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
- 15:24 denisse: upgrading LibreNMS in eqiad
- 15:21 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db1247']
- 15:19 denisse: upgrading LibreNMS to 23.9.1
- 15:13 denisse@deploy2002: Finished deploy [librenms/librenms@f049593]: Upgrade LibreNMS to 23.9.1 - T346737 (duration: 00m 09s)
- 15:13 denisse@deploy2002: Started deploy [librenms/librenms@f049593]: Upgrade LibreNMS to 23.9.1 - T346737
- 15:12 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1247']
- 15:11 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1247.mgmt.eqiad.wmnet with reboot policy FORCED
- 14:58 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['pc1015']
- 14:58 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
- 14:58 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
- 14:57 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
- 14:57 jclark@cumin1001: START - Cookbook sre.hosts.provision for host db1247.mgmt.eqiad.wmnet with reboot policy FORCED
- 14:57 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
- 14:56 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
- 14:56 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
- 14:54 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1247.mgmt.eqiad.wmnet with reboot policy FORCED
- 14:49 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['pc1015']
- 14:49 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pc1015.mgmt.eqiad.wmnet with reboot policy FORCED
- 13:45 jclark@cumin1001: START - Cookbook sre.hosts.provision for host db1247.mgmt.eqiad.wmnet with reboot policy FORCED
- 13:39 jclark@cumin1001: START - Cookbook sre.hosts.provision for host pc1015.mgmt.eqiad.wmnet with reboot policy FORCED
- 12:23 brouberol@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-test-eqiad
- 12:17 fnegri@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcumin2001.codfw.wmnet
- 12:13 fnegri@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudcumin2001.codfw.wmnet
- 11:58 brouberol@cumin1001: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-test-eqiad
- 11:42 eoghan@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host apt-staging2001.codfw.wmnet with OS bookworm
- 11:41 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2098.codfw.wmnet with reason: Maintenance
- 11:41 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2098.codfw.wmnet with reason: Maintenance
- 11:28 eoghan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on apt-staging2001.codfw.wmnet with reason: host reimage
- 11:25 eoghan@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on apt-staging2001.codfw.wmnet with reason: host reimage
- 11:09 eoghan@cumin1001: START - Cookbook sre.hosts.reimage for host apt-staging2001.codfw.wmnet with OS bookworm
- 10:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host testreduce1001.eqiad.wmnet
- 10:22 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host testreduce1001.eqiad.wmnet
- 10:00 fabfur: repool cp1090 (T346874)
- 09:53 fnegri@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcumin1001.eqiad.wmnet
- 09:50 fnegri@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudcumin1001.eqiad.wmnet
- 09:45 fnegri@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host cloudcumin1001.eqiad.wmnet
- 09:45 fnegri@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudcumin1001.eqiad.wmnet
- 09:43 fnegri@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host cloudcumin1001.eqiad.wmnet
- 09:43 fnegri@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudcumin1001.eqiad.wmnet
- 09:23 Amir1: dbmaint on s2@eqiad (T343198)
- 09:22 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on 16 hosts with reason: Schema change
- 09:21 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on 16 hosts with reason: Schema change
- 09:13 moritzm: installing perf updates on bookworm hosts
- 09:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on 15 hosts with reason: Schema change
- 09:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on 15 hosts with reason: Schema change
- 09:06 moritzm: installing perf updates on buster hosts
- 08:51 Amir1: dbmaint on s4@eqiad (T343198)
- 08:51 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on 20 hosts with reason: Schema change
- 08:51 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on 20 hosts with reason: Schema change
- 07:45 hashar: Upgrading CI Jenkins from 2.401.3 to 2.414.2
- 07:36 hashar: Restarting Gerrit to apply https://gerrit.wikimedia.org/r/c/operations/puppet/+/953967 "Link account creation to IDM" # T345226
- 07:06 moritzm: installing mutt security updates
- 06:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1132', diff saved to https://phabricator.wikimedia.org/P52577 and previous config saved to /var/cache/conftool/dbconfig/20230922-063617-root.json
- 06:32 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1132', diff saved to https://phabricator.wikimedia.org/P52576 and previous config saved to /var/cache/conftool/dbconfig/20230922-063212-root.json
- 05:13 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
- 00:43 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
- 00:43 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
- 00:43 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1202 (T343198)', diff saved to https://phabricator.wikimedia.org/P52575 and previous config saved to /var/cache/conftool/dbconfig/20230922-004330-arnaudb.json
- 00:28 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1202', diff saved to https://phabricator.wikimedia.org/P52574 and previous config saved to /var/cache/conftool/dbconfig/20230922-002823-arnaudb.json
- 00:13 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1202', diff saved to https://phabricator.wikimedia.org/P52573 and previous config saved to /var/cache/conftool/dbconfig/20230922-001316-arnaudb.json
2023-09-21
- 23:58 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1202 (T343198)', diff saved to https://phabricator.wikimedia.org/P52572 and previous config saved to /var/cache/conftool/dbconfig/20230921-235810-arnaudb.json
- 22:02 ejegg: Standalone (listener) SmashPig upgraded from ca5b6218 to 2412df22
- 20:28 brennen: end of UTC late backport & config window
- 20:27 brennen@deploy2002: Finished scap: Backport for Update Reader Demographics 2 pilot survey (T345951) (duration: 21m 36s)
- 20:18 brennen@deploy2002: dani and brennen: Continuing with sync
- 20:17 brennen@deploy2002: dani and brennen: Backport for Update Reader Demographics 2 pilot survey (T345951) synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
- 20:06 brennen@deploy2002: Started scap: Backport for Update Reader Demographics 2 pilot survey (T345951)
- 20:04 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1202 (T343198)', diff saved to https://phabricator.wikimedia.org/P52570 and previous config saved to /var/cache/conftool/dbconfig/20230921-200439-arnaudb.json
- 20:04 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1202.eqiad.wmnet with reason: Maintenance
- 20:04 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1202.eqiad.wmnet with reason: Maintenance
- 20:04 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1194 (T343198)', diff saved to https://phabricator.wikimedia.org/P52569 and previous config saved to /var/cache/conftool/dbconfig/20230921-200417-arnaudb.json
- 20:01 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 20:00 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add records for codfw test servers - cmooney@cumin1001"
- 19:59 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add records for codfw test servers - cmooney@cumin1001"
- 19:49 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1194', diff saved to https://phabricator.wikimedia.org/P52568 and previous config saved to /var/cache/conftool/dbconfig/20230921-194911-arnaudb.json
- 19:47 cmooney@cumin1001: START - Cookbook sre.dns.netbox
- 19:34 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1194', diff saved to https://phabricator.wikimedia.org/P52567 and previous config saved to /var/cache/conftool/dbconfig/20230921-193404-arnaudb.json
- 19:19 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1194 (T343198)', diff saved to https://phabricator.wikimedia.org/P52566 and previous config saved to /var/cache/conftool/dbconfig/20230921-191858-arnaudb.json
- 19:17 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "add codfw new switches - cmooney@cumin1001"
- 19:13 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "add codfw new switches - cmooney@cumin1001"
- 18:54 ladsgroup@deploy2002: Finished scap: Backport for Enable Url shortener in sidebar in all wikis (T267921) (duration: 20m 47s)
- 18:47 ejegg: payments-wiki upgraded from 9cd3e4cd to 5596c7fd
- 18:45 ladsgroup@deploy2002: ladsgroup: Continuing with sync
- 18:45 ladsgroup@deploy2002: ladsgroup: Backport for Enable Url shortener in sidebar in all wikis (T267921) synced to the testservers mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
- 18:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2149 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P52565 and previous config saved to /var/cache/conftool/dbconfig/20230921-184000-ladsgroup.json
- 18:34 ladsgroup@deploy2002: Started scap: Backport for Enable Url shortener in sidebar in all wikis (T267921)
- 18:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2149 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P52564 and previous config saved to /var/cache/conftool/dbconfig/20230921-182455-ladsgroup.json
- 18:15 brennen@deploy2002: rebuilt and synchronized wikiversions files: group2 wikis to 1.41.0-wmf.27 refs T345888
- 18:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2149 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P52562 and previous config saved to /var/cache/conftool/dbconfig/20230921-180949-ladsgroup.json
- 18:05 brennen: train 1.41.0-wmf.27 (T345888): no current blockers, logs clean, rolling to group2 shortly.
- 18:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repool db1166 (T346365)', diff saved to https://phabricator.wikimedia.org/P52561 and previous config saved to /var/cache/conftool/dbconfig/20230921-180003-ladsgroup.json
- 17:59 xcollazo@deploy2002: Finished deploy [airflow-dags/analytics@ddcc518]: Deploy latest DAGs to analytics Airflow instance (duration: 00m 40s)
- 17:58 xcollazo@deploy2002: Started deploy [airflow-dags/analytics@ddcc518]: Deploy latest DAGs to analytics Airflow instance
- 17:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db1166 (T346365)', diff saved to https://phabricator.wikimedia.org/P52560 and previous config saved to /var/cache/conftool/dbconfig/20230921-175634-ladsgroup.json
- 17:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2149 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P52559 and previous config saved to /var/cache/conftool/dbconfig/20230921-175444-ladsgroup.json
- 17:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db2149 (T346365)', diff saved to https://phabricator.wikimedia.org/P52558 and previous config saved to /var/cache/conftool/dbconfig/20230921-174934-ladsgroup.json
- 17:41 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for restbase2014.codfw.wmnet
- 17:41 eevans@cumin1001: START - Cookbook sre.hosts.remove-downtime for restbase2014.codfw.wmnet
- 17:35 ejegg: re-enabled contribution tracking queue consumer
- 17:30 ejegg: civicrm upgraded from f0e9d3f6 to 9efea665
- 17:29 ejegg: disabled contribution_tracking queue consumer for Civi update
- 17:27 eoghan@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host apt-staging2001.codfw.wmnet
- 17:27 eoghan@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host apt-staging2001.codfw.wmnet with OS bookworm
- 17:21 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase2014.codfw.wmnet with OS bullseye
- 16:45 vriley@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1015.mgmt.eqiad.wmnet with reboot policy FORCED
- 16:45 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on restbase2014.codfw.wmnet with reason: host reimage
- 16:42 eevans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on restbase2014.codfw.wmnet with reason: host reimage
- 16:26 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2014.codfw.wmnet with OS bullseye
- 16:11 eoghan@cumin1001: START - Cookbook sre.hosts.reimage for host apt-staging2001.codfw.wmnet with OS bookworm
- 16:10 eoghan@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM apt-staging2001.codfw.wmnet - eoghan@cumin1001"
- 16:10 eoghan@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM apt-staging2001.codfw.wmnet - eoghan@cumin1001"
- 16:10 eoghan@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) apt-staging2001.codfw.wmnet on all recursors
- 16:09 eoghan@cumin1001: START - Cookbook sre.dns.wipe-cache apt-staging2001.codfw.wmnet on all recursors
- 16:09 eoghan@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 16:09 eoghan@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM apt-staging2001.codfw.wmnet - eoghan@cumin1001"
- 16:08 eoghan@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM apt-staging2001.codfw.wmnet - eoghan@cumin1001"
- 16:02 eoghan@cumin1001: START - Cookbook sre.dns.netbox
- 16:02 eoghan@cumin1001: START - Cookbook sre.ganeti.makevm for new host apt-staging2001.codfw.wmnet
- 15:34 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1194 (T343198)', diff saved to https://phabricator.wikimedia.org/P52557 and previous config saved to /var/cache/conftool/dbconfig/20230921-153428-arnaudb.json
- 15:34 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1194.eqiad.wmnet with reason: Maintenance
- 15:34 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1194.eqiad.wmnet with reason: Maintenance
- 15:34 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1191 (T343198)', diff saved to https://phabricator.wikimedia.org/P52556 and previous config saved to /var/cache/conftool/dbconfig/20230921-153406-arnaudb.json
- 15:33 elukey@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
- 15:25 jayme@deploy2002: Finished scap: (no justification provided) (duration: 02m 29s)
- 15:22 jayme@deploy2002: Started scap: (no justification provided)
- 15:20 moritzm: installing php7.3 security updates (as packaged in Debian Buster)
- 15:19 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1191', diff saved to https://phabricator.wikimedia.org/P52555 and previous config saved to /var/cache/conftool/dbconfig/20230921-151900-arnaudb.json
- 15:14 lucaswerkmeister-wmde@deploy2002: Finished scap: Backport for SpecialUndelete: Do not clone RequestContext (T346995) (duration: 34m 13s)
- 15:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on 22 hosts with reason: Schema change
- 15:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on 22 hosts with reason: Schema change
- 15:12 Amir1: dbmaint on s8@eqiad (T343198)
- 15:10 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on 18 hosts with reason: Schema change
- 15:10 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on 18 hosts with reason: Schema change
- 15:06 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for restbase2013.codfw.wmnet
- 15:06 eevans@cumin1001: START - Cookbook sre.hosts.remove-downtime for restbase2013.codfw.wmnet
- 15:05 kevinbazira@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
- 15:03 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1191', diff saved to https://phabricator.wikimedia.org/P52554 and previous config saved to /var/cache/conftool/dbconfig/20230921-150353-arnaudb.json
- 15:01 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Continuing with sync
- 15:00 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Backport for SpecialUndelete: Do not clone RequestContext (T346995) synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
- 14:48 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1191 (T343198)', diff saved to https://phabricator.wikimedia.org/P52553 and previous config saved to /var/cache/conftool/dbconfig/20230921-144847-arnaudb.json
- 14:41 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase2013.codfw.wmnet with OS bullseye
- 14:40 lucaswerkmeister-wmde@deploy2002: Started scap: Backport for SpecialUndelete: Do not clone RequestContext (T346995)
- 14:31 moritzm: imported cas 6.6.12+wmf11u1 to apt.wikimedia.org
- 14:31 kevinbazira@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
- 14:19 tchanders@deploy2002: Finished scap: Backport for Enable partial action blocks on mediawikiwiki (T332733) (duration: 34m 01s)
- 14:16 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on restbase2013.codfw.wmnet with reason: host reimage
- 14:14 eevans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on restbase2013.codfw.wmnet with reason: host reimage
- 14:07 kevinbazira@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
- 14:04 tchanders@deploy2002: tchanders: Continuing with sync
- 14:03 tchanders@deploy2002: tchanders: Backport for Enable partial action blocks on mediawikiwiki (T332733) synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
- 13:59 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2013.codfw.wmnet with OS bullseye
- 13:53 vriley@cumin1001: START - Cookbook sre.hosts.provision for host pc1015.mgmt.eqiad.wmnet with reboot policy FORCED
- 13:43 tchanders@deploy2002: Started scap: Backport for Enable partial action blocks on mediawikiwiki (T332733)
- 13:39 tchanders@deploy2002: Finished scap: Backport for Enable partial action blocks on commonswiki (T339878) (duration: 35m 04s)
- 13:37 vriley@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1015.mgmt.eqiad.wmnet with reboot policy FORCED
- 13:36 vriley@cumin1001: START - Cookbook sre.hosts.provision for host pc1015.mgmt.eqiad.wmnet with reboot policy FORCED
- 13:34 vriley@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host pc1015
- 13:34 vriley@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host pc1015
- 13:30 vriley@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1015.mgmt.eqiad.wmnet with reboot policy FORCED
- 13:28 vriley@cumin1001: START - Cookbook sre.hosts.provision for host pc1015.mgmt.eqiad.wmnet with reboot policy FORCED
- 13:27 vriley@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1015.mgmt.eqiad.wmnet with reboot policy FORCED
- 13:26 tchanders@deploy2002: tchanders: Continuing with sync
- 13:25 tchanders@deploy2002: tchanders: Backport for Enable partial action blocks on commonswiki (T339878) synced to the testservers mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
- 13:25 urbanecm: mwmaint2002: `mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki 'Private Incident Reporting System/Updates' 'Incident Reporting System/Updates' 'Martin Urbanec' --reason 'per request'` (T347019)
- 13:08 fabfur: disabled puppet on cp1090 for T346874
- 13:05 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes2028.codfw.wmnet with OS bullseye
- 13:04 tchanders@deploy2002: Started scap: Backport for Enable partial action blocks on commonswiki (T339878)
- 12:45 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2028.codfw.wmnet with reason: host reimage
- 12:42 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2028.codfw.wmnet with reason: host reimage
- 12:33 jmm@cumin2002: END (PASS) - Cookbook sre.aqs.roll-restart-reboot (exit_code=0) rolling restart_daemons on A:aqs-eqiad
- 12:31 milimetric@deploy2002: Finished deploy [analytics/aqs/deploy@041016f] (aqs): Enable etags on all AQS 1.0 endpoints (duration: 10m 23s)
- 12:25 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes2028.codfw.wmnet with OS bullseye
- 12:22 jmm@cumin2002: START - Cookbook sre.aqs.roll-restart-reboot rolling restart_daemons on A:aqs-eqiad
- 12:21 milimetric@deploy2002: Started deploy [analytics/aqs/deploy@041016f] (aqs): Enable etags on all AQS 1.0 endpoints
- 12:20 fabfur: depooled cp1090.eqiad.wmnet to test new purged package version (T346874)
- 12:10 jmm@cumin2002: END (PASS) - Cookbook sre.aqs.roll-restart-reboot (exit_code=0) rolling restart_daemons on A:aqs-codfw
- 12:03 effie: cordon kubernetes2028 to reimage
- 11:59 jmm@cumin2002: START - Cookbook sre.aqs.roll-restart-reboot rolling restart_daemons on A:aqs-codfw
- 11:57 ladsgroup@deploy2002: Finished scap: Backport for Turn on write both for pagelinks in largest s3 wikis (T345732) (duration: 36m 44s)
- 11:45 jmm@cumin2002: END (PASS) - Cookbook sre.maps.roll-restart-reboot (exit_code=0) rolling restart_daemons on A:maps-replica-eqiad
- 11:43 ladsgroup@deploy2002: ladsgroup: Continuing with sync
- 11:43 ladsgroup@deploy2002: ladsgroup: Backport for Turn on write both for pagelinks in largest s3 wikis (T345732) synced to the testservers mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
- 11:39 jmm@cumin2002: START - Cookbook sre.maps.roll-restart-reboot rolling restart_daemons on A:maps-replica-eqiad
- 11:33 jmm@cumin2002: END (PASS) - Cookbook sre.maps.roll-restart-reboot (exit_code=0) rolling restart_daemons on A:maps-replica-codfw
- 11:28 jmm@cumin2002: START - Cookbook sre.maps.roll-restart-reboot rolling restart_daemons on A:maps-replica-codfw
- 11:21 ladsgroup@deploy2002: Started scap: Backport for Turn on write both for pagelinks in largest s3 wikis (T345732)
- 11:20 jnuche@deploy2002: Finished deploy [releng/jenkins-deploy@c579111] (releasing): (no justification provided) (duration: 01m 05s)
- 11:19 jnuche@deploy2002: Started deploy [releng/jenkins-deploy@c579111] (releasing): (no justification provided)
- 11:08 arturo: merging homer CR firewall patch https://gerrit.wikimedia.org/r/c/operations/homer/public/+/959706 for T346948
- 10:57 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1191 (T343198)', diff saved to https://phabricator.wikimedia.org/P52550 and previous config saved to /var/cache/conftool/dbconfig/20230921-105723-arnaudb.json
- 10:57 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1191.eqiad.wmnet with reason: Maintenance
- 10:57 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1191.eqiad.wmnet with reason: Maintenance
- 10:54 moritzm: installing c-ares security updates
- 10:49 ladsgroup@deploy2002: Finished scap: Backport for Enable pagelinks write both in testwiki (T345732) (duration: 36m 27s)
- 10:48 moritzm: installing flac security updates
- 10:42 elukey@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
- 10:36 ladsgroup@deploy2002: ladsgroup: Continuing with sync
- 10:34 ladsgroup@deploy2002: ladsgroup: Backport for Enable pagelinks write both in testwiki (T345732) synced to the testservers mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
- 10:27 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti-test2003.codfw.wmnet
- 10:27 XioNoX: set max repeaters = 20 on asw2-a-eqiad - T346759
- 10:26 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti-test2003.codfw.wmnet
- 10:25 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti-test2002.codfw.wmnet
- 10:25 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti-test2002.codfw.wmnet
- 10:19 isaranto@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
- 10:18 cmooney@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox
- 10:17 cmooney@cumin1001: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
- 10:17 cmooney@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary
- 10:17 cmooney@cumin1001: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary
- 10:17 cmooney@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary
- 10:17 cmooney@cumin1001: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary
- 10:12 ladsgroup@deploy2002: Started scap: Backport for Enable pagelinks write both in testwiki (T345732)
- 10:10 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 10:10 ayounsi@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove eqsin-eqdfw tunnel - ayounsi@cumin1001"
- 10:09 ayounsi@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove eqsin-eqdfw tunnel - ayounsi@cumin1001"
- 10:07 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
- 09:55 kevinbazira@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
- 09:51 effie: disable puppet on kubernetes[2025-2053].codfw.wmnet
- 09:42 kevinbazira@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
- 09:41 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 09:40 cmooney@cumin1001: START - Cookbook sre.dns.netbox
- 09:40 cmooney@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
- 09:38 brouberol@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: apply
- 09:38 brouberol@deploy2002: helmfile [codfw] START helmfile.d/services/eventgate-analytics-external: apply
- 09:36 brouberol@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: apply
- 09:36 brouberol@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: apply
- 09:35 brouberol@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics: apply
- 09:34 brouberol@deploy2002: helmfile [codfw] START helmfile.d/services/eventgate-analytics: apply
- 09:33 brouberol@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics: apply
- 09:32 brouberol@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics: apply
- 09:32 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.resource-report (exit_code=0)
- 09:32 jmm@cumin2002: START - Cookbook sre.ganeti.resource-report
- 09:30 brouberol@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventstreams-internal: apply
- 09:30 brouberol@deploy2002: helmfile [codfw] START helmfile.d/services/eventstreams-internal: apply
- 09:28 brouberol@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventstreams-internal: apply
- 09:28 XioNoX: remove GRE tunnel between eqsin and eqdfw - T344888
- 09:27 brouberol@deploy2002: helmfile [eqiad] START helmfile.d/services/eventstreams-internal: apply
- 09:08 kevinbazira@deploy1002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
- 09:03 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ganeti2030.codfw.wmnet with reason: Fixup DRBD
- 09:03 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on ganeti2030.codfw.wmnet with reason: Fixup DRBD
- 09:00 taavi@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudcontrol1007.wikimedia.org
- 09:00 taavi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 09:00 taavi@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudcontrol1007.wikimedia.org decommissioned, removing all IPs except the asset tag one - taavi@cumin1001"
- 08:59 taavi@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudcontrol1007.wikimedia.org decommissioned, removing all IPs except the asset tag one - taavi@cumin1001"
- 08:57 taavi@cumin1001: START - Cookbook sre.dns.netbox
- 08:51 taavi@cumin1001: START - Cookbook sre.hosts.decommission for hosts cloudcontrol1007.wikimedia.org
- 08:14 brouberol@deploy2002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
- 08:14 brouberol@deploy2002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
- 08:14 brouberol: redeploying mw-page-content-change-enrich in staging T336041
- 08:13 brouberol@deploy2002: helmfile [staging] DONE helmfile.d/services/eventstreams-internal: apply
- 08:13 brouberol@deploy2002: helmfile [staging] START helmfile.d/services/eventstreams-internal: apply
- 08:13 brouberol: redeploying eventstreams-internal in staging T336041
- 08:12 brouberol@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics-external: apply
- 08:12 brouberol@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-analytics-external: apply
- 08:12 brouberol: redeploying eventgate-analytics-external in staging T336041
- 08:10 brouberol@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics: apply
- 08:10 brouberol@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-analytics: apply
- 07:52 kartik@deploy2002: Finished scap: Backport for Enable MinT translation service on Meta-Wiki - rollout #5 (T341445) (duration: 42m 01s)
- 07:38 kartik@deploy2002: kartik and abi: Continuing with sync
- 07:32 kartik@deploy2002: kartik and abi: Backport for Enable MinT translation service on Meta-Wiki - rollout #5 (T341445) synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
- 07:10 kartik@deploy2002: Started scap: Backport for Enable MinT translation service on Meta-Wiki - rollout #5 (T341445)
- 06:52 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 2915
- 06:51 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 2915
- 06:31 taavi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 06:31 taavi@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: fix cloudsw cloud-private records - taavi@cumin1001"
- 06:30 taavi@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: fix cloudsw cloud-private records - taavi@cumin1001"
- 06:28 taavi@cumin1001: START - Cookbook sre.dns.netbox
- 05:52 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
- 05:49 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
- 05:47 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
- 05:47 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
- 05:44 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
- 05:44 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
- 05:40 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
- 05:38 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
- 05:24 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
- 05:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
- 02:18 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db1248']
- 02:17 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db1249']
- 02:17 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db1246']
- 02:09 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db1245']
- 02:09 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db1242']
- 02:08 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1249']
- 02:08 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1248']
- 02:08 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1246']
- 02:08 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db1244']
- 02:07 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db1243']
- 02:07 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db1241']
- 02:07 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db1240']
- 02:00 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1245']
- 01:59 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db1239']
- 01:59 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1244']
- 01:59 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db1237']
- 01:59 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1243']
- 01:59 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1242']
- 01:59 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db1238']
- 01:59 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db1236']
- 01:58 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1241']
- 01:58 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db1235']
- 01:58 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1240']
- 01:58 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db1234']
- 01:51 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1239']
- 01:50 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1245.mgmt.eqiad.wmnet with reboot policy FORCED
- 01:50 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1235']
- 01:49 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1238']
- 01:49 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1237']
- 01:49 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1236']
- 01:49 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1234']
- 01:48 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1248.mgmt.eqiad.wmnet with reboot policy FORCED
- 01:48 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1246.mgmt.eqiad.wmnet with reboot policy FORCED
- 01:47 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1247.mgmt.eqiad.wmnet with reboot policy FORCED
- 01:35 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1249.mgmt.eqiad.wmnet with reboot policy FORCED
- 01:31 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1244.mgmt.eqiad.wmnet with reboot policy FORCED
- 01:30 jclark@cumin1001: START - Cookbook sre.hosts.provision for host db1246.mgmt.eqiad.wmnet with reboot policy FORCED
- 01:30 jclark@cumin1001: START - Cookbook sre.hosts.provision for host db1245.mgmt.eqiad.wmnet with reboot policy FORCED
- 01:29 jclark@cumin1001: START - Cookbook sre.hosts.provision for host db1248.mgmt.eqiad.wmnet with reboot policy FORCED
- 01:29 jclark@cumin1001: START - Cookbook sre.hosts.provision for host db1247.mgmt.eqiad.wmnet with reboot policy FORCED
- 01:29 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1243.mgmt.eqiad.wmnet with reboot policy FORCED
- 01:29 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1241.mgmt.eqiad.wmnet with reboot policy FORCED
- 01:28 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1242.mgmt.eqiad.wmnet with reboot policy FORCED
- 01:28 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1240.mgmt.eqiad.wmnet with reboot policy FORCED
- 01:17 jclark@cumin1001: START - Cookbook sre.hosts.provision for host db1249.mgmt.eqiad.wmnet with reboot policy FORCED
- 01:17 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1235.mgmt.eqiad.wmnet with reboot policy FORCED
- 01:11 jclark@cumin1001: START - Cookbook sre.hosts.provision for host db1244.mgmt.eqiad.wmnet with reboot policy FORCED
- 01:11 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1234.mgmt.eqiad.wmnet with reboot policy FORCED
- 01:10 jclark@cumin1001: START - Cookbook sre.hosts.provision for host db1243.mgmt.eqiad.wmnet with reboot policy FORCED
- 01:10 jclark@cumin1001: START - Cookbook sre.hosts.provision for host db1242.mgmt.eqiad.wmnet with reboot policy FORCED
- 01:10 jclark@cumin1001: START - Cookbook sre.hosts.provision for host db1241.mgmt.eqiad.wmnet with reboot policy FORCED
- 01:10 jclark@cumin1001: START - Cookbook sre.hosts.provision for host db1240.mgmt.eqiad.wmnet with reboot policy FORCED
- 01:09 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1237.mgmt.eqiad.wmnet with reboot policy FORCED
- 01:09 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1239.mgmt.eqiad.wmnet with reboot policy FORCED
- 01:09 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1238.mgmt.eqiad.wmnet with reboot policy FORCED
- 01:09 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1236.mgmt.eqiad.wmnet with reboot policy FORCED
- 00:50 jclark@cumin1001: START - Cookbook sre.hosts.provision for host db1239.mgmt.eqiad.wmnet with reboot policy FORCED
- 00:50 jclark@cumin1001: START - Cookbook sre.hosts.provision for host db1238.mgmt.eqiad.wmnet with reboot policy FORCED
- 00:50 jclark@cumin1001: START - Cookbook sre.hosts.provision for host db1237.mgmt.eqiad.wmnet with reboot policy FORCED
- 00:50 jclark@cumin1001: START - Cookbook sre.hosts.provision for host db1235.mgmt.eqiad.wmnet with reboot policy FORCED
- 00:50 jclark@cumin1001: START - Cookbook sre.hosts.provision for host db1234.mgmt.eqiad.wmnet with reboot policy FORCED
- 00:50 jclark@cumin1001: START - Cookbook sre.hosts.provision for host db1236.mgmt.eqiad.wmnet with reboot policy FORCED
- 00:49 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1249
- 00:48 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1248
- 00:47 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host db1249
- 00:47 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1246
- 00:47 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1247
- 00:46 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host db1248
- 00:46 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1245
- 00:46 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host db1247
- 00:46 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1244
- 00:46 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host db1246
- 00:46 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1243
- 00:46 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host db1245
- 00:46 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host db1244
- 00:46 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1242
- 00:46 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1240
- 00:45 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host db1243
- 00:45 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1241
- 00:44 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host db1242
- 00:44 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host db1241
- 00:44 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host db1240
- 00:44 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1238
- 00:44 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1236
- 00:44 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1239
- 00:43 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host db1239
- 00:43 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host db1238
- 00:43 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host db1236
- 00:42 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1234
- 00:42 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1235
- 00:41 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host db1235
- 00:41 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host db1234
- 00:39 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 00:39 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt db12[34-49] - jclark@cumin1001"
- 00:38 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt db12[34-49] - jclark@cumin1001"
- 00:36 jclark@cumin1001: START - Cookbook sre.dns.netbox
- 00:15 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['pc1016']
- 00:09 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1015.mgmt.eqiad.wmnet with reboot policy FORCED
- 00:08 jclark@cumin1001: START - Cookbook sre.hosts.provision for host pc1015.mgmt.eqiad.wmnet with reboot policy FORCED
- 00:08 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['pc1015']
- 00:07 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['pc1015']
- 00:07 jclark@cumin1001: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts ['pc1016']
- 00:07 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['pc1016']
- 00:06 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['pc1016']
- 00:05 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pc1016.mgmt.eqiad.wmnet with reboot policy FORCED
2023-09-20
- 23:55 jclark@cumin1001: START - Cookbook sre.hosts.provision for host pc1016.mgmt.eqiad.wmnet with reboot policy FORCED
- 23:54 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1016.mgmt.eqiad.wmnet with reboot policy FORCED
- 23:51 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1015.mgmt.eqiad.wmnet with reboot policy FORCED
- 23:51 jclark@cumin1001: START - Cookbook sre.hosts.provision for host pc1015.mgmt.eqiad.wmnet with reboot policy FORCED
- 23:50 jclark@cumin1001: START - Cookbook sre.hosts.provision for host pc1016.mgmt.eqiad.wmnet with reboot policy FORCED
- 23:50 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1015.mgmt.eqiad.wmnet with reboot policy FORCED
- 23:50 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host pc1016
- 23:50 jclark@cumin1001: START - Cookbook sre.hosts.provision for host pc1015.mgmt.eqiad.wmnet with reboot policy FORCED
- 23:49 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host pc1015
- 23:49 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host pc1015
- 23:49 jclark@cumin1001: END (ERROR) - Cookbook sre.network.configure-switch-interfaces (exit_code=97) for host pc1016
- 23:49 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host pc1016
- 23:48 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host pc1016
- 23:48 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 23:48 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt pc1016 - jclark@cumin1001"
- 23:47 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt pc1016 - jclark@cumin1001"
- 23:44 jclark@cumin1001: START - Cookbook sre.dns.netbox
- 19:26 bearloga@deploy2002: Finished deploy [airflow-dags/analytics_product@80496b8]: (no justification provided) (duration: 00m 05s)
- 19:26 bearloga@deploy2002: Started deploy [airflow-dags/analytics_product@80496b8]: (no justification provided)
- 19:25 bearloga@deploy2002: Finished deploy [airflow-dags/analytics_product@80496b8]: (no justification provided) (duration: 00m 09s)
- 19:24 bearloga@deploy2002: Started deploy [airflow-dags/analytics_product@80496b8]: (no justification provided)
- 18:21 brennen@deploy2002: Synchronized php: group1 wikis to 1.41.0-wmf.27 refs T345888 (duration: 07m 17s)
- 18:14 brennen@deploy2002: rebuilt and synchronized wikiversions files: group1 wikis to 1.41.0-wmf.27 refs T345888
- 18:02 brennen: train 1.41.0-wmf.27 (T345888): no current blockers, logs clean, rolling to group1
- 16:29 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
- 16:28 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
- 16:26 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
- 16:26 klausman: pushing revert of ORES TTL change
- 16:24 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
- 15:30 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
- 15:30 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/mw-web: apply
- 15:29 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
- 15:29 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
- 15:29 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
- 15:29 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
- 15:09 moritzm: added Taavi and Effie (new key) to pwstore
- 15:08 jgiannelos@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifeeds: apply
- 15:08 jgiannelos@deploy2002: helmfile [codfw] START helmfile.d/services/wikifeeds: apply
- 15:06 brouberol@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0) for Kafka A:kafka-jumbo-eqiad cluster: Roll restart of jvm daemons.
- 15:05 jgiannelos@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifeeds: apply
- 15:05 jgiannelos@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifeeds: apply
- 15:04 cmooney@cumin1001: START - Cookbook sre.dns.netbox
- 15:03 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Unbanning all hosts in search_codfw
- 15:03 bking@cumin1001: START - Cookbook sre.elasticsearch.ban Unbanning all hosts in search_codfw
- 15:02 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifeeds: apply
- 15:02 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/wikifeeds: apply
- 14:59 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifeeds: apply
- 14:58 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/wikifeeds: apply
- 14:45 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 14:45 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add new cloud-private records - cmooney@cumin1001"
- 14:44 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add new cloud-private records - cmooney@cumin1001"
- 14:41 cmooney@cumin1001: START - Cookbook sre.dns.netbox
- 14:35 kamila_: update maintenance.eqiad.wmnet to point to mwmaint2002
- 14:26 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Banning hosts: elastic2044.codfw.wmnet for high load - bking@cumin1001
- 14:26 bking@cumin1001: START - Cookbook sre.elasticsearch.ban Banning hosts: elastic2044.codfw.wmnet for high load - bking@cumin1001
- 14:25 bking@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.ban (exit_code=99) Banning hosts: elastic2044 for high load - bking@cumin1001
- 14:25 bking@cumin1001: START - Cookbook sre.elasticsearch.ban Banning hosts: elastic2044 for high load - bking@cumin1001
- 14:16 kamila@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.09-run-puppet-on-db-masters (exit_code=0)
- 14:10 kamila@cumin1001: START - Cookbook sre.switchdc.mediawiki.09-run-puppet-on-db-masters
- 14:09 kamila@deploy2002: Unlocked for deployment [ALL REPOSITORIES]: Datacenter Switchover: MediaWiki - T346474 (duration: 12m 54s)
- 14:07 kamila_: Phase 9.5 Update DNS records for new database masters - T346474
- 14:06 kamila@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.09-restore-ttl (exit_code=0)
- 14:06 kamila@cumin1001: START - Cookbook sre.switchdc.mediawiki.09-restore-ttl
- 14:06 kamila@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-start-maintenance (exit_code=0)
- 14:04 marostegui: Testing
- 14:03 kamila@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-start-maintenance
- 14:03 kamila@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-restart-envoy-on-jobrunners (exit_code=0)
- 14:03 kamila@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-restart-envoy-on-jobrunners
- 14:03 kamila@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.07-set-readwrite (exit_code=0)
- 14:03 kamila@cumin1001: MediaWiki read-only period ends at: 2023-09-20 14:02:59.798838
- 14:03 kamila@cumin1001: START - Cookbook sre.switchdc.mediawiki.07-set-readwrite
- 14:02 kamila@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.07-set-readwrite (exit_code=0)
- 14:02 kamila@cumin1001: MediaWiki read-only period ends at: 2023-09-20 14:02:53.790615
- 14:00 kamila@cumin1001: MediaWiki read-only period starts at: 2023-09-20 14:00:32.114116
- 14:00 kamila@cumin1001: START - Cookbook sre.switchdc.mediawiki.02-set-readonly
- 13:57 kamila@cumin1001: END (FAIL) - Cookbook sre.switchdc.mediawiki.01-stop-maintenance (exit_code=99)
- 13:57 kamila@cumin1001: START - Cookbook sre.switchdc.mediawiki.01-stop-maintenance
- 13:57 kamila@cumin1001: END (FAIL) - Cookbook sre.switchdc.mediawiki.01-stop-maintenance (exit_code=99)
- 13:57 kamila@cumin1001: START - Cookbook sre.switchdc.mediawiki.01-stop-maintenance
- 13:56 kamila@deploy2002: Locking from deployment [ALL REPOSITORIES]: Datacenter Switchover: MediaWiki - T346474
- 13:56 urbanecm@deploy2002: Finished scap: Backport for build: Update eslint-config-wikimedia to 0.25.1 (T346629), Change CSS selector for Minerva mobile menu icon (T346459) (duration: 34m 21s)
- 13:56 kamila@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-reduce-ttl (exit_code=0)
- 13:50 kamila@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-reduce-ttl
- 13:50 kamila@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-downtime-db-readonly-checks (exit_code=0)
- 13:50 kamila@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-downtime-db-readonly-checks
- 13:49 kamila@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-disable-puppet (exit_code=0)
- 13:49 kamila@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-disable-puppet
- 13:49 slyngshede@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host idm-test1001.wikimedia.org with OS bookworm
- 13:43 urbanecm@deploy2002: urbanecm and jdlrobson: Continuing with sync
- 13:42 urbanecm@deploy2002: urbanecm and jdlrobson: Backport for build: Update eslint-config-wikimedia to 0.25.1 (T346629), Change CSS selector for Minerva mobile menu icon (T346459) synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
- 13:31 slyngshede@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on idm-test1001.wikimedia.org with reason: host reimage
- 13:26 slyngshede@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on idm-test1001.wikimedia.org with reason: host reimage
- 13:21 urbanecm@deploy2002: Started scap: Backport for build: Update eslint-config-wikimedia to 0.25.1 (T346629), Change CSS selector for Minerva mobile menu icon (T346459)
- 13:12 slyngshede@cumin1001: START - Cookbook sre.hosts.reimage for host idm-test1001.wikimedia.org with OS bookworm
- 13:02 akosiaris@deploy2002: Finished deploy [restbase/deploy@e8a6ae4]: (no justification provided) (duration: 00m 27s)
- 13:02 akosiaris@deploy2002: Started deploy [restbase/deploy@e8a6ae4]: (no justification provided)
- 12:54 akosiaris@deploy2002: Finished deploy [restbase/deploy@e8a6ae4]: (no justification provided) (duration: 02m 10s)
- 12:52 akosiaris@deploy2002: Started deploy [restbase/deploy@e8a6ae4]: (no justification provided)
- 12:52 akosiaris@deploy2002: Finished deploy [restbase/deploy@e8a6ae4]: (no justification provided) (duration: 04m 43s)
- 12:47 akosiaris@deploy2002: Started deploy [restbase/deploy@e8a6ae4]: (no justification provided)
- 12:45 akosiaris@deploy2002: Finished deploy [restbase/deploy@e8a6ae4]: (no justification provided) (duration: 04m 34s)
- 12:41 akosiaris: T346354 deploy RESTBase after bug is fixed
- 12:40 akosiaris@deploy2002: Started deploy [restbase/deploy@e8a6ae4]: (no justification provided)
- 11:56 gmodena@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
- 11:56 gmodena@deploy2002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
- 11:49 gmodena@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
- 11:49 gmodena@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
- 11:40 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudweb1004.wikimedia.org
- 11:33 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host cloudweb1004.wikimedia.org
- 11:20 gmodena@deploy2002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
- 11:20 gmodena@deploy2002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
- 11:17 aborrero@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) openstack.eqiad1.wikimediacloud.org on all recursors
- 11:17 aborrero@cumin1001: START - Cookbook sre.dns.wipe-cache openstack.eqiad1.wikimediacloud.org on all recursors
- 11:14 aborrero@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 11:14 aborrero@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: openstack.eqiad1 - aborrero@cumin1001"
- 11:13 aborrero@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: openstack.eqiad1 - aborrero@cumin1001"
- 11:11 aborrero@cumin1001: START - Cookbook sre.dns.netbox
- 10:55 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudweb1003.wikimedia.org
- 10:48 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host cloudweb1003.wikimedia.org
- 10:30 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudweb2002-dev.wikimedia.org
- 10:23 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host cloudweb2002-dev.wikimedia.org
- 10:18 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host gitlab-runner2004.codfw.wmnet with OS bullseye
- 10:04 brouberol@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers for Kafka A:kafka-jumbo-eqiad cluster: Roll restart of jvm daemons.
- 10:03 klausman: Running authdns-update to activate change 957689 (T341696)
- 10:02 klausman: Merging change 957689 (T341696) to lower DNS TTL to 5m for ORES name.
- 10:01 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on gitlab-runner2004.codfw.wmnet with reason: host reimage
- 10:00 Emperor: ms-be10[61-75] swift package updates T346730
- 09:56 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on gitlab-runner2004.codfw.wmnet with reason: host reimage
- 09:55 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudservices1005.eqiad.wmnet with OS bullseye
- 09:55 aborrero@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - aborrero@cumin1001"
- 09:54 aborrero@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - aborrero@cumin1001"
- 09:48 brouberol@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on kafka-jumbo1003.eqiad.wmnet with reason: investigation by brouberol and elukey about kafka ACL issues that might be fixed by a broker restart
- 09:48 brouberol@cumin1001: START - Cookbook sre.hosts.downtime for 0:10:00 on kafka-jumbo1003.eqiad.wmnet with reason: investigation by brouberol and elukey about kafka ACL issues that might be fixed by a broker restart
- 09:41 jelto@cumin1001: START - Cookbook sre.hosts.reimage for host gitlab-runner2004.codfw.wmnet with OS bullseye
- 09:39 aborrero@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "cloudservices1005 - aborrero@cumin1001 - T346042"
- 09:38 aborrero@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "cloudservices1005 - aborrero@cumin1001 - T346042"
- 09:35 aborrero@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "cloudservices1005 - aborrero@cumin1001 - T346042"
- 09:34 aborrero@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "cloudservices1005 - aborrero@cumin1001 - T346042"
- 09:34 gmodena@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
- 09:34 gmodena@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
- 09:33 aborrero@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 09:32 aborrero@cumin1001: START - Cookbook sre.dns.netbox
- 09:29 klausman: Draining ml-serve1008 for kubelet partition increase (T339231)
- 09:24 klausman: Draining ml-serve1007 for kubelet partition increase (T339231)
- 09:15 klausman: Draining ml-serve1006 for kubelet partition increase (T339231)
- 09:13 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudservices1005.eqiad.wmnet with reason: host reimage
- 09:09 aborrero@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudservices1005.eqiad.wmnet with reason: host reimage
- 09:08 fabfur: applied patch https://gerrit.wikimedia.org/r/c/operations/puppet/+/957292 (T344175) to add new mobile redirect domains to Varnish. Changes will be applied automatically by puppet on all cp hosts
- 09:06 klausman: Draining ml-serve1005 for kubelet partition increase (T339231)
- 09:00 godog: restore benthos@webrequest_live running on both centrallog hosts - T346871
- 08:57 klausman: Draining ml-serve1004 for kubelet partition increase (T339231)
- 08:47 klausman: Draining ml-serve1003 for kubelet partition increase (T339231)
- 08:47 godog: temp bump threads to 15 for benthos@webrequest_live on centrallog2002 - T346871
- 08:40 aborrero@cumin1001: START - Cookbook sre.hosts.reimage for host cloudservices1005.eqiad.wmnet with OS bullseye
- 08:40 aborrero@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudservices1005.eqiad.wmnet with OS bullseye
- 08:40 klausman: Draining ml-serve1002 for kubelet partition increase (T339231)
- 08:36 godog: stop benthos@webrequest_live.service on centrallog1002 to test redundancy/capacity - T346871
- 08:33 aborrero@cumin1001: START - Cookbook sre.hosts.reimage for host cloudservices1005.eqiad.wmnet with OS bullseye
- 08:32 aborrero@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 08:31 aborrero@cumin1001: START - Cookbook sre.dns.netbox
- 08:31 aborrero@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudservices1005
- 08:31 aborrero@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cloudservices1005
- 08:30 aborrero@cumin1001: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host cloudservices1005
- 08:30 aborrero@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cloudservices1005
- 08:22 jmm@cumin2002: END (PASS) - Cookbook sre.misc-clusters.roll-restart-reboot-docker-registry (exit_code=0) rolling restart_daemons on A:docker-registry
- 08:20 jmm@cumin2002: START - Cookbook sre.misc-clusters.roll-restart-reboot-docker-registry rolling restart_daemons on A:docker-registry
- 08:17 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on puppetdb1002.eqiad.wmnet with reason: Disable puppetdb/postgres/nginx on old nodes to ensure nothing hits them anyway
- 08:16 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 5 days, 0:00:00 on puppetdb1002.eqiad.wmnet with reason: Disable puppetdb/postgres/nginx on old nodes to ensure nothing hits them anyway
- 08:16 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on puppetdb2002.codfw.wmnet with reason: Disable puppetdb/postgres/nginx on old nodes to ensure nothing hits them anyway
- 08:15 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 5 days, 0:00:00 on puppetdb2002.codfw.wmnet with reason: Disable puppetdb/postgres/nginx on old nodes to ensure nothing hits them anyway
- 08:10 moritzm: restarting FPM on mw* to pick up libwebp security updates
- 08:02 moritzm: installing libwebp security updates on buster
- 07:42 slyngshede@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host idm1001.wikimedia.org with OS bookworm
- 07:41 taavi@deploy2002: Finished scap: Backport for Set READ_NEW for Wikitech on OATHAuth multiple devices migration (T242031), Set WRITE_NEW for OATHAuth multiple devices on fishbowls/privates (T242031) (duration: 36m 09s)
- 07:39 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2014.codfw.wmnet
- 07:34 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2014.codfw.wmnet
- 07:33 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-test-ui1001.eqiad.wmnet
- 07:29 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host an-test-ui1001.eqiad.wmnet
- 07:28 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-web1001.eqiad.wmnet
- 07:28 taavi@deploy2002: taavi: Continuing with sync
- 07:26 taavi@deploy2002: taavi: Backport for Set READ_NEW for Wikitech on OATHAuth multiple devices migration (T242031), Set WRITE_NEW for OATHAuth multiple devices on fishbowls/privates (T242031) synced to the testservers mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental X
- 07:24 slyngshede@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on idm1001.wikimedia.org with reason: host reimage
- 07:22 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host an-web1001.eqiad.wmnet
- 07:22 slyngshede@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on idm1001.wikimedia.org with reason: host reimage
- 07:15 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-tool1005.eqiad.wmnet
- 07:15 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host an-tool1005.eqiad.wmnet
- 07:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-tool1007.eqiad.wmnet
- 07:09 slyngshede@cumin1001: START - Cookbook sre.hosts.reimage for host idm1001.wikimedia.org with OS bookworm
- 07:06 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host an-tool1007.eqiad.wmnet
- 07:05 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-tool1008.eqiad.wmnet
- 07:05 taavi@deploy2002: Started scap: Backport for Set READ_NEW for Wikitech on OATHAuth multiple devices migration (T242031), Set WRITE_NEW for OATHAuth multiple devices on fishbowls/privates (T242031)
- 07:00 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host an-tool1008.eqiad.wmnet
- 06:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-tool1009.eqiad.wmnet
- 06:52 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host an-tool1009.eqiad.wmnet
- 06:51 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-tool1010.eqiad.wmnet
- 06:45 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host an-tool1010.eqiad.wmnet
- 06:42 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-tool1011.eqiad.wmnet
- 06:40 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host an-tool1011.eqiad.wmnet
- 01:12 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pki1002.eqiad.wmnet with OS bullseye
- 01:12 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 01:08 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 00:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pki1002.eqiad.wmnet with reason: host reimage
- 00:51 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on pki1002.eqiad.wmnet with reason: host reimage
- 00:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host pki1002.eqiad.wmnet with OS bullseye
- 00:10 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db1232']
- 00:10 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db1233']
- 00:09 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db1231']
- 00:02 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db1229']
- 00:01 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1233']
- 00:01 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db1230']
- 00:01 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db1228']
- 00:01 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1232']
- 00:00 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1231']
- 00:00 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db1227']
- 00:00 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db1226']
2023-09-19
- 23:52 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1230']
- 23:52 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1229']
- 23:52 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1228']
- 23:51 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1227']
- 23:51 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1226']
- 23:51 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1229.mgmt.eqiad.wmnet with reboot policy FORCED
- 23:31 jclark@cumin1001: START - Cookbook sre.hosts.provision for host db1229.mgmt.eqiad.wmnet with reboot policy FORCED
- 23:30 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 23:29 jclark@cumin1001: START - Cookbook sre.dns.netbox
- 23:26 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1232.mgmt.eqiad.wmnet with reboot policy FORCED
- 23:18 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1233.mgmt.eqiad.wmnet with reboot policy FORCED
- 23:18 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1231.mgmt.eqiad.wmnet with reboot policy FORCED
- 22:58 jclark@cumin1001: START - Cookbook sre.hosts.provision for host db1233.mgmt.eqiad.wmnet with reboot policy FORCED
- 22:58 jclark@cumin1001: START - Cookbook sre.hosts.provision for host db1232.mgmt.eqiad.wmnet with reboot policy FORCED
- 22:58 jclark@cumin1001: START - Cookbook sre.hosts.provision for host db1231.mgmt.eqiad.wmnet with reboot policy FORCED
- 22:57 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1228.mgmt.eqiad.wmnet with reboot policy FORCED
- 22:57 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1230.mgmt.eqiad.wmnet with reboot policy FORCED
- 22:56 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1227.mgmt.eqiad.wmnet with reboot policy FORCED
- 22:56 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1226.mgmt.eqiad.wmnet with reboot policy FORCED
- 22:54 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1229.mgmt.eqiad.wmnet with reboot policy FORCED
- 22:51 bearloga@deploy2002: Finished deploy [airflow-dags/analytics_product@b603e64]: (no justification provided) (duration: 00m 05s)
- 22:51 bearloga@deploy2002: Started deploy [airflow-dags/analytics_product@b603e64]: (no justification provided)
- 22:46 jclark@cumin1001: START - Cookbook sre.hosts.provision for host db1229.mgmt.eqiad.wmnet with reboot policy FORCED
- 22:45 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1229.mgmt.eqiad.wmnet with reboot policy FORCED
- 21:50 brennen@deploy2002: rebuilt and synchronized wikiversions files: group0 wikis to 1.41.0-wmf.27 refs T345888
- 21:49 jclark@cumin1001: START - Cookbook sre.hosts.provision for host db1227.mgmt.eqiad.wmnet with reboot policy FORCED
- 21:49 jclark@cumin1001: START - Cookbook sre.hosts.provision for host db1228.mgmt.eqiad.wmnet with reboot policy FORCED
- 21:49 jclark@cumin1001: START - Cookbook sre.hosts.provision for host db1230.mgmt.eqiad.wmnet with reboot policy FORCED
- 21:49 jclark@cumin1001: START - Cookbook sre.hosts.provision for host db1229.mgmt.eqiad.wmnet with reboot policy FORCED
- 21:49 jclark@cumin1001: START - Cookbook sre.hosts.provision for host db1226.mgmt.eqiad.wmnet with reboot policy FORCED
- 21:48 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1232
- 21:47 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host db1232
- 21:46 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 21:45 jclark@cumin1001: START - Cookbook sre.dns.netbox
- 21:41 brennen: train 1.41.0-wmf.27 (T345888): blockers resolved; rolling to group0
- 21:37 brennen@deploy2002: Finished scap: Backport for Disable client preferences by default (T345363) (duration: 40m 45s)
- 21:37 jclark@cumin1001: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host db1232
- 21:37 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host db1232
- 21:36 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1233
- 21:36 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1231
- 21:35 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host db1233
- 21:34 jclark@cumin1001: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host db1232
- 21:34 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host db1232
- 21:34 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host db1231
- 21:33 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 21:32 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1230
- 21:32 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1229
- 21:32 cmooney@cumin1001: START - Cookbook sre.dns.netbox
- 21:32 cmooney@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
- 21:31 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host db1230
- 21:31 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1226
- 21:31 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host db1229
- 21:31 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1227
- 21:30 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host db1227
- 21:30 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host db1226
- 21:29 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 21:29 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt db12[26-33] - jclark@cumin1001"
- 21:29 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt db12[26-33] - jclark@cumin1001"
- 21:26 jclark@cumin1001: START - Cookbook sre.dns.netbox
- 21:25 brennen@deploy2002: jdlrobson and brennen: Continuing with sync
- 21:20 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudelastic1010']
- 21:17 brennen@deploy2002: jdlrobson and brennen: Backport for Disable client preferences by default (T345363) synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
- 21:16 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudelastic1008']
- 21:11 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudelastic1010']
- 21:11 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudelastic1009']
- 21:07 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudelastic1008']
- 21:07 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudelastic1007']
- 21:01 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudelastic1009']
- 21:01 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudelastic1010.mgmt.eqiad.wmnet with reboot policy FORCED
- 20:59 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudelastic1007']
- 20:57 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['pki1002']
- 20:57 brennen@deploy2002: Started scap: Backport for Disable client preferences by default (T345363)
- 20:55 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['pki1002']
- 20:55 brennen@deploy2002: Finished scap: Backport for Fixes cannot read properties of undefined (T342277) (duration: 37m 39s)
- 20:51 bearloga@deploy2002: Finished deploy [airflow-dags/analytics_product@b603e64]: (no justification provided) (duration: 00m 05s)
- 20:51 bearloga@deploy2002: Started deploy [airflow-dags/analytics_product@b603e64]: (no justification provided)
- 20:50 bearloga@deploy2002: Finished deploy [airflow-dags/analytics_product@b603e64]: (no justification provided) (duration: 00m 09s)
- 20:50 bearloga@deploy2002: Started deploy [airflow-dags/analytics_product@b603e64]: (no justification provided)
- 20:42 brennen@deploy2002: jdlrobson and brennen: Continuing with sync
- 20:38 brennen@deploy2002: jdlrobson and brennen: Backport for Fixes cannot read properties of undefined (T342277) synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
- 20:37 jclark@cumin1001: START - Cookbook sre.hosts.provision for host cloudelastic1010.mgmt.eqiad.wmnet with reboot policy FORCED
- 20:36 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudelastic1010
- 20:36 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cloudelastic1010
- 20:35 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 20:35 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt cloudelastic1007-10 - jclark@cumin1001"
- 20:34 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt cloudelastic1007-10 - jclark@cumin1001"
- 20:32 jclark@cumin1001: START - Cookbook sre.dns.netbox
- 20:18 brennen@deploy2002: Started scap: Backport for Fixes cannot read properties of undefined (T342277)
- 19:48 brennen@deploy2002: Finished scap: Backport for Revert "ResourceLoader: Set 'virtualFilePath' for startup.js" (T346800) (duration: 40m 46s)
- 19:31 brennen@deploy2002: jforrester and brennen: Continuing with sync
- 19:29 brennen@deploy2002: jforrester and brennen: Backport for Revert "ResourceLoader: Set 'virtualFilePath' for startup.js" (T346800) synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
- 19:24 vriley@cumin1001: START - Cookbook sre.hosts.provision for host pc1015.mgmt.eqiad.wmnet with reboot policy FORCED
- 19:21 vriley@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host pc1015
- 19:20 vriley@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host pc1015
- 19:07 brennen@deploy2002: Started scap: Backport for Revert "ResourceLoader: Set 'virtualFilePath' for startup.js" (T346800)
- 16:28 claime: Deployed https://gerrit.wikimedia.org/r/953344 - T345204
- 16:04 kamila_: DC Switchover: traffic - T346330
- 15:59 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifeeds: apply
- 15:59 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/wikifeeds: apply
- 15:58 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifeeds: apply
- 15:58 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifeeds: apply
- 15:57 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mathoid: apply
- 15:57 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/mathoid: apply
- 15:57 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
- 15:57 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
- 15:57 cgoubert@deploy2002: Finished scap: (no justification provided) (duration: 03m 12s)
- 15:56 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/similar-users: apply
- 15:56 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/similar-users: apply
- 15:56 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/similar-users: apply
- 15:55 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/similar-users: apply
- 15:55 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/mathoid: apply
- 15:55 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/mathoid: apply
- 15:54 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
- 15:54 akosiaris: scaling down mobileapps, wikifeeds, mathoid, similar-users
- 15:54 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
- 15:53 cgoubert@deploy2002: Started scap: (no justification provided)
- 15:53 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-misc: apply
- 15:52 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-misc: apply
- 15:52 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-misc: apply
- 15:52 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/mw-misc: apply
- 15:52 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
- 15:52 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
- 15:52 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
- 15:52 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
- 15:52 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
- 15:52 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
- 15:52 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
- 15:52 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
- 15:52 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
- 15:52 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
- 15:52 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
- 15:51 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/mw-web: apply
- 15:51 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
- 15:51 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
- 15:46 cgoubert@deploy2002: Finished scap: (no justification provided) (duration: 40m 44s)
- 15:45 cmooney@cumin1001: START - Cookbook sre.dns.netbox
- 15:45 cmooney@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
- 15:28 vriley@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudelastic1007.mgmt.eqiad.wmnet with reboot policy FORCED
- 15:26 claime: running puppet on 'A:cp-text and P{P:trafficserver::backend}' - T346330
- 15:25 claime: reduce mw-on-k8s traffic to 3% waiting on new nodes - T346330
- 15:18 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudelastic1008.mgmt.eqiad.wmnet with reboot policy FORCED
- 15:18 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudelastic1009.mgmt.eqiad.wmnet with reboot policy FORCED
- 15:06 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudelastic1010.mgmt.eqiad.wmnet with reboot policy FORCED
- 15:06 cgoubert@deploy2002: Started scap: (no justification provided)
- 15:05 kamila@deploy1002: Unlocked for deployment [ALL REPOSITORIES]: Datacenter Switchover: Services & Traffic - T346330 (duration: 34m 46s)
- 15:02 akosiaris: increase thumbor's pods in codfw to 48 to harmonize with eqiad
- 15:02 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
- 15:02 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/thumbor: apply
- 14:56 vriley@cumin1001: START - Cookbook sre.hosts.provision for host cloudelastic1007.mgmt.eqiad.wmnet with reboot policy FORCED
- 14:56 jclark@cumin1001: START - Cookbook sre.hosts.provision for host cloudelastic1010.mgmt.eqiad.wmnet with reboot policy FORCED
- 14:56 jclark@cumin1001: START - Cookbook sre.hosts.provision for host cloudelastic1009.mgmt.eqiad.wmnet with reboot policy FORCED
- 14:56 jclark@cumin1001: START - Cookbook sre.hosts.provision for host cloudelastic1008.mgmt.eqiad.wmnet with reboot policy FORCED
- 14:52 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudelastic1010
- 14:51 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cloudelastic1010
- 14:51 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudelastic1009
- 14:51 vriley@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudelastic1008
- 14:50 moritzm: installing python-werkzeug security updates
- 14:50 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cloudelastic1009
- 14:49 vriley@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cloudelastic1008
- 14:49 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudelastic1007
- 14:48 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cloudelastic1007
- 14:46 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 14:46 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt dbproxy1026-56} - jclark@cumin1001"
- 14:45 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt dbproxy1026-56} - jclark@cumin1001"
- 14:42 jclark@cumin1001: START - Cookbook sre.dns.netbox
- 14:36 oblivian@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=swift-rw,name=codfw
- 14:36 oblivian@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=swift-rw,name=eqiad
- 14:33 cgoubert@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=swift-ro
- 14:33 cgoubert@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=swift
- 14:32 kamila_: Switch deployment server - T346330
- 14:30 kamila@deploy1002: Locking from deployment [ALL REPOSITORIES]: Datacenter Switchover: Services & Traffic - T346330
- 14:28 kamila@cumin1001: END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0) depool all services in eqiad: Datacenter Switchover: Services - T346330
- 14:28 cgoubert@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=thumbor
- 14:25 oblivian@deploy1002: Finished scap: (no justification provided) (duration: 05m 44s)
- 14:20 oblivian@deploy1002: Started scap: (no justification provided)
- 14:20 kamila@deploy1002: Unlocked for deployment [ALL REPOSITORIES]: Datacenter Switchover: Services & Traffic - T346330 (duration: 19m 27s)
- 14:01 kamila@cumin1001: START - Cookbook sre.discovery.datacenter depool all services in eqiad: Datacenter Switchover: Services - T346330
- 14:00 kamila@deploy1002: Locking from deployment [ALL REPOSITORIES]: Datacenter Switchover: Services & Traffic - T346330
- 13:58 TheresNoTime: `[samtar@mwmaint1002 ~]$ mwscript namespaceDupes.php --wiki shwiki --fix` T346588
- 13:57 samtar@deploy1002: Finished scap: Backport for Add namespace aliases to shwiki (T346588) (duration: 51m 50s)
- 13:53 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 13:53 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts an-test-client1001.eqiad.wmnet
- 13:52 elukey: clean old puppet certs kafka_logging-{eqiad,codfw}_broker from the Puppet CA and from Puppet private - T300130
- 13:52 jhancock@cumin2002: START - Cookbook sre.dns.netbox
- 13:52 jhancock@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
- 13:52 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Updating DNS record of kubernetes2026 - jhancock@cumin2002"
- 13:51 stevemunene@cumin1001: START - Cookbook sre.dns.netbox
- 13:50 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Updating DNS record of kubernetes2026 - jhancock@cumin2002"
- 13:47 jhancock@cumin2002: START - Cookbook sre.dns.netbox
- 13:47 jebe@deploy1002: Finished deploy [airflow-dags/analytics@6b9855a]: (no justification provided) (duration: 00m 43s)
- 13:46 stevemunene@cumin1001: START - Cookbook sre.hosts.decommission for hosts an-test-client1001.eqiad.wmnet
- 13:46 jebe@deploy1002: Started deploy [airflow-dags/analytics@6b9855a]: (no justification provided)
- 13:34 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host flerovium.eqiad.wmnet
- 13:33 samtar@deploy1002: samtar and aleksandar: Continuing with sync
- 13:28 samtar@deploy1002: samtar and aleksandar: Backport for Add namespace aliases to shwiki (T346588) synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
- 13:27 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host flerovium.eqiad.wmnet
- 13:17 jebe@deploy1002: Finished deploy [analytics/refinery@2d9d6d0] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@2d9d6d0] (duration: 02m 06s)
- 13:15 Emperor: ms-be10[44-60] swift package updates T346730
- 13:15 jebe@deploy1002: Started deploy [analytics/refinery@2d9d6d0] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@2d9d6d0]
- 13:14 jebe@deploy1002: Finished deploy [analytics/refinery@2d9d6d0] (thin): Regular analytics weekly train THIN [analytics/refinery@2d9d6d0] (duration: 00m 04s)
- 13:14 jebe@deploy1002: Started deploy [analytics/refinery@2d9d6d0] (thin): Regular analytics weekly train THIN [analytics/refinery@2d9d6d0]
- 13:14 jebe@deploy1002: Finished deploy [analytics/refinery@2d9d6d0]: Regular analytics weekly train [analytics/refinery@2d9d6d0] (duration: 05m 52s)
- 13:08 jebe@deploy1002: Started deploy [analytics/refinery@2d9d6d0]: Regular analytics weekly train [analytics/refinery@2d9d6d0]
- 13:05 samtar@deploy1002: Started scap: Backport for Add namespace aliases to shwiki (T346588)
- 13:05 btullis@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0) for Kafka A:kafka-jumbo-eqiad cluster: Roll restart of jvm daemons.
- 12:44 Emperor: ms-be20[60-73] swift package updates T346730
- 12:22 Emperor: ms-be20[49-59] swift package updates T346730
- 12:19 jebe@deploy1002: Finished deploy [analytics/refinery@91bb4a0] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@91bb4a0] (duration: 02m 03s)
- 12:18 Emperor: ms-be2048 swift package updates T346730
- 12:17 jebe@deploy1002: Started deploy [analytics/refinery@91bb4a0] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@91bb4a0]
- 12:17 jebe@deploy1002: Finished deploy [analytics/refinery@91bb4a0] (thin): Regular analytics weekly train THIN [analytics/refinery@91bb4a0] (duration: 00m 05s)
- 12:17 jebe@deploy1002: Started deploy [analytics/refinery@91bb4a0] (thin): Regular analytics weekly train THIN [analytics/refinery@91bb4a0]
- 12:14 Emperor: ms-be2047 swift package updates T346730
- 12:12 Emperor: ms-be204{5,6} swift package updates T346730
- 12:10 jebe@deploy1002: Finished deploy [analytics/refinery@91bb4a0]: Regular analytics weekly train [analytics/refinery@91bb4a0] (duration: 06m 53s)
- 12:08 cmooney@cumin1001: START - Cookbook sre.dns.netbox
- 12:03 jebe@deploy1002: Started deploy [analytics/refinery@91bb4a0]: Regular analytics weekly train [analytics/refinery@91bb4a0]
- 11:51 ayounsi@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox
- 11:51 ayounsi@cumin1001: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
- 11:50 ayounsi@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary
- 11:48 ayounsi@cumin1001: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary
- 11:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 100%: Repooling after recloning db1128', diff saved to https://phabricator.wikimedia.org/P52530 and previous config saved to /var/cache/conftool/dbconfig/20230919-112156-root.json
- 11:09 Emperor: eqiad swift front-end swift package updates T346730
- 11:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 75%: Repooling after recloning db1128', diff saved to https://phabricator.wikimedia.org/P52529 and previous config saved to /var/cache/conftool/dbconfig/20230919-110651-root.json
- 10:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 50%: Repooling after recloning db1128', diff saved to https://phabricator.wikimedia.org/P52528 and previous config saved to /var/cache/conftool/dbconfig/20230919-105147-root.json
- 10:38 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1148.eqiad.wmnet with OS bullseye
- 10:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 25%: Repooling after recloning db1128', diff saved to https://phabricator.wikimedia.org/P52527 and previous config saved to /var/cache/conftool/dbconfig/20230919-103642-root.json
- 10:34 Emperor: codfw swift front-end swift package updates T346730
- 10:25 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1147.eqiad.wmnet with OS bullseye
- 10:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 10%: Repooling after recloning db1128', diff saved to https://phabricator.wikimedia.org/P52526 and previous config saved to /var/cache/conftool/dbconfig/20230919-102137-root.json
- 10:15 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1148.eqiad.wmnet with reason: host reimage
- 10:11 stevemunene@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1148.eqiad.wmnet with reason: host reimage
- 10:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 5%: Repooling after recloning db1128', diff saved to https://phabricator.wikimedia.org/P52525 and previous config saved to /var/cache/conftool/dbconfig/20230919-100632-root.json
- 10:01 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1147.eqiad.wmnet with reason: host reimage
- 09:58 stevemunene@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1148.eqiad.wmnet with OS bullseye
- 09:56 stevemunene@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1147.eqiad.wmnet with reason: host reimage
- 09:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 3%: Repooling after recloning db1128', diff saved to https://phabricator.wikimedia.org/P52524 and previous config saved to /var/cache/conftool/dbconfig/20230919-095127-root.json
- 09:48 slyngshede@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host idm2001.wikimedia.org with OS bookworm
- 09:42 stevemunene@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1147.eqiad.wmnet with OS bullseye
- 09:40 btullis@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers for Kafka A:kafka-jumbo-eqiad cluster: Roll restart of jvm daemons.
- 09:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 1%: Repooling after recloning db1128', diff saved to https://phabricator.wikimedia.org/P52523 and previous config saved to /var/cache/conftool/dbconfig/20230919-093622-root.json
- 09:12 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-staging-etcd2001.codfw.wmnet
- 09:08 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM ml-staging-etcd2001.codfw.wmnet
- 09:03 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-staging-etcd2002.codfw.wmnet
- 08:59 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM ml-staging-etcd2002.codfw.wmnet
- 08:47 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-staging-etcd2003.codfw.wmnet
- 08:44 godog: bounce benthos@webrequest_live to clear out old metrics
- 08:43 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM ml-staging-etcd2003.codfw.wmnet
- 08:41 godog: remove MediaWiki.*.growthexperiments.taskcount.link_recommendation.* from graphite - T346371
- 08:39 jmm@cumin2002: END (PASS) - Cookbook sre.maps.roll-restart-reboot (exit_code=0) rolling restart_daemons on A:maps-replica-eqiad
- 08:36 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1146.eqiad.wmnet with OS bullseye
- 08:34 jmm@cumin2002: START - Cookbook sre.maps.roll-restart-reboot rolling restart_daemons on A:maps-replica-eqiad
- 08:30 jmm@cumin2002: END (PASS) - Cookbook sre.maps.roll-restart-reboot (exit_code=0) rolling restart_daemons on A:maps-replica-codfw
- 08:29 slyngshede@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on idm2001.wikimedia.org with reason: host reimage
- 08:26 brouberol@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
- 08:26 brouberol@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
- 08:26 slyngshede@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on idm2001.wikimedia.org with reason: host reimage
- 08:26 brouberol: redeploying mw-page-content-change-enrich in codfw T336041
- 08:26 brouberol@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
- 08:25 brouberol@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
- 08:25 jmm@cumin2002: START - Cookbook sre.maps.roll-restart-reboot rolling restart_daemons on A:maps-replica-codfw
- 08:25 brouberol: redeploying mw-page-content-change-enrich in eqiad T336041
- 08:24 brouberol@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventstreams-internal: apply
- 08:24 brouberol@deploy1002: helmfile [eqiad] START helmfile.d/services/eventstreams-internal: apply
- 08:24 brouberol: redeploying eventstreams-internal in eqiad T336041
- 08:23 brouberol@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventstreams-internal: apply
- 08:23 brouberol@deploy1002: helmfile [codfw] START helmfile.d/services/eventstreams-internal: apply
- 08:23 brouberol: redeploying eventstreams-internal in codfw T336041
- 08:22 brouberol@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: apply
- 08:21 brouberol@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-analytics-external: apply
- 08:21 brouberol: redeploying eventgate-analytics-external in codfw T336041
- 08:21 brouberol@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: apply
- 08:20 brouberol@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: apply
- 08:20 brouberol: redeploying eventgate-analytics-external in eqiad T336041
- 08:19 brouberol@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics: apply
- 08:18 brouberol@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-analytics: apply
- 08:18 brouberol: redeploying eventgate-analytics in codfw T336041
- 08:18 brouberol@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics: apply
- 08:17 brouberol@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics: apply
- 08:13 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1146.eqiad.wmnet with reason: host reimage
- 08:11 slyngshede@cumin1001: START - Cookbook sre.hosts.reimage for host idm2001.wikimedia.org with OS bookworm
- 08:10 stevemunene@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1146.eqiad.wmnet with reason: host reimage
- 08:05 brouberol@deploy1002: helmfile [staging] DONE helmfile.d/services/eventstreams-internal: apply
- 08:05 brouberol@deploy1002: helmfile [staging] START helmfile.d/services/eventstreams-internal: apply
- 08:05 moritzm: restarting FPM on mw canaries to pick up libwebp updates
- 08:04 marostegui@cumin1001: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db1134.eqiad.wmnet onto db1128.eqiad.wmnet
- 08:02 brouberol@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics-external: apply
- 08:02 brouberol@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-analytics-external: apply
- 08:00 brouberol@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics: apply
- 07:59 brouberol@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-analytics: apply
- 07:56 stevemunene@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1146.eqiad.wmnet with OS bullseye
- 07:51 moritzm: installing libwebp security updates on buster
- 07:43 kartik@deploy1002: Finished scap: Backport for Disable Special:Contribute on bnwiki (T345772) (duration: 38m 49s)
- 07:27 kartik@deploy1002: kartik: Continuing with sync
- 07:26 kartik@deploy1002: kartik: Backport for Disable Special:Contribute on bnwiki (T345772) synced to the testservers mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
- 07:11 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
- 07:04 kartik@deploy1002: Started scap: Backport for Disable Special:Contribute on bnwiki (T345772)
- 06:35 denisse: updating PCC facts
- 06:09 XioNoX: push new pfw policy - T346705
- 05:48 fnegri@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudservices2004-dev.codfw.wmnet with OS bookworm
- 05:46 marostegui@cumin1001: START - Cookbook sre.mysql.clone of db1134.eqiad.wmnet onto db1128.eqiad.wmnet
- 05:45 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1134', diff saved to https://phabricator.wikimedia.org/P52522 and previous config saved to /var/cache/conftool/dbconfig/20230919-054539-root.json
- 04:35 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
- 04:06 mwpresync@deploy1002: Pruned MediaWiki: 1.41.0-wmf.25 (duration: 02m 10s)
- 04:03 mwpresync@deploy1002: Finished scap: testwikis wikis to 1.41.0-wmf.27 refs T345888 (duration: 61m 05s)
- 03:02 mwpresync@deploy1002: Started scap: testwikis wikis to 1.41.0-wmf.27 refs T345888
- 00:56 eileen: civicrm upgraded from 0a36997d to f0e9d3f6
2023-09-18
- 22:09 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts wdqs1004.eqiad.wmnet
- 22:09 ryankemper@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 22:08 ryankemper@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: wdqs1004.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - ryankemper@cumin1001"
- 22:07 ryankemper@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: wdqs1004.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - ryankemper@cumin1001"
- 21:59 ryankemper@cumin1001: START - Cookbook sre.dns.netbox
- 21:51 ryankemper@cumin1001: START - Cookbook sre.hosts.decommission for hosts wdqs1004.eqiad.wmnet
- 21:49 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts wdqs1003.eqiad.wmnet
- 21:49 ryankemper@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 21:49 ryankemper@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: wdqs1003.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - ryankemper@cumin1001"
- 21:45 ryankemper@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: wdqs1003.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - ryankemper@cumin1001"
- 21:40 ryankemper@cumin1001: START - Cookbook sre.dns.netbox
- 21:19 maryum: Deployed patch for T344359
- 21:13 ryankemper@cumin1001: START - Cookbook sre.hosts.decommission for hosts wdqs1003.eqiad.wmnet
- 20:49 cjming: end of UTC late backport window
- 20:36 cjming@deploy1002: Finished scap: Backport for Link recommendations: prevent too large offsets in cirrus queries (T345713) (duration: 11m 40s)
- 20:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dbstore1008.eqiad.wmnet with OS bullseye
- 20:30 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 20:29 cjming@deploy1002: urbanecm and cjming: Continuing with sync
- 20:29 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 20:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dbstore1009.eqiad.wmnet with OS bullseye
- 20:27 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 20:26 cjming@deploy1002: urbanecm and cjming: Backport for Link recommendations: prevent too large offsets in cirrus queries (T345713) synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
- 20:24 cjming@deploy1002: Started scap: Backport for Link recommendations: prevent too large offsets in cirrus queries (T345713)
- 20:24 cjming@deploy1002: Finished scap: Backport for clienthints: Enable purging of data on all wikis (T257893) (duration: 09m 24s)
- 20:24 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 20:16 cjming@deploy1002: cjming and dreamyjazz: Continuing with sync
- 20:16 cjming@deploy1002: cjming and dreamyjazz: Backport for clienthints: Enable purging of data on all wikis (T257893) synced to the testservers mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
- 20:16 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbstore1008.eqiad.wmnet with reason: host reimage
- 20:15 cjming@deploy1002: Started scap: Backport for clienthints: Enable purging of data on all wikis (T257893)
- 20:13 cjming@deploy1002: Finished scap: Backport for clienthints: Pin wgCheckUserDisplayClientHints to false (T337942) (duration: 08m 18s)
- 20:12 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dbstore1008.eqiad.wmnet with reason: host reimage
- 20:09 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbstore1009.eqiad.wmnet with reason: host reimage
- 20:06 cjming@deploy1002: cjming and dreamyjazz: Continuing with sync
- 20:06 cjming@deploy1002: cjming and dreamyjazz: Backport for clienthints: Pin wgCheckUserDisplayClientHints to false (T337942) synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
- 20:06 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dbstore1009.eqiad.wmnet with reason: host reimage
- 20:05 cjming@deploy1002: Started scap: Backport for clienthints: Pin wgCheckUserDisplayClientHints to false (T337942)
- 19:46 isaranto@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
- 19:43 isaranto@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
- 19:22 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host dbstore1009.eqiad.wmnet with OS bullseye
- 19:22 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host dbstore1008.eqiad.wmnet with OS bullseye
- 18:02 ejegg: re-enabled donor thank you mail send jobs
- 17:50 ejegg: civicrm upgraded from 0c2853aa to 0a36997d
- 17:48 ejegg: disabled donor thank you mail send jobs for Civi update
- 16:41 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1145.eqiad.wmnet with OS bullseye
- 16:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['dbstore1009']
- 16:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['dbstore1008']
- 16:25 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1144.eqiad.wmnet with OS bullseye
- 16:24 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['dbstore1009']
- 16:23 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['dbstore1009']
- 16:23 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['dbstore1009']
- 16:23 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['dbstore1008']
- 16:17 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1145.eqiad.wmnet with reason: host reimage
- 16:15 stevemunene@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1145.eqiad.wmnet with reason: host reimage
- 16:14 jnuche@deploy1002: Installation of scap version "4.61.1" completed for 601 hosts
- 16:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1036.eqiad.wmnet with OS bullseye
- 16:13 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 16:12 jnuche@deploy1002: Installing scap version "4.61.1" for 601 hosts
- 16:11 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 16:03 stevemunene@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1145.eqiad.wmnet with OS bullseye
- 16:01 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1144.eqiad.wmnet with reason: host reimage
- 15:57 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1047.eqiad.wmnet with OS bullseye
- 15:57 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 15:57 stevemunene@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1144.eqiad.wmnet with reason: host reimage
- 15:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1036.eqiad.wmnet with reason: host reimage
- 15:53 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 15:53 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1036.eqiad.wmnet with reason: host reimage
- 15:53 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1038.eqiad.wmnet with OS bullseye
- 15:53 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 15:53 jdrewniak@deploy1002: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 08m 31s)
- 15:51 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes1036.eqiad.wmnet with OS bullseye
- 15:51 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 15:44 jdrewniak@deploy1002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 08m 45s)
- 15:43 stevemunene@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1144.eqiad.wmnet with OS bullseye
- 15:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1047.eqiad.wmnet with reason: host reimage
- 15:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1038.eqiad.wmnet with reason: host reimage
- 15:30 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1047.eqiad.wmnet with reason: host reimage
- 15:30 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1038.eqiad.wmnet with reason: host reimage
- 15:29 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kubernetes1036
- 15:29 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes1047.eqiad.wmnet with OS bullseye
- 15:29 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes1038.eqiad.wmnet with OS bullseye
- 15:28 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host kubernetes1036
- 15:27 Emperor: install new swift packages on ms-be2044
- 15:26 Emperor: repool ms-fe2009 with new swift packages
- 15:25 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1143.eqiad.wmnet with OS bullseye
- 15:18 Emperor: depool ms-fe2009 to install new swift packages
- 15:13 Emperor: upload swift_2.26.0-10+deb11u1+wmf1_amd64.changes to apt1001
- 15:11 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1142.eqiad.wmnet with OS bullseye
- 15:04 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1143.eqiad.wmnet with reason: host reimage
- 15:01 stevemunene@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1143.eqiad.wmnet with reason: host reimage
- 14:54 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes1038.eqiad.wmnet with OS bullseye
- 14:54 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1038.eqiad.wmnet with OS bullseye
- 14:47 stevemunene@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1143.eqiad.wmnet with OS bullseye
- 14:45 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1142.eqiad.wmnet with reason: host reimage
- 14:45 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes1036.eqiad.wmnet with OS bullseye
- 14:45 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1036.eqiad.wmnet with OS bullseye
- 14:42 stevemunene@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1142.eqiad.wmnet with reason: host reimage
- 14:41 fnegri@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudservices2004-dev.codfw.wmnet with reason: host reimage
- 14:38 fnegri@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudservices2004-dev.codfw.wmnet with reason: host reimage
- 14:32 jelto: use certmanager instead of cergen in miscweb namespace - T300033
- 14:29 jelto@deploy1002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
- 14:29 stevemunene@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1142.eqiad.wmnet with OS bullseye
- 14:26 jelto@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
- 14:24 jelto@deploy1002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
- 14:21 jelto@deploy1002: helmfile [codfw] START helmfile.d/services/miscweb: apply
- 14:20 fnegri@cumin1001: START - Cookbook sre.hosts.reimage for host cloudservices2004-dev.codfw.wmnet with OS bookworm
- 14:18 fnegri@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudbackup1001-dev.eqiad.wmnet with OS bookworm
- 14:15 jelto@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
- 14:05 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-etcd2003.codfw.wmnet
- 14:04 bblack: lvs1020, lvs1018: restarting pybal to re-enable healthchecks for wikireplicas ( T337446 -> https://gerrit.wikimedia.org/r/924508 )
- 14:01 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM ml-etcd2003.codfw.wmnet
- 14:01 fnegri@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudbackup1001-dev.eqiad.wmnet with reason: host reimage
- 14:01 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-etcd2002.codfw.wmnet
- 13:57 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM ml-etcd2002.codfw.wmnet
- 13:56 fnegri@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudbackup1001-dev.eqiad.wmnet with reason: host reimage
- 13:51 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-etcd2001.codfw.wmnet
- 13:47 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM ml-etcd2001.codfw.wmnet
- 13:46 fnegri@cumin1001: START - Cookbook sre.hosts.reimage for host cloudbackup1001-dev.eqiad.wmnet with OS bookworm
- 13:38 godog: force-set max-repeaters to 20 for cr2-eqsin and cr3-eqsin - T346606
- 13:26 jmm@cumin2002: END (PASS) - Cookbook sre.wdqs.restart-nginx (exit_code=0) rolling restart_daemons on A:wcqs-public
- 13:24 jmm@cumin2002: START - Cookbook sre.wdqs.restart-nginx rolling restart_daemons on A:wcqs-public
- 13:16 taavi@deploy1002: Finished scap: Backport for Disable UploadWizard CTA for MachineVision (T345187) (duration: 11m 16s)
- 13:11 vgutierrez: depool cp4052 for bookworm testing - T342154
- 13:09 taavi@deploy1002: taavi and cparle: Continuing with sync
- 13:06 taavi@deploy1002: taavi and cparle: Backport for Disable UploadWizard CTA for MachineVision (T345187) synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
- 13:04 taavi@deploy1002: Started scap: Backport for Disable UploadWizard CTA for MachineVision (T345187)
- 13:04 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
- 13:04 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
- 13:04 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
- 13:03 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
- 13:02 godog: set max-repeaters to 30 for cr3-eqsin in librenms - T346606
- 13:01 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
- 13:01 jayme@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
- 12:48 jmm@cumin2002: END (PASS) - Cookbook sre.wdqs.restart-nginx (exit_code=0) rolling restart_daemons on A:wdqs-all
- 12:47 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1141.eqiad.wmnet with OS bullseye
- 12:37 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest1001.eqiad.wmnet with OS bullseye
- 12:32 jmm@cumin2002: START - Cookbook sre.wdqs.restart-nginx rolling restart_daemons on A:wdqs-all
- 12:27 jmm@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: furud.codfw.wmnet
- 12:27 jmm@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: furud.codfw.wmnet
- 12:24 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1141.eqiad.wmnet with reason: host reimage
- 12:24 stevemunene@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1140.eqiad.wmnet with OS bullseye
- 12:23 moritzm: installing libwebp security updates on bullseye
- 12:21 stevemunene@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1141.eqiad.wmnet with reason: host reimage
- 12:21 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1001.eqiad.wmnet with reason: host reimage
- 12:18 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1001.eqiad.wmnet with reason: host reimage
- 12:10 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1140.eqiad.wmnet with reason: host reimage
- 12:08 stevemunene@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1141.eqiad.wmnet with OS bullseye
- 12:07 stevemunene@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1140.eqiad.wmnet with reason: host reimage
- 12:03 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1001.eqiad.wmnet with OS bullseye
- 11:54 jmm@cumin2002: END (PASS) - Cookbook sre.maps.roll-restart-reboot (exit_code=0) rolling reboot on A:maps-replica-eqiad
- 11:53 stevemunene@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1140.eqiad.wmnet with OS bullseye
- 11:46 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudservices1005.wikimedia.org
- 11:46 aborrero@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 11:46 aborrero@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 11:46 aborrero@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudservices1005 - aborrero@cumin1001"
- 11:45 aborrero@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudservices1005 - aborrero@cumin1001"
- 11:44 aborrero@cumin1001: START - Cookbook sre.dns.netbox
- 11:44 jayme: removed cergen certs from the list of trusted service account token signers on all kubernetes clusters - T329826
- 11:43 aborrero@cumin1001: START - Cookbook sre.dns.netbox
- 11:37 aborrero@cumin1001: START - Cookbook sre.hosts.decommission for hosts cloudservices1005.wikimedia.org
- 11:14 jmm@cumin2002: START - Cookbook sre.maps.roll-restart-reboot rolling reboot on A:maps-replica-eqiad
- 11:13 jmm@cumin2002: END (PASS) - Cookbook sre.maps.roll-restart-reboot (exit_code=0) rolling reboot on P{maps201[0].codfw.wmnet} and (A:maps-replica or A:maps-replica-codfw or A:maps-replica-eqiad)
- 11:05 jmm@cumin2002: START - Cookbook sre.maps.roll-restart-reboot rolling reboot on P{maps201[0].codfw.wmnet} and (A:maps-replica or A:maps-replica-codfw or A:maps-replica-eqiad)
- 11:01 jmm@cumin2002: END (PASS) - Cookbook sre.maps.roll-restart-reboot (exit_code=0) rolling reboot on P{maps200[7,8].codfw.wmnet} and (A:maps-replica or A:maps-replica-codfw or A:maps-replica-eqiad)
- 10:48 stevemunene@deploy1002: helmfile [eqiad] DONE helmfile.d/services/datahub: sync on main
- 10:46 jmm@cumin2002: START - Cookbook sre.maps.roll-restart-reboot rolling reboot on P{maps200[7,8].codfw.wmnet} and (A:maps-replica or A:maps-replica-codfw or A:maps-replica-eqiad)
- 10:44 stevemunene@deploy1002: helmfile [eqiad] START helmfile.d/services/datahub: apply on main
- 10:44 stevemunene@deploy1002: helmfile [codfw] DONE helmfile.d/services/datahub: sync on main
- 10:42 jmm@cumin2002: END (PASS) - Cookbook sre.maps.roll-restart-reboot (exit_code=0) rolling reboot on P{maps200[5,6].codfw.wmnet} and (A:maps-replica or A:maps-replica-codfw or A:maps-replica-eqiad)
- 10:40 stevemunene@deploy1002: helmfile [codfw] START helmfile.d/services/datahub: apply on main
- 10:33 godog: set max-repeaters to 20 for cr3-eqsin using "force save" - T346606
- 10:28 jmm@cumin2002: START - Cookbook sre.maps.roll-restart-reboot rolling reboot on P{maps200[5,6].codfw.wmnet} and (A:maps-replica or A:maps-replica-codfw or A:maps-replica-eqiad)
- 09:59 elukey: remove ores-cache stream from changeprop (side effects - higher ORES client latencies, no mediawiki.revision-score event stream published) - https://phabricator.wikimedia.org/T342116
- 09:56 stevemunene@deploy1002: helmfile [eqiad] DONE helmfile.d/services/datahub: sync on main
- 09:54 elukey@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync
- 09:54 elukey@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop: sync
- 09:50 kamila@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.06-set-db-readwrite (exit_code=0)
- 09:50 kamila@cumin1001: START - Cookbook sre.switchdc.mediawiki.06-set-db-readwrite
- 09:50 stevemunene@deploy1002: helmfile [eqiad] START helmfile.d/services/datahub: apply on main
- 09:50 kamila@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.03-set-db-readonly (exit_code=0)
- 09:50 elukey@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop: sync
- 09:49 kamila@cumin1001: START - Cookbook sre.switchdc.mediawiki.03-set-db-readonly
- 09:49 elukey@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop: sync
- 09:49 stevemunene@deploy1002: helmfile [codfw] DONE helmfile.d/services/datahub: sync on main
- 09:46 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: sync
- 09:46 elukey@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: sync
- 09:44 fabfur: enabled puppet on cp4050 for T346602
- 09:43 stevemunene@deploy1002: helmfile [codfw] START helmfile.d/services/datahub: apply on main
- 09:42 isaranto@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
- 09:40 fabfur: disabled puppet on cp4050 for T346602
- 09:39 fabfur: enabled puppet on cp4052 for T346602
- 09:38 stevemunene@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
- 09:34 hashar@deploy1002: Finished scap: Backport for tests: Do not assume UTSysop exists (T346253) (duration: 09m 06s)
- 09:32 fabfur: disabled puppet on cp4052 for T346602
- 09:28 godog: set max-repeaters to 20 for cr3-eqsin in librenms - T346606
- 09:28 godog: set max-repeaters for cr3-eqsin in librenms - T346606
- 09:27 hashar@deploy1002: hashar and urbanecm: Continuing with sync
- 09:26 hashar@deploy1002: hashar and urbanecm: Backport for tests: Do not assume UTSysop exists (T346253) synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
- 09:25 isaranto@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
- 09:25 hashar@deploy1002: Started scap: Backport for tests: Do not assume UTSysop exists (T346253)
- 09:25 stevemunene@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
- 09:06 jelto@deploy1002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
- 09:05 jelto@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
- 09:03 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
- 09:03 jelto@deploy1002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
- 09:03 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
- 09:03 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
- 09:03 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
- 09:03 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
- 09:03 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
- 09:02 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
- 09:02 jelto@deploy1002: helmfile [codfw] START helmfile.d/services/miscweb: apply
- 08:56 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
- 08:47 jelto@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
- 08:46 jelto@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply
- 07:58 Amir1: running db checksum run in s3 eqiad replicas (T207253)
- 07:26 taavi@deploy1002: Finished scap: Backport for robots.txt: Disable indexing user (talk) pages and draft (talk) pages on shwiki (T346589) (duration: 22m 24s)
- 07:17 taavi@deploy1002: aleksandar and taavi: Continuing with sync
- 07:15 moritzm: installing clamav security updates
- 07:13 taavi@deploy1002: aleksandar and taavi: Backport for robots.txt: Disable indexing user (talk) pages and draft (talk) pages on shwiki (T346589) synced to the testservers mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
- 07:03 taavi@deploy1002: Started scap: Backport for robots.txt: Disable indexing user (talk) pages and draft (talk) pages on shwiki (T346589)
2023-09-16
- 13:52 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply
- 13:52 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop: apply
- 13:52 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop: apply
- 13:52 akosiaris: re-enable changeprop
- 13:52 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop: apply
- 12:57 akosiaris: stop changeprop in eqiad
- 01:44 krinkle@deploy1002: Finished deploy [integration/docroot@9a1fb37]: (no justification provided) (duration: 00m 06s)
- 01:44 krinkle@deploy1002: Started deploy [integration/docroot@9a1fb37]: (no justification provided)
2023-09-15
- 21:03 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes1047.eqiad.wmnet with OS bullseye
- 21:03 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1047.eqiad.wmnet with OS bullseye
- 20:59 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes1036.eqiad.wmnet with OS bullseye
- 20:59 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1036.eqiad.wmnet with OS bullseye
- 20:59 tzatziki: removing 6 files for legal compliance
- 20:58 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes1038.eqiad.wmnet with OS bullseye
- 20:58 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1038.eqiad.wmnet with OS bullseye
- 20:58 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes1036.eqiad.wmnet with OS bullseye
- 20:58 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1036.eqiad.wmnet with OS bullseye
- 17:56 urandom: stopping Cassandra bootstrap, restbase1030-a — T331713
- 17:43 urandom: initiate Cassandra bootstrap, restbase1030-a — T331713
- 17:11 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt2002-dev.codfw.wmnet with OS bookworm
- 17:08 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt2003-dev.codfw.wmnet with OS bookworm
- 16:51 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
- 16:47 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
- 16:46 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
- 16:44 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
- 16:37 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt2003-dev.codfw.wmnet with reason: host reimage
- 16:35 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt2002-dev.codfw.wmnet with reason: host reimage
- 16:32 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt2003-dev.codfw.wmnet with reason: host reimage
- 16:32 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt2002-dev.codfw.wmnet with reason: host reimage
- 16:12 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt2003-dev.codfw.wmnet with OS bookworm
- 16:12 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt2002-dev.codfw.wmnet with OS bookworm
- 16:09 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
- 15:51 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
- 15:51 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
- 15:51 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
- 15:50 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
- 15:50 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
- 15:50 claime: raising mw-api-int replicas to 12+2 to cope with wdqs backfill
- 15:43 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
- 15:43 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
- 15:42 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
- 15:42 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 15:42 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove old esams ranges and includes - cmooney@cumin1001"
- 15:42 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
- 15:41 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove old esams ranges and includes - cmooney@cumin1001"
- 15:39 cmooney@cumin1001: START - Cookbook sre.dns.netbox
- 15:33 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 15:33 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove old esams ranges and includes - cmooney@cumin1001"
- 15:32 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove old esams ranges and includes - cmooney@cumin1001"
- 15:30 cmooney@cumin1001: START - Cookbook sre.dns.netbox
- 15:24 eevans@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching restbase10[18,25-27,33].eqiad.wmnet: Maybe pickup missed topology changes — T331713 - eevans@cumin1001
- 15:03 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
- 15:03 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
- 14:58 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 14:57 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
- 14:38 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 14:38 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
- 14:35 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching restbase10[18,25-27,33].eqiad.wmnet: Maybe pickup missed topology changes — T331713 - eevans@cumin1001
- 14:35 urandom: rolling Cassandra restart, RESTBase/eqiad/row-D — T331713
- 14:34 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 14:33 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
- 14:33 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
- 14:33 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt2006-dev.codfw.wmnet with OS bookworm
- 14:32 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 14:32 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
- 14:31 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt2005-dev.codfw.wmnet with OS bookworm
- 14:29 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt2004-dev.codfw.wmnet with OS bookworm
- 14:27 aborrero@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudvirt2006-dev
- 14:27 aborrero@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt2006-dev
- 14:26 aborrero@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudvirt2005-dev
- 14:26 aborrero@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt2005-dev
- 14:25 aborrero@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudvirt2004-dev
- 14:24 aborrero@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt2004-dev
- 14:06 cgoubert@cumin1001: conftool action : set/pooled=yes; selector: name=mw2444.codfw.wmnet
- 14:05 claime: repooling mw2444.codfw.wmnet - T345884
- 13:51 slyngshede@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM idm-test1001.wikimedia.org
- 13:47 slyngshede@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM idm-test1001.wikimedia.org
- 13:46 slyngshede@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM idm-test1001.wikimedia.org
- 13:41 slyngshede@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM idm-test1001.wikimedia.org
- 13:41 slyngshede@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host idm-test1001.wikimedia.org with OS bookworm
- 13:40 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt2004-dev.codfw.wmnet with reason: host reimage
- 13:38 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt2006-dev.codfw.wmnet with reason: host reimage
- 13:38 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cloudvirt2005-dev.codfw.wmnet with reason: host reimage
- 13:35 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt2005-dev.codfw.wmnet with reason: host reimage
- 13:35 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt2004-dev.codfw.wmnet with reason: host reimage
- 13:35 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt2006-dev.codfw.wmnet with reason: host reimage
- 13:19 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt2004-dev.codfw.wmnet with OS bookworm
- 13:19 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt2005-dev.codfw.wmnet with OS bookworm
- 13:19 slyngshede@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on idm-test1001.wikimedia.org with reason: host reimage
- 13:19 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt2006-dev.codfw.wmnet with OS bookworm
- 13:16 slyngshede@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on idm-test1001.wikimedia.org with reason: host reimage
- 13:03 slyngshede@cumin1001: START - Cookbook sre.hosts.reimage for host idm-test1001.wikimedia.org with OS bookworm
- 13:01 akosiaris@deploy1002: Synchronized docroot: (no justification provided) (duration: 08m 20s)
- 12:50 topranks: changing ECMP hashing algorithm on drmrs, esams and cloud switches T339852
- 12:27 topranks: changing ECMP hashing algorithm on asw1-b12-drmrs T339852
- 11:54 _joe_: updated etcd-mirror to 0.0.10 everywhere
- 11:37 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1138.eqiad.wmnet with OS bullseye
- 11:12 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1138.eqiad.wmnet with reason: host reimage
- 11:09 stevemunene@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1138.eqiad.wmnet with reason: host reimage
- 10:56 stevemunene@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1138.eqiad.wmnet with OS bullseye
- 10:07 aborrero@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 10:07 aborrero@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudvirts in codfw - aborrero@cumin1001"
- 09:22 aborrero@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudvirts in codfw - aborrero@cumin1001"
- 09:20 aborrero@cumin1001: START - Cookbook sre.dns.netbox
- 09:10 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.resource-report (exit_code=0)
- 09:10 jmm@cumin2002: START - Cookbook sre.ganeti.resource-report
- 08:57 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host ldap-replica2008.wikimedia.org
- 08:57 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ldap-replica2008.wikimedia.org with OS bookworm
- 08:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ldap-replica2008.wikimedia.org with reason: host reimage
- 08:47 elukey@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
- 08:47 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ldap-replica2008.wikimedia.org with reason: host reimage
- 08:46 elukey@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
- 08:40 elukey@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
- 08:39 elukey@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
- 08:39 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
- 08:39 elukey@deploy1002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
- 08:30 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ldap-replica2008.wikimedia.org with OS bookworm
- 08:27 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM ldap-replica2008.wikimedia.org - jmm@cumin2002"
- 08:27 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM ldap-replica2008.wikimedia.org - jmm@cumin2002"
- 08:26 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ldap-replica2008.wikimedia.org on all recursors
- 08:26 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache ldap-replica2008.wikimedia.org on all recursors
- 08:26 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 08:26 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM ldap-replica2008.wikimedia.org - jmm@cumin2002"
- 08:12 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM ldap-replica2008.wikimedia.org - jmm@cumin2002"
- 08:10 jmm@cumin2002: START - Cookbook sre.dns.netbox
- 08:10 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host ldap-replica2008.wikimedia.org
- 08:08 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host ldap-replica2007.wikimedia.org
- 08:08 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ldap-replica2007.wikimedia.org with OS bookworm
- 07:54 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ldap-replica2007.wikimedia.org with reason: host reimage
- 07:51 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ldap-replica2007.wikimedia.org with reason: host reimage
- 07:38 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ldap-replica2007.wikimedia.org with OS bookworm
- 07:26 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM ldap-replica2007.wikimedia.org - jmm@cumin2002"
- 07:25 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM ldap-replica2007.wikimedia.org - jmm@cumin2002"
- 07:25 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ldap-replica2007.wikimedia.org on all recursors
- 07:25 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache ldap-replica2007.wikimedia.org on all recursors
- 07:25 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 07:25 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM ldap-replica2007.wikimedia.org - jmm@cumin2002"
- 07:24 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM ldap-replica2007.wikimedia.org - jmm@cumin2002"
- 07:22 jmm@cumin2002: START - Cookbook sre.dns.netbox
- 07:22 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host ldap-replica2007.wikimedia.org
- 07:21 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.resource-report (exit_code=0)
- 07:21 jmm@cumin2002: START - Cookbook sre.ganeti.resource-report
- 07:15 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast4005.wikimedia.org
- 07:09 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host bast4005.wikimedia.org
- 07:04 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
- 07:03 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
- 06:25 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host bast2003.wikimedia.org
- 06:06 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host puppetboard1003.eqiad.wmnet
- 06:01 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host puppetboard1003.eqiad.wmnet
- 06:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host puppetboard2003.codfw.wmnet
- 05:56 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host puppetboard2003.codfw.wmnet
- 05:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast5004.wikimedia.org
- 05:44 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host bast5004.wikimedia.org
- 02:43 eevans@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching restbase10[17,22-24,29,32].eqiad.wmnet: Maybe pickup missed topology changes — T331713 - eevans@cumin1001
- 01:44 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching restbase10[17,22-24,29,32].eqiad.wmnet: Maybe pickup missed topology changes — T331713 - eevans@cumin1001
- 01:44 urandom: rolling Cassandra restart, RESTBase/eqiad/row-B — T331713
- 01:20 krinkle@deploy1002: Finished scap: Backport for Remove old origin-with-crossorigin referrer policy (T338183) (duration: 08m 16s)
- 01:14 krinkle@deploy1002: krinkle and hartman: Continuing with sync
- 01:13 krinkle@deploy1002: krinkle and hartman: Backport for Remove old origin-with-crossorigin referrer policy (T338183) synced to the testservers mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
- 01:12 krinkle@deploy1002: Started scap: Backport for Remove old origin-with-crossorigin referrer policy (T338183)
- 01:09 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt2005-dev.codfw.wmnet with OS bookworm
- 01:09 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt2006-dev.codfw.wmnet with OS bookworm
- 01:09 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt2004-dev.codfw.wmnet with OS bookworm
- 00:12 eevans@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching restbase10[16,19-21,28,31].eqiad.wmnet: Maybe pickup missed topology changes — T331713 - eevans@cumin1001
2023-09-14
- 23:49 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1056.eqiad.wmnet with OS bullseye
- 23:34 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1056.eqiad.wmnet with reason: host reimage
- 23:32 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt2006-dev.codfw.wmnet with reason: host reimage
- 23:30 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt2005-dev.codfw.wmnet with reason: host reimage
- 23:27 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt2004-dev.codfw.wmnet with reason: host reimage
- 23:27 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt2006-dev.codfw.wmnet with reason: host reimage
- 23:26 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt2005-dev.codfw.wmnet with reason: host reimage
- 23:26 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1056.eqiad.wmnet with reason: host reimage
- 23:24 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt2004-dev.codfw.wmnet with reason: host reimage
- 23:19 eileen: civicrm upgraded from 9d34ed9b to 0c2853aa - big vendor update - roll back if issues
- 23:18 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
- 23:17 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
- 23:15 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
- 23:13 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching restbase10[16,19-21,28,31].eqiad.wmnet: Maybe pickup missed topology changes — T331713 - eevans@cumin1001
- 23:12 urandom: rolling Cassandra restart, RESTBase/eqiad/row-A — T331713
- 23:10 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt2006-dev.codfw.wmnet with OS bookworm
- 23:10 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1056.eqiad.wmnet with OS bullseye
- 23:09 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt2005-dev.codfw.wmnet with OS bookworm
- 23:08 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt2004-dev.codfw.wmnet with OS bookworm
- 23:03 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1031.eqiad.wmnet with OS bullseye
- 23:03 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 22:51 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
- 22:47 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 22:45 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1030.eqiad.wmnet with OS bullseye
- 22:45 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 22:45 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1034.eqiad.wmnet with OS bullseye
- 22:45 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 22:44 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt2001-dev.codfw.wmnet with OS bookworm
- 22:43 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 22:40 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1031.eqiad.wmnet with reason: host reimage
- 22:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1034.eqiad.wmnet with reason: host reimage
- 22:25 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1030.eqiad.wmnet with reason: host reimage
- 22:21 eevans@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching restbase20[12,17-18,23,26-27].codfw.wmnet: Maybe pickup missed topology changes — T331713 - eevans@cumin1001
- 22:21 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1034.eqiad.wmnet with reason: host reimage
- 22:21 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1031.eqiad.wmnet with reason: host reimage
- 22:20 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1030.eqiad.wmnet with reason: host reimage
- 22:14 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt2001-dev.codfw.wmnet with reason: host reimage
- 22:11 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt2001-dev.codfw.wmnet with reason: host reimage
- 22:06 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes1034.eqiad.wmnet with OS bullseye
- 22:06 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes1031.eqiad.wmnet with OS bullseye
- 22:05 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes1030.eqiad.wmnet with OS bullseye
- 21:55 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1032.eqiad.wmnet with OS bullseye
- 21:55 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 21:54 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 21:51 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt2001-dev.codfw.wmnet with OS bookworm
- 21:50 andrew@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudvirt2001-dev.codfw.wmnet with OS bullseye
- 21:42 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
- 21:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1035.eqiad.wmnet with OS bullseye
- 21:42 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 21:41 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
- 21:41 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 21:41 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
- 21:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1039.eqiad.wmnet with OS bullseye
- 21:40 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 21:39 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 21:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1037.eqiad.wmnet with OS bullseye
- 21:38 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 21:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1032.eqiad.wmnet with reason: host reimage
- 21:37 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 21:35 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes1034.eqiad.wmnet with OS bullseye
- 21:35 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes1034.eqiad.wmnet with OS bullseye
- 21:34 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1032.eqiad.wmnet with reason: host reimage
- 21:34 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes1034.eqiad.wmnet with OS bullseye
- 21:34 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes1034.eqiad.wmnet with OS bullseye
- 21:34 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt2001-dev.codfw.wmnet with OS bullseye
- 21:33 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes1034.eqiad.wmnet with OS bullseye
- 21:32 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1033.eqiad.wmnet with OS bullseye
- 21:32 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 21:31 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 21:27 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching restbase20[12,17-18,23,26-27].codfw.wmnet: Maybe pickup missed topology changes — T331713 - eevans@cumin1001
- 21:26 urandom: rolling Cassandra restart, RESTBase/row-D — T331713
- 21:24 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1035.eqiad.wmnet with reason: host reimage
- 21:24 eevans@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching restbase20[16,20,22,25].codfw.wmnet: Maybe pickup missed topology changes — T331713 - eevans@cumin1001
- 21:22 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1039.eqiad.wmnet with reason: host reimage
- 21:19 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1037.eqiad.wmnet with reason: host reimage
- 21:17 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1039.eqiad.wmnet with reason: host reimage
- 21:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1033.eqiad.wmnet with reason: host reimage
- 21:15 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1037.eqiad.wmnet with reason: host reimage
- 21:15 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1035.eqiad.wmnet with reason: host reimage
- 21:14 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1033.eqiad.wmnet with reason: host reimage
- 21:13 ryankemper: T345475 Beginning process to bring 3 new hosts `wdqs202[3-5]` into service. Merged https://gerrit.wikimedia.org/r/957802 and running puppet on hosts
- 21:06 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
- 21:03 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes1039.eqiad.wmnet with OS bullseye
- 21:02 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes1038.eqiad.wmnet with OS bullseye
- 21:02 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes1037.eqiad.wmnet with OS bullseye
- 21:01 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes1036.eqiad.wmnet with OS bullseye
- 21:00 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes1035.eqiad.wmnet with OS bullseye
- 20:59 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes1034.eqiad.wmnet with OS bullseye
- 20:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes1033.eqiad.wmnet with OS bullseye
- 20:57 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes1032.eqiad.wmnet with OS bullseye
- 20:47 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching restbase20[16,20,22,25].codfw.wmnet: Maybe pickup missed topology changes — T331713 - eevans@cumin1001
- 20:45 thcipriani@deploy1002: Finished scap: Backport for Don't offer visual diffs for non-wikitext pages (T346252), ThreadItemStore: Add details to row insertion exceptions (T343859) (duration: 12m 35s)
- 20:38 thcipriani@deploy1002: thcipriani and matmarex: Continuing with sync
- 20:34 eevans@cumin1001: END (FAIL) - Cookbook sre.cassandra.roll-restart (exit_code=99) for nodes matching restbase20[15-16,20,22,25].codfw.wmnet: Maybe pickup missed topology changes — T331713 - eevans@cumin1001
- 20:34 thcipriani@deploy1002: thcipriani and matmarex: Backport for Don't offer visual diffs for non-wikitext pages (T346252), ThreadItemStore: Add details to row insertion exceptions (T343859) synced to the testservers mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD
- 20:32 thcipriani@deploy1002: Started scap: Backport for Don't offer visual diffs for non-wikitext pages (T346252), ThreadItemStore: Add details to row insertion exceptions (T343859)
- 20:20 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching restbase20[15-16,20,22,25].codfw.wmnet: Maybe pickup missed topology changes — T331713 - eevans@cumin1001
- 20:20 urandom: rolling Cassandra restart, RESTBase/row-C — T331713
- 20:05 eevans@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching restbase20[13-14,19,21,24].codfw.wmnet: Maybe pickup missed topology changes — T331713 - eevans@cumin1001
- 19:20 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching restbase20[13-14,19,21,24].codfw.wmnet: Maybe pickup missed topology changes — T331713 - eevans@cumin1001
- 19:20 urandom: rolling Cassandra restart, RESTBase/row-B — T331713
- 19:14 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1050.eqiad.wmnet with OS bullseye
- 19:11 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1052.eqiad.wmnet with OS bullseye
- 19:10 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1051.eqiad.wmnet with OS bullseye
- 18:58 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1050.eqiad.wmnet with reason: host reimage
- 18:58 urandom: initiating `removenode`, ID=627fe8e9-d298-43b3-a1a2-7c8a3f01370b (restbase1030-c) — T331713
- 18:56 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1052.eqiad.wmnet with reason: host reimage
- 18:54 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1050.eqiad.wmnet with reason: host reimage
- 18:53 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1051.eqiad.wmnet with reason: host reimage
- 18:52 urandom: stopping bootstrap of restbase1030-c — T331713
- 18:51 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1052.eqiad.wmnet with reason: host reimage
- 18:51 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1051.eqiad.wmnet with reason: host reimage
- 18:45 urandom: retrying Cassandra bootstrap of restbase1030-c — T331713
- 18:39 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1050.eqiad.wmnet with OS bullseye
- 18:38 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 18:38 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1050.eqiad.wmnet with OS bullseye
- 18:38 jclark@cumin1001: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
- 18:37 cmooney@cumin1001: START - Cookbook sre.dns.netbox
- 18:35 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1052.eqiad.wmnet with OS bullseye
- 18:35 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1051.eqiad.wmnet with OS bullseye
- 18:34 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1051.eqiad.wmnet with OS bullseye
- 18:34 jclark@cumin1001: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
- 18:27 xcollazo@deploy1002: Finished deploy [airflow-dags/analytics@7160e27]: Deploy latest DAGs to analytics Airflow instance T340861 (duration: 00m 40s)
- 18:27 xcollazo@deploy1002: Started deploy [airflow-dags/analytics@7160e27]: Deploy latest DAGs to analytics Airflow instance T340861
- 18:24 bblack: cp107[56],cp202[78],cp600[19]: (one host from each cluster, at 3 sites): restarting varnish-frontend spaced out over the next ~hour for memory tweaks.
- 18:18 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1053.eqiad.wmnet with OS bullseye
- 18:03 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1046.eqiad.wmnet with OS bullseye
- 18:03 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 18:02 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1053.eqiad.wmnet with reason: host reimage
- 18:02 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 17:59 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1053.eqiad.wmnet with reason: host reimage
- 17:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1045.eqiad.wmnet with OS bullseye
- 17:56 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 17:55 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 17:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1046.eqiad.wmnet with reason: host reimage
- 17:44 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1053.eqiad.wmnet with OS bullseye
- 17:43 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1046.eqiad.wmnet with reason: host reimage
- 17:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes1046.eqiad.wmnet with OS bullseye
- 17:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1048.eqiad.wmnet with OS bullseye
- 17:41 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 17:40 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 17:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1045.eqiad.wmnet with reason: host reimage
- 17:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1049.eqiad.wmnet with OS bullseye
- 17:39 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 17:38 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 17:38 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1052.eqiad.wmnet with OS bullseye
- 17:36 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1045.eqiad.wmnet with reason: host reimage
- 17:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1044.eqiad.wmnet with OS bullseye
- 17:34 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 17:32 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 17:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1043.eqiad.wmnet with OS bullseye
- 17:31 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 17:30 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 17:29 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes1046.eqiad.wmnet with OS bullseye
- 17:24 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1048.eqiad.wmnet with reason: host reimage
- 17:22 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1049.eqiad.wmnet with reason: host reimage
- 17:20 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1052.eqiad.wmnet with reason: host reimage
- 17:18 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1049.eqiad.wmnet with reason: host reimage
- 17:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1044.eqiad.wmnet with reason: host reimage
- 17:16 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1052.eqiad.wmnet with reason: host reimage
- 17:15 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1048.eqiad.wmnet with reason: host reimage
- 17:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1043.eqiad.wmnet with reason: host reimage
- 17:12 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1044.eqiad.wmnet with reason: host reimage
- 17:12 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1043.eqiad.wmnet with reason: host reimage
- 17:05 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on search-loader2002.codfw.wmnet,search-loader1002.eqiad.wmnet with reason: T346039
- 17:05 bking@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on search-loader2002.codfw.wmnet,search-loader1002.eqiad.wmnet with reason: T346039
- 17:03 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes1049.eqiad.wmnet with OS bullseye
- 17:03 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes1048.eqiad.wmnet with OS bullseye
- 17:02 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes1047.eqiad.wmnet with OS bullseye
- 17:00 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1052.eqiad.wmnet with OS bullseye
- 17:00 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1052.eqiad.wmnet with OS bullseye
- 17:00 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes1046.eqiad.wmnet with OS bullseye
- 17:00 jclark@cumin1001: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
- 16:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes1045.eqiad.wmnet with OS bullseye
- 16:57 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes1044.eqiad.wmnet with OS bullseye
- 16:56 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes1043.eqiad.wmnet with OS bullseye
- 16:53 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1040.eqiad.wmnet with OS bullseye
- 16:53 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 16:51 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 16:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1041.eqiad.wmnet with OS bullseye
- 16:48 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 16:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1042.eqiad.wmnet with OS bullseye
- 16:48 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 16:47 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 16:46 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 16:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1040.eqiad.wmnet with reason: host reimage
- 16:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1041.eqiad.wmnet with reason: host reimage
- 16:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1042.eqiad.wmnet with reason: host reimage
- 16:31 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1040.eqiad.wmnet with reason: host reimage
- 16:30 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes1040.eqiad.wmnet with OS bullseye
- 16:27 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1041.eqiad.wmnet with reason: host reimage
- 16:27 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1042.eqiad.wmnet with reason: host reimage
- 16:26 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes1040.eqiad.wmnet with OS bullseye
- 16:23 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes1056.eqiad.wmnet with OS bullseye
- 16:23 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
- 16:23 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
- 16:21 denisse: Failing over from netmon2002 (codfw) to netmon1003 (eqiad).
- 16:20 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
- 16:17 volans@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "update - volans@cumin1001"
- 16:17 volans@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "update - volans@cumin1001"
- 16:16 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1056.eqiad.wmnet with reason: host reimage
- 16:14 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes1042.eqiad.wmnet with OS bullseye
- 16:14 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes1041.eqiad.wmnet with OS bullseye
- 16:14 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes1040.eqiad.wmnet with OS bullseye
- 16:13 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1056.eqiad.wmnet with reason: host reimage
- 16:12 cmooney@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
- 16:12 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1053.eqiad.wmnet with OS bullseye
- 16:12 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
- 16:09 cmooney@cumin1001: START - Cookbook sre.dns.netbox
- 16:08 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1051.eqiad.wmnet with reason: host reimage
- 16:06 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1052.eqiad.wmnet with reason: host reimage
- 16:04 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
- 16:04 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1050.eqiad.wmnet with reason: host reimage
- 16:04 robh@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "failed in reimage script said manually run it - robh@cumin1001 - T342533"
- 16:03 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes1030.eqiad.wmnet with OS bullseye
- 16:03 robh@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "failed in reimage script said manually run it - robh@cumin1001 - T342533"
- 16:03 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes1031.eqiad.wmnet with OS bullseye
- 16:03 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1051.eqiad.wmnet with reason: host reimage
- 16:03 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1052.eqiad.wmnet with reason: host reimage
- 16:01 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 16:01 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1050.eqiad.wmnet with reason: host reimage
- 16:00 jclark@cumin1001: START - Cookbook sre.dns.netbox
- 15:57 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1056.eqiad.wmnet with OS bullseye
- 15:55 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1055.eqiad.wmnet with OS bullseye
- 15:55 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
- 15:55 urbanecm@deploy1002: Finished scap: Backport for listTaskCounts: Push total task counts to statsd for all tasks (T345204), linkTaskCounts: Stop producing per-topic statsd data (T345210) (duration: 07m 37s)
- 15:55 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1054.eqiad.wmnet with OS bullseye
- 15:55 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
- 15:54 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1056.eqiad.wmnet with OS bullseye
- 15:54 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
- 15:54 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
- 15:53 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
- 15:53 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
- 15:53 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1139.eqiad.wmnet with OS bullseye
- 15:52 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for conf2006.codfw.wmnet
- 15:52 jayme@cumin1001: START - Cookbook sre.hosts.remove-downtime for conf2006.codfw.wmnet
- 15:52 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for conf2004.codfw.wmnet
- 15:52 jayme@cumin1001: START - Cookbook sre.hosts.remove-downtime for conf2004.codfw.wmnet
- 15:52 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for conf2005.codfw.wmnet
- 15:51 jayme@cumin1001: START - Cookbook sre.hosts.remove-downtime for conf2005.codfw.wmnet
- 15:49 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host moss-be2003.codfw.wmnet with OS bullseye
- 15:49 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1053.eqiad.wmnet with reason: host reimage
- 15:48 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1051.eqiad.wmnet with OS bullseye
- 15:48 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1052.eqiad.wmnet with OS bullseye
- 15:48 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1050.eqiad.wmnet with OS bullseye
- 15:48 bking@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host search-loader1002.eqiad.wmnet
- 15:47 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host search-loader1002.eqiad.wmnet with OS bullseye
- 15:47 urbanecm@deploy1002: Started scap: Backport for listTaskCounts: Push total task counts to statsd for all tasks (T345204), linkTaskCounts: Stop producing per-topic statsd data (T345210)
- 15:46 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1053.eqiad.wmnet with reason: host reimage
- 15:44 jayme: restarting primary lvs in codfw, eqsin, ulsfo
- 15:42 jayme: restarting secondary lvs in codfw, eqsin, ulsfo
- 15:38 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1054.eqiad.wmnet with reason: host reimage
- 15:37 jayme: running puppet on lvs[2011-2014].codfw.wmnet,lvs[5004-5006].eqsin.wmnet,lvs[4008-4010].ulsfo.wmnet
- 15:36 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1055.eqiad.wmnet with reason: host reimage
- 15:36 bking@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host search-loader2002.codfw.wmnet
- 15:36 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host search-loader2002.codfw.wmnet with OS bullseye
- 15:36 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on kubernetes1056.eqiad.wmnet with reason: host reimage
- 15:05 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host testreduce1002.eqiad.wmnet
- 15:01 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
- 15:01 bking@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM search-loader1002.eqiad.wmnet - bking@cumin1001"
- 15:00 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
- 14:58 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host conf2005.codfw.wmnet with OS bullseye
- 14:58 bking@cumin1001: START - Cookbook sre.dns.netbox
- 14:58 bking@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) search-loader1002.eqiad.wmnet on all recursors
- 14:58 bking@cumin1001: START - Cookbook sre.dns.wipe-cache search-loader1002.eqiad.wmnet on all recursors
- 14:58 bking@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 14:58 bking@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM search-loader1002.eqiad.wmnet - bking@cumin1001"
- 14:58 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM testvm2002.codfw.wmnet
- 14:55 bking@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM search-loader1002.eqiad.wmnet - bking@cumin1001"
- 14:55 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
- 14:55 bking@cumin1001: START - Cookbook sre.hosts.reimage for host search-loader2002.codfw.wmnet with OS bullseye
- 14:54 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM testvm2002.codfw.wmnet
- 14:53 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM testvm2005.codfw.wmnet
- 14:52 bking@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM search-loader2002.codfw.wmnet - bking@cumin1001"
- 14:52 bking@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM search-loader2002.codfw.wmnet - bking@cumin1001"
- 14:51 bking@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) search-loader2002.codfw.wmnet on all recursors
- 14:51 bking@cumin1001: START - Cookbook sre.dns.wipe-cache search-loader2002.codfw.wmnet on all recursors
- 14:51 bking@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 14:51 bking@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM search-loader2002.codfw.wmnet - bking@cumin1001"
- 14:51 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudservices1005.wikimedia.org with reason: test before full decom
- 14:51 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudservices1005.wikimedia.org with reason: test before full decom
- 14:50 bking@cumin1001: START - Cookbook sre.dns.netbox
- 14:50 bking@cumin1001: START - Cookbook sre.ganeti.makevm for new host search-loader1002.eqiad.wmnet
- 14:50 bking@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM search-loader2002.codfw.wmnet - bking@cumin1001"
- 14:50 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM testvm2005.codfw.wmnet
- 14:47 bking@cumin1001: START - Cookbook sre.dns.netbox
- 14:47 bking@cumin1001: START - Cookbook sre.ganeti.makevm for new host search-loader2002.codfw.wmnet
- 14:46 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1029.eqiad.wmnet with reason: host reimage
- 14:45 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM testvm2004.codfw.wmnet
- 14:44 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1028.eqiad.wmnet with reason: host reimage
- 14:43 vgutierrez: varnish: decrease max_connections to 10k per backend server globally
- 14:41 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM testvm2004.codfw.wmnet
- 14:41 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1029.eqiad.wmnet with reason: host reimage
- 14:41 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1028.eqiad.wmnet with reason: host reimage
- 14:40 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1027.eqiad.wmnet with reason: host reimage
- 14:37 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1027.eqiad.wmnet with reason: host reimage
- 14:36 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on conf2005.codfw.wmnet with reason: host reimage
- 14:33 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on conf2005.codfw.wmnet with reason: host reimage
- 14:32 moritzm: installing qemu security updates on ganeti-test cluster
- 14:27 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1031.eqiad.wmnet with OS bullseye
- 14:27 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1030.eqiad.wmnet with OS bullseye
- 14:27 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1029.eqiad.wmnet with OS bullseye
- 14:27 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1028.eqiad.wmnet with OS bullseye
- 14:23 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1027.eqiad.wmnet with OS bullseye
- 14:19 jforrester@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mathoid: apply
- 14:18 jforrester@deploy1002: helmfile [eqiad] START helmfile.d/services/mathoid: apply
- 14:18 jforrester@deploy1002: helmfile [codfw] DONE helmfile.d/services/mathoid: apply
- 14:18 jayme@cumin1001: START - Cookbook sre.hosts.reimage for host conf2005.codfw.wmnet with OS bullseye
- 14:17 jforrester@deploy1002: helmfile [codfw] START helmfile.d/services/mathoid: apply
- 14:17 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/mathoid: apply
- 14:16 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/mathoid: apply
- 13:58 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host conf2006.codfw.wmnet with OS bullseye
- 13:57 stevemunene@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1138.eqiad.wmnet with OS bullseye
- 13:56 filippo@deploy1002: Finished deploy [librenms/librenms@f049593]: (no justification provided) (duration: 00m 11s)
- 13:55 filippo@deploy1002: Started deploy [librenms/librenms@f049593]: (no justification provided)
- 13:39 godog: issue test alertmanager librenms alert - T346318
- 13:38 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on conf2006.codfw.wmnet with reason: host reimage
- 13:35 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on conf2006.codfw.wmnet with reason: host reimage
- 13:32 slyngshede@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM idm-test1001.wikimedia.org
- 13:31 moritzm: installing libwebp security updates on bookworm
- 13:28 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1139.eqiad.wmnet with reason: host reimage
- 13:28 slyngshede@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM idm-test1001.wikimedia.org
- 13:25 stevemunene@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1139.eqiad.wmnet with reason: host reimage
- 13:19 jayme@cumin1001: START - Cookbook sre.hosts.reimage for host conf2006.codfw.wmnet with OS bullseye
- 13:14 moritzm: installing aom security updates
- 13:13 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for conf2004.codfw.wmnet
- 13:13 jayme@cumin1001: START - Cookbook sre.hosts.remove-downtime for conf2004.codfw.wmnet
- 13:12 stevemunene@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1139.eqiad.wmnet with OS bullseye
- 13:10 stevemunene@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1138.eqiad.wmnet with OS bullseye
- 12:56 slyngshede@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host idm-test1001.wikimedia.org with OS bookworm
- 12:17 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow6001.drmrs.wmnet
- 12:17 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: sync
- 12:16 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/mobileapps: sync
- 12:14 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow6001.drmrs.wmnet
- 12:11 hnowlan@cumin1001: END (FAIL) - Cookbook sre.misc-clusters.roll-restart-restbase (exit_code=1) rolling restart_daemons on A:restbase-canary
- 12:10 slyngshede@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on idm-test1001.wikimedia.org with reason: host reimage
- 12:06 slyngshede@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on idm-test1001.wikimedia.org with reason: host reimage
- 12:04 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow5002.eqsin.wmnet
- 12:03 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host conf2004.codfw.wmnet with OS bullseye
- 12:01 hnowlan@cumin1001: START - Cookbook sre.misc-clusters.roll-restart-restbase rolling restart_daemons on A:restbase-canary
- 11:56 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow5002.eqsin.wmnet
- 11:54 slyngshede@cumin1001: START - Cookbook sre.hosts.reimage for host idm-test1001.wikimedia.org with OS bookworm
- 11:49 hnowlan@deploy1002: Finished deploy [restbase/deploy@8eb62f2]: Revert "Disable wikifeeds announcements healthcheck" (duration: 06m 12s)
- 11:43 hnowlan@deploy1002: Started deploy [restbase/deploy@8eb62f2]: Revert "Disable wikifeeds announcements healthcheck"
- 11:37 brouberol@cumin2002: END (PASS) - Cookbook sre.opensearch.roll-restart-reboot (exit_code=0) rolling reboot on A:datahubsearch
- 11:36 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow4002.ulsfo.wmnet
- 11:35 hnowlan@deploy1002: Finished deploy [restbase/deploy@e8a6ae4]: Disable wikifeeds announcements healthcheck (duration: 10m 08s)
- 11:34 slyngshede@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on idm-test1001.wikimedia.org with reason: upgrade to Bookwork
- 11:34 slyngshede@cumin1001: START - Cookbook sre.hosts.downtime for 5 days, 0:00:00 on idm-test1001.wikimedia.org with reason: upgrade to Bookwork
- 11:31 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow4002.ulsfo.wmnet
- 11:28 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow3003.esams.wmnet
- 11:25 hnowlan@deploy1002: Started deploy [restbase/deploy@e8a6ae4]: Disable wikifeeds announcements healthcheck
- 11:24 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow3003.esams.wmnet
- 11:24 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow2003.codfw.wmnet
- 11:21 brouberol@cumin2002: START - Cookbook sre.opensearch.roll-restart-reboot rolling reboot on A:datahubsearch
- 11:17 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow2003.codfw.wmnet
- 11:17 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow1002.eqiad.wmnet
- 11:13 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow1002.eqiad.wmnet
- 11:12 brouberol@cumin2002: END (PASS) - Cookbook sre.opensearch.roll-restart-reboot (exit_code=0) rolling restart_daemons on A:datahubsearch
- 11:08 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1137.eqiad.wmnet with OS bullseye
- 11:04 brouberol: brouberol@cumin2002 START - Cookbook sre.opensearch.roll-restart-reboot rolling restart_daemons on A:datahubsearch - T344798
- 11:02 brouberol@cumin2002: START - Cookbook sre.opensearch.roll-restart-reboot rolling restart_daemons on A:datahubsearch
- 10:43 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1137.eqiad.wmnet with reason: host reimage
- 10:41 stevemunene@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1137.eqiad.wmnet with reason: host reimage
- 10:28 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on conf2004.codfw.wmnet with reason: host reimage
- 10:27 stevemunene@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1137.eqiad.wmnet with OS bullseye
- 10:25 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on conf2004.codfw.wmnet with reason: host reimage
- 10:24 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dborch1001.wikimedia.org
- 10:20 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host dborch1001.wikimedia.org
- 10:18 jmm@cumin2002: END (PASS) - Cookbook sre.aqs.roll-restart-reboot (exit_code=0) rolling reboot on A:aqs-eqiad
- 10:10 jayme@cumin1001: START - Cookbook sre.hosts.reimage for host conf2004.codfw.wmnet with OS bullseye
- 10:07 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ldap-replica1006.wikimedia.org
- 10:06 elukey@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventstreams: sync
- 10:06 elukey@deploy1002: helmfile [eqiad] START helmfile.d/services/eventstreams: sync
- 10:03 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ldap-replica1006.wikimedia.org
- 10:03 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ldap-replica1005.wikimedia.org
- 09:59 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ldap-replica1005.wikimedia.org
- 09:55 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cumin1001.eqiad.wmnet
- 09:52 elukey: remove the 'mediawiki.revision-score' stream from eventstreams public API - T342116
- 09:51 elukey@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventstreams: sync
- 09:51 elukey@deploy1002: helmfile [codfw] START helmfile.d/services/eventstreams: sync
- 09:50 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/eventstreams: sync
- 09:49 elukey@deploy1002: helmfile [staging] START helmfile.d/services/eventstreams: sync
- 09:49 jayme: restarted navtiming on webperf2003 to pick up changed etcd service records
- 09:44 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host cumin1001.eqiad.wmnet
- 09:43 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ldap-rw2001.wikimedia.org
- 09:39 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ldap-rw2001.wikimedia.org
- 09:36 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ldap-rw1001.wikimedia.org
- 09:32 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ldap-rw1001.wikimedia.org
- 09:22 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling restart_daemons on A:swift-fe-eqiad
- 09:17 mvernon@cumin2002: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on A:swift-fe-eqiad
- 09:16 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling restart_daemons on A:swift-fe-codfw
- 09:10 mvernon@cumin2002: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on A:swift-fe-codfw
- 09:07 moritzm: installing qemu security updates on ganeti-test
- 08:59 btullis: running build-production-images on build2001 for T344910
- 08:53 godog: +50 to prometheus eqiad k8s-staging
- 08:45 jayme: restarting confd fleet wide
- 08:45 jmm@cumin2002: START - Cookbook sre.aqs.roll-restart-reboot rolling reboot on A:aqs-eqiad
- 08:43 jayme: restarting primary lvs in codfw, eqsin, ulsfo
- 08:38 jayme: restarted secondary lvs in codfw, eqsin, ulsfo
- 08:24 jnuche@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.41.0-wmf.26 refs T343728
- 07:57 jayme@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) _etcd._tcp.eqsin.wmnet on all recursors
- 07:56 jayme@cumin1001: START - Cookbook sre.dns.wipe-cache _etcd._tcp.eqsin.wmnet on all recursors
- 07:56 jayme@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) _etcd._tcp.ulsfo.wmnet on all recursors
- 07:56 jayme@cumin1001: START - Cookbook sre.dns.wipe-cache _etcd._tcp.ulsfo.wmnet on all recursors
- 07:56 jayme@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) _etcd._tcp.codfw.wmnet on all recursors
- 07:56 jayme@cumin1001: START - Cookbook sre.dns.wipe-cache _etcd._tcp.codfw.wmnet on all recursors
- 07:44 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host debmonitor2003.codfw.wmnet
- 07:35 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host debmonitor2003.codfw.wmnet
- 07:32 hashar: Backport & config deployment window completed.
- 07:21 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host debmonitor1003.eqiad.wmnet
- 07:17 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host debmonitor1003.eqiad.wmnet
- 07:13 kartik@deploy1002: Finished scap: Backport for Enable MinT translation service on MediaWiki - rollout #4 (T341445) (duration: 10m 17s)
- 07:09 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host apt1001.wikimedia.org
- 07:06 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host apt1001.wikimedia.org
- 07:06 kartik@deploy1002: abi and kartik: Continuing with sync
- 07:04 kartik@deploy1002: abi and kartik: Backport for Enable MinT translation service on MediaWiki - rollout #4 (T341445) synced to the testservers mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
- 07:02 kartik@deploy1002: Started scap: Backport for Enable MinT translation service on MediaWiki - rollout #4 (T341445)
- 06:27 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 26 hosts with reason: Pre swichover tasks
- 06:26 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 26 hosts with reason: Pre swichover tasks
- 06:22 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 26 hosts with reason: Pre swichover tasks
- 06:22 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 26 hosts with reason: Pre swichover tasks
- 05:23 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 6 hosts with reason: Pre swichover tasks
- 05:23 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 6 hosts with reason: Pre swichover tasks
- 05:22 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 6 hosts with reason: Pre swichover tasks
- 05:21 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 6 hosts with reason: Pre swichover tasks
- 05:11 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 11 hosts with reason: Pre swichover tasks
- 05:10 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 11 hosts with reason: Pre swichover tasks
- 05:06 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on pc2013.codfw.wmnet,pc[1013-1014].eqiad.wmnet with reason: Pre swichover tasks
- 05:06 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on pc2013.codfw.wmnet,pc[1013-1014].eqiad.wmnet with reason: Pre swichover tasks
- 05:06 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on pc2012.codfw.wmnet,pc1012.eqiad.wmnet with reason: Pre swichover tasks
- 05:05 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on pc2012.codfw.wmnet,pc1012.eqiad.wmnet with reason: Pre swichover tasks
- 05:05 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on pc[2011,2014].codfw.wmnet,pc1011.eqiad.wmnet with reason: Pre swichover tasks
- 05:05 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on pc[2011,2014].codfw.wmnet,pc1011.eqiad.wmnet with reason: Pre swichover tasks
- 03:04 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
- 03:03 rzl@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
- 03:03 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
- 02:58 rzl@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
- 02:58 rzl@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
- 02:57 rzl@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
- 02:56 rzl@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
- 02:54 rzl@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
- 01:36 urandom: starting RESTBase/Cassandra node rebuilds, cassandra-c/row D — T331713
2023-09-13
- 23:06 urandom: starting Cassandra node rebuilds, restbase/row D — T331713
- 22:57 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4052.ulsfo.wmnet with OS bookworm
- 21:50 denisse@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db1128.eqiad.wmnet with reason: HW issues
- 21:50 denisse@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db1128.eqiad.wmnet with reason: HW issues
- 21:50 denisse: downtiming db1128
- 21:49 denisse@cumin1001: dbctl commit (dc=all): 'Depool db1128', diff saved to https://phabricator.wikimedia.org/P52504 and previous config saved to /var/cache/conftool/dbconfig/20230913-214930-denisse.json
- 21:48 denisse: depooling db1128
- 21:35 bking@deploy1002: Finished deploy [wdqs/wdqs@3e0a913]: 0.3.129 use allowlist T344284 (duration: 11m 27s)
- 21:28 eileen: civicrm upgraded from 6b247288 to 9d34ed9b
- 21:24 bking@deploy1002: Started deploy [wdqs/wdqs@3e0a913]: 0.3.129 use allowlist T344284
- 21:22 bking@deploy1002: Finished deploy [wdqs/wdqs@16e3dcf]: 0.3.129 use allowlist T344284 (duration: 00m 59s)
- 21:21 bking@deploy1002: Started deploy [wdqs/wdqs@16e3dcf]: 0.3.129 use allowlist T344284
- 19:44 denisse@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host netmon1003.wikimedia.org with OS bookworm
- 19:43 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4052.ulsfo.wmnet with reason: host reimage
- 19:40 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4052.ulsfo.wmnet with reason: host reimage
- 19:34 eileen: civicrm upgraded from 80aee570 to 6b247288
- 19:24 denisse@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on netmon1003.wikimedia.org with reason: host reimage
- 19:21 denisse@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on netmon1003.wikimedia.org with reason: host reimage
- 19:14 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp4052.ulsfo.wmnet with OS bookworm
- 19:09 denisse@cumin1001: START - Cookbook sre.hosts.reimage for host netmon1003.wikimedia.org with OS bookworm
- 19:09 urandom: initiating rebuild of restbase1027-a & restbase1033-a
- 19:08 urandom: initiating rebuild of restbase1026-a
- 19:00 urandom: initiating rebuild of restbase1025-a
- 18:51 urandom: initiating rebuild of restbase1018-a
- 18:49 brett@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp4052.ulsfo.wmnet with OS bookworm
- 18:42 urandom: stopping bootstrap of restbase1030-c — T331713
- 18:38 godog: run schema migrations for librenms on m1 (backdated, started ~1h ago)
- 18:33 urandom: restarting restbase service (restbase1031) — T331713
- 18:19 urandom: resuming bootstrap of restbase1030-c —
- 18:05 fnegri@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudnet2006-dev.codfw.wmnet with OS bookworm
- 17:45 fnegri@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudnet2006-dev.codfw.wmnet with reason: host reimage
- 17:42 fnegri@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudnet2006-dev.codfw.wmnet with reason: host reimage
- 17:28 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp4052.ulsfo.wmnet with OS bookworm
- 17:22 fnegri@cumin1001: START - Cookbook sre.hosts.reimage for host cloudnet2006-dev.codfw.wmnet with OS bookworm
- 16:34 denisse@deploy1002: Finished deploy [librenms/librenms@f049593]: Upgrade LibreNMS to 23.8.2 - T344136 (duration: 00m 16s)
- 16:34 denisse@deploy1002: Started deploy [librenms/librenms@f049593]: Upgrade LibreNMS to 23.8.2 - T344136
- 16:04 aokoth@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on vrts1002.eqiad.wmnet with reason: Testing
- 16:04 aokoth@cumin1001: START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on vrts1002.eqiad.wmnet with reason: Testing
- 16:04 denisse@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host netmon2002.wikimedia.org with OS bookworm
- 15:41 denisse@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on netmon2002.wikimedia.org with reason: host reimage
- 15:38 denisse@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on netmon2002.wikimedia.org with reason: host reimage
- 15:34 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
- 15:34 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
- 15:31 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
- 15:29 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
- 15:26 jayme: re-enabled puppet on all k8s control planes
- 15:26 jmm@cumin2002: END (PASS) - Cookbook sre.aqs.roll-restart-reboot (exit_code=0) rolling reboot on A:aqs-codfw
- 15:19 denisse@cumin1001: START - Cookbook sre.hosts.reimage for host netmon2002.wikimedia.org with OS bookworm
- 15:19 denisse: Start reimage of netmon2002
- 15:17 denisse: Starting LibreNMS upgrade in codfw.
- 15:14 jgiannelos@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifeeds: apply
- 15:04 jayme: stopped puppet on all k8s control planes for 956842 rollout
- 15:01 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply
- 15:01 hnowlan: repooling cp2037 and enabling puppet on A:cp
- 14:56 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply
- 14:55 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply
- 14:52 hnowlan: disable puppet on A:cp
- 14:51 hnowlan: depooled service=ats-be,name=cp2037.codfw.wmnet
- 14:51 jayme: updated kubernetes-* packages fleet wide to 1.23.14-3 - T329826
- 14:50 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply
- 14:41 akosiaris@deploy1002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
- 14:39 akosiaris@deploy1002: helmfile [staging] START helmfile.d/services/machinetranslation: apply
- 14:36 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on sretest1001.eqiad.wmnet with reason: WIP towards puppetised nftables firewall
- 14:36 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on sretest1001.eqiad.wmnet with reason: WIP towards puppetised nftables firewall
- 14:31 ayounsi@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest1001.eqiad.wmnet with OS bullseye
- 14:29 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
- 14:29 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
- 14:26 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
- 14:25 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
- 14:17 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
- 14:17 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
- 14:10 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
- 14:10 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
- 14:08 hnowlan: stopping cassandra on restbase1030-c
- 13:52 jmm@cumin2002: START - Cookbook sre.aqs.roll-restart-reboot rolling reboot on A:aqs-codfw
- 13:34 Lucas_WMDE: UTC afternoon backport+config window done
- 13:34 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for rdbms: Use `debugSql` instead of `debugDumpSql` which is unuset (T318272) (duration: 15m 42s)
- 13:27 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde and d3r1ck01: Continuing with sync
- 13:20 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde and d3r1ck01: Backport for rdbms: Use `debugSql` instead of `debugDumpSql` which is unuset (T318272) synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
- 13:18 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for rdbms: Use `debugSql` instead of `debugDumpSql` which is unuset (T318272)
- 12:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P52499 and previous config saved to /var/cache/conftool/dbconfig/20230913-122323-ladsgroup.json
- 12:17 godog: pool only titan hosts for thanos-web and thanos-query services - T341488
- 12:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P52498 and previous config saved to /var/cache/conftool/dbconfig/20230913-120818-ladsgroup.json
- 11:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P52497 and previous config saved to /var/cache/conftool/dbconfig/20230913-115314-ladsgroup.json
- 11:30 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
- 11:29 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
- 11:27 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
- 11:26 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: apply
- 11:25 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
- 11:24 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/thumbor: apply
- 11:19 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
- 11:18 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
- 11:18 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T343198)', diff saved to https://phabricator.wikimedia.org/P52495 and previous config saved to /var/cache/conftool/dbconfig/20230913-111834-arnaudb.json
- 11:17 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
- 11:16 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
- 11:15 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
- 11:15 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
- 11:11 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host titan2002.codfw.wmnet with OS bookworm
- 10:54 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on titan2002.codfw.wmnet with reason: host reimage
- 10:51 filippo@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on titan2002.codfw.wmnet with reason: host reimage
- 10:49 jayme: imported kubernetes_1.23.14-3 to bullseye-wikimedia component/kubernetes123 - T329826
- 10:46 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2105.codfw.wmnet with reason: Maintenance
- 10:46 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2105.codfw.wmnet with reason: Maintenance
- 10:40 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host titan1002.eqiad.wmnet with OS bookworm
- 10:34 filippo@cumin1001: START - Cookbook sre.hosts.reimage for host titan2002.codfw.wmnet with OS bookworm
- 10:34 filippo@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host titan2002.codfw.wmnet with OS bookworm
- 10:29 klausman@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'readability' for release 'main' .
- 10:28 klausman@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
- 10:28 klausman@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
- 10:27 filippo@cumin1001: START - Cookbook sre.hosts.reimage for host titan2002.codfw.wmnet with OS bookworm
- 10:26 filippo@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host titan2002.codfw.wmnet with OS bookworm
- 10:24 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on titan1002.eqiad.wmnet with reason: host reimage
- 10:21 filippo@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on titan1002.eqiad.wmnet with reason: host reimage
- 10:11 claime: set/pooled=no; selector: name=mw2444.codfw.wmnet - T345884
- 10:10 cgoubert@cumin1001: conftool action : set/pooled=no; selector: name=mw2444.codfw.wmnet
- 10:10 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'readability' for release 'main' .
- 10:06 filippo@cumin1001: START - Cookbook sre.hosts.reimage for host titan1002.eqiad.wmnet with OS bookworm
- 10:06 filippo@cumin1001: START - Cookbook sre.hosts.reimage for host titan2002.codfw.wmnet with OS bookworm
- 10:06 aklapper@deploy1002: Finished scap: Backport for Revert "EntityId: Hard-deprecate Serializable methods" (duration: 08m 49s)
- 10:06 filippo@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host titan1002.eqiad.wmnet with OS bookworm
- 10:06 filippo@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host titan2002.codfw.wmnet with OS bookworm
- 09:59 aklapper@deploy1002: aklapper and jnuche: Continuing with sync
- 09:59 aklapper@deploy1002: aklapper and jnuche: Backport for Revert "EntityId: Hard-deprecate Serializable methods" synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
- 09:57 aklapper@deploy1002: Started scap: Backport for Revert "EntityId: Hard-deprecate Serializable methods"
- 09:51 klausman@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
- 09:48 klausman@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'readability' for release 'main' .
- 09:35 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
- 09:35 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
- 09:34 klausman@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
- 09:34 klausman@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
- 09:16 filippo@cumin1001: START - Cookbook sre.hosts.reimage for host titan1002.eqiad.wmnet with OS bookworm
- 09:14 aklapper@deploy1002: backport Cancelled
- 09:14 filippo@cumin1001: START - Cookbook sre.hosts.reimage for host titan2002.codfw.wmnet with OS bookworm
- 09:11 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host titan2001.codfw.wmnet with OS bookworm
- 08:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1157.eqiad.wmnet with reason: Maintenance
- 08:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1157.eqiad.wmnet with reason: Maintenance
- 08:47 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host titan1001.eqiad.wmnet with OS bookworm
- 08:46 claime: Running puppet on cp-text P:trafficserver::backend - T290536
- 08:33 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on titan2001.codfw.wmnet with reason: host reimage
- 08:30 filippo@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on titan2001.codfw.wmnet with reason: host reimage
- 08:25 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on titan1001.eqiad.wmnet with reason: host reimage
- 08:25 aklapper@deploy1002: Synchronized php: group1 wikis to 1.41.0-wmf.26 refs T343728 (duration: 06m 46s)
- 08:22 filippo@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on titan1001.eqiad.wmnet with reason: host reimage
- 08:18 aklapper@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.41.0-wmf.26 refs T343728
- 08:14 filippo@cumin1001: START - Cookbook sre.hosts.reimage for host titan2001.codfw.wmnet with OS bookworm
- 08:14 filippo@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host titan2001.codfw.wmnet with OS bookworm
- 08:08 filippo@cumin1001: START - Cookbook sre.hosts.reimage for host titan1001.eqiad.wmnet with OS bookworm
- 08:07 filippo@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host titan1001.eqiad.wmnet with OS bookworm
- 08:01 ayounsi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1001.eqiad.wmnet with reason: host reimage
- 07:56 ayounsi@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1001.eqiad.wmnet with reason: host reimage
- 07:54 filippo@cumin1001: START - Cookbook sre.hosts.reimage for host titan1001.eqiad.wmnet with OS bookworm
- 07:53 vgutierrez: repool cp1075 && cp1076
- 07:51 filippo@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host titan1001.eqiad.wmnet with OS bookworm
- 07:51 filippo@cumin1001: conftool action : set/pooled=no; selector: name=titan2001.codfw.wmnet,service=thanos-web
- 07:46 filippo@cumin1001: START - Cookbook sre.hosts.reimage for host titan2001.codfw.wmnet with OS bookworm
- 07:46 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1174 (T343198)', diff saved to https://phabricator.wikimedia.org/P52491 and previous config saved to /var/cache/conftool/dbconfig/20230913-074602-arnaudb.json
- 07:45 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1174.eqiad.wmnet with reason: Maintenance
- 07:45 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1174.eqiad.wmnet with reason: Maintenance
- 07:44 filippo@cumin1001: conftool action : set/pooled=yes; selector: name=thanos-fe2004.codfw.wmnet,service=thanos-web
- 07:43 filippo@cumin1001: conftool action : set/pooled=yes; selector: name=thanos-fe2004.codfdw.wmnet,service=thanos-web
- 07:43 filippo@cumin1001: conftool action : set/pooled=no; selector: name=titan2001.codfdw.wmnet,service=thanos-web
- 07:43 filippo@cumin1001: conftool action : set/pooled=no; selector: name=titan1001.eqiad.wmnet,service=thanos-web
- 07:43 filippo@cumin1001: conftool action : set/pooled=yes; selector: name=thanos-fe1004.eqiad.wmnet,service=thanos-web
- 07:42 ayounsi@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1001.eqiad.wmnet with OS bullseye
- 07:39 filippo@cumin1001: START - Cookbook sre.hosts.reimage for host titan1001.eqiad.wmnet with OS bookworm
- 06:06 tstarling@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Running again following connection refused errors from kubemaster (duration: 07m 24s)
- 05:55 tstarling@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Enable source maps on group0 wikis attempt 2 (duration: 07m 37s)
- 05:40 tstarling@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Enable source maps on group0 wikis T47514 (duration: 07m 14s)
- 05:15 tstarling@deploy1002: Synchronized wmf-config/etcd.php: Remove PHP 7.2 fallback for array_key_first g 956364 (duration: 07m 03s)
- 04:35 hmonroy@deploy1002: Finished scap: Backport for Do not enable entire OOUI in PHP on page load (T345414) (duration: 07m 58s)
- 04:29 hmonroy@deploy1002: hmonroy and jdlrobson: Continuing with sync
- 04:29 hmonroy@deploy1002: hmonroy and jdlrobson: Backport for Do not enable entire OOUI in PHP on page load (T345414) synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
- 04:27 hmonroy@deploy1002: Started scap: Backport for Do not enable entire OOUI in PHP on page load (T345414)
- 04:26 hmonroy@deploy1002: Finished scap: Backport for Do not enable entire OOUI in PHP on page load (T345414) (duration: 09m 56s)
- 04:19 hmonroy@deploy1002: hmonroy and jdlrobson: Continuing with sync
- 04:17 hmonroy@deploy1002: hmonroy and jdlrobson: Backport for Do not enable entire OOUI in PHP on page load (T345414) synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
- 04:16 hmonroy@deploy1002: Started scap: Backport for Do not enable entire OOUI in PHP on page load (T345414)
2023-09-12
- 23:14 brett: Upload trafficserver_9.2.1-1wm2_amd64 to bookworm-wikimedia
- 23:09 eileen: config revision changed from 2efd8142 to eb7931ca add is_create_activities to bounce fetch job
- 21:11 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance
- 21:11 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance
- 21:11 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T343198)', diff saved to https://phabricator.wikimedia.org/P52486 and previous config saved to /var/cache/conftool/dbconfig/20230912-211128-arnaudb.json
- 21:08 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host search-loader1001.eqiad.wmnet
- 21:04 bking@cumin1001: START - Cookbook sre.hosts.reboot-single for host search-loader1001.eqiad.wmnet
- 20:56 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P52485 and previous config saved to /var/cache/conftool/dbconfig/20230912-205621-arnaudb.json
- 20:46 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host search-loader2001.codfw.wmnet
- 20:43 cjming: end of UTC late backport window
- 20:43 bking@cumin1001: START - Cookbook sre.hosts.reboot-single for host search-loader2001.codfw.wmnet
- 20:42 inflatador: rebooting search-loader2001.codfw.wmnet T344671
- 20:41 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P52484 and previous config saved to /var/cache/conftool/dbconfig/20230912-204115-arnaudb.json
- 20:39 cjming@deploy1002: Finished scap: Backport for Make the new stream name consistent with convention (duration: 09m 24s)
- 20:33 cjming@deploy1002: sharvaniharan and cjming: Continuing with sync
- 20:31 cjming@deploy1002: sharvaniharan and cjming: Backport for Make the new stream name consistent with convention synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
- 20:30 cjming@deploy1002: Started scap: Backport for Make the new stream name consistent with convention
- 20:26 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T343198)', diff saved to https://phabricator.wikimedia.org/P52483 and previous config saved to /var/cache/conftool/dbconfig/20230912-202609-arnaudb.json
- 20:25 cjming@deploy1002: Finished scap: Backport for Reduce initial payload of Phonos styles (T345414) (duration: 12m 06s)
- 20:22 eileen: civicrm upgraded from 5b7b2b3e to 80aee570
- 20:19 cjming@deploy1002: cjming and samtar: Continuing with sync
- 20:15 cjming@deploy1002: cjming and samtar: Backport for Reduce initial payload of Phonos styles (T345414) synced to the testservers mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
- 20:13 cjming@deploy1002: Started scap: Backport for Reduce initial payload of Phonos styles (T345414)
- 19:43 eileen: civicrm upgraded from 771fcde3 to 5b7b2b3e
- 19:32 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 19:32 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove ssw1 old irb int dns - cmooney@cumin1001"
- 19:28 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove ssw1 old irb int dns - cmooney@cumin1001"
- 19:23 cmooney@cumin1001: START - Cookbook sre.dns.netbox
- 19:17 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
- 19:15 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
- 17:50 sukhe: run authdns-update to remove nsa.wikimedia.org
- 16:28 fnegri@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudnet2005-dev.codfw.wmnet with OS bookworm
- 15:47 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host people1004.eqiad.wmnet
- 15:43 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host people1004.eqiad.wmnet
- 15:43 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host people2003.codfw.wmnet
- 15:39 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host people2003.codfw.wmnet
- 15:30 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2023.codfw.wmnet
- 15:27 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2023.codfw.wmnet
- 15:27 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2023.codfw.wmnet
- 15:25 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2023.codfw.wmnet
- 15:22 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2023.codfw.wmnet
- 15:19 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2023.codfw.wmnet
- 15:19 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2023.codfw.wmnet
- 15:19 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2023.codfw.wmnet
- 15:18 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2023.codfw.wmnet
- 15:13 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubernetes1056.eqiad.wmnet']
- 15:12 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2023.codfw.wmnet
- 15:09 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubernetes1055.eqiad.wmnet']
- 15:01 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes1056.eqiad.wmnet']
- 15:00 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes1055.eqiad.wmnet']
- 14:59 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubernetes1051.eqiad.wmnet']
- 14:59 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubernetes1050.eqiad.wmnet']
- 14:59 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubernetes1054.eqiad.wmnet']
- 14:59 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubernetes1053.eqiad.wmnet']
- 14:58 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubernetes1052.eqiad.wmnet']
- 14:58 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubernetes1049.eqiad.wmnet']
- 14:57 godog: add 30G to prometheus@services and 300G to prometheus@ops (codfw)
- 14:57 dancy@deploy1002: Installation of scap version "4.61.0" completed for 595 hosts
- 14:56 dancy@deploy1002: Installing scap version "4.61.0" for 595 hosts
- 14:55 dancy@deploy1002: Installing scap version "4.61.0" for 596 hosts
- 14:50 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes1051.eqiad.wmnet']
- 14:50 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes1050.eqiad.wmnet']
- 14:50 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes1052.eqiad.wmnet']
- 14:50 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes1053.eqiad.wmnet']
- 14:50 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes1054.eqiad.wmnet']
- 14:49 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubernetes1044.eqiad.wmnet']
- 14:49 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes1049.eqiad.wmnet']
- 14:49 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubernetes1046.eqiad.wmnet']
- 14:49 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubernetes1047.eqiad.wmnet']
- 14:49 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubernetes1043.eqiad.wmnet']
- 14:49 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubernetes1048.eqiad.wmnet']
- 14:49 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubernetes1045.eqiad.wmnet']
- 14:46 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host scandium.eqiad.wmnet
- 14:42 moritzm: installing Linux 6.1.52 on Bookworm hosts
- 14:41 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes1048.eqiad.wmnet']
- 14:40 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes1047.eqiad.wmnet']
- 14:40 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes1046.eqiad.wmnet']
- 14:40 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes1045.eqiad.wmnet']
- 14:40 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes1044.eqiad.wmnet']
- 14:40 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes1043.eqiad.wmnet']
- 14:39 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-single for host scandium.eqiad.wmnet
- 14:39 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubernetes1042.eqiad.wmnet']
- 14:39 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubernetes1041.eqiad.wmnet']
- 14:38 jmm@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: furud.codfw.wmnet
- 14:38 jmm@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: furud.codfw.wmnet
- 14:36 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubernetes1040.eqiad.wmnet']
- 14:35 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubernetes1039.eqiad.wmnet']
- 14:34 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubernetes1038.eqiad.wmnet']
- 14:33 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubernetes1037.eqiad.wmnet']
- 14:31 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes1042.eqiad.wmnet']
- 14:31 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes1041.eqiad.wmnet']
- 14:30 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubernetes1036.eqiad.wmnet']
- 14:30 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubernetes1029.eqiad.wmnet']
- 14:30 moritzm: installing libssh2 security updates
- 14:27 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes1040.eqiad.wmnet']
- 14:27 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubernetes1035.eqiad.wmnet']
- 14:27 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes1039.eqiad.wmnet']
- 14:27 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubernetes1034.eqiad.wmnet']
- 14:25 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes1038.eqiad.wmnet']
- 14:24 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes1037.eqiad.wmnet']
- 14:24 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['kubernetes1036.eqiad.wmnet']
- 14:24 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubernetes1033.eqiad.wmnet']
- 14:24 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes1036.eqiad.wmnet']
- 14:23 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubernetes1032.eqiad.wmnet']
- 14:22 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes1036.eqiad.wmnet']
- 14:22 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes1029.eqiad.wmnet']
- 14:21 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubernetes1031.eqiad.wmnet']
- 14:21 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes1029.mgmt.eqiad.wmnet with reboot policy FORCED
- 14:19 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes1035.eqiad.wmnet']
- 14:18 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes1034.eqiad.wmnet']
- 14:18 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubernetes1028.eqiad.wmnet']
- 14:18 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubernetes1030.eqiad.wmnet']
- 14:16 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes1033.eqiad.wmnet']
- 14:15 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubernetes1027.eqiad.wmnet']
- 14:15 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes1032.eqiad.wmnet']
- 14:11 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes1031.eqiad.wmnet']
- 14:10 sukhe: enable puppet on dns-rec to progressively roll out nsa->ns2 updates
- 14:10 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes1030.eqiad.wmnet']
- 14:10 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['kubernetes1029.eqiad.wmnet']
- 14:10 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes1029.eqiad.wmnet']
- 14:09 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes1028.eqiad.wmnet']
- 14:07 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes1027.eqiad.wmnet']
- 14:02 sukhe: [correction] enable puppet on dns6001 to test nsa removal
- 14:02 sukhe: enable puppet on doh6001 to test nsa removal
- 14:00 jclark@cumin1001: START - Cookbook sre.hosts.provision for host kubernetes1029.mgmt.eqiad.wmnet with reboot policy FORCED
- 14:00 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kubernetes1029.mgmt.eqiad.wmnet with reboot policy FORCED
- 13:57 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 13:56 jclark@cumin1001: START - Cookbook sre.dns.netbox
- 13:50 sukhe: disable puppet on A:dns-rec
- 13:48 jclark@cumin1001: START - Cookbook sre.hosts.provision for host kubernetes1029.mgmt.eqiad.wmnet with reboot policy FORCED
- 13:46 TheresNoTime: UTC afternoon backport window closed
- 13:45 samtar@deploy1002: Finished scap: Backport for Reduce initial payload of Phonos styles (T345414) (duration: 08m 59s)
- 13:44 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T343198)', diff saved to https://phabricator.wikimedia.org/P52477 and previous config saved to /var/cache/conftool/dbconfig/20230912-134451-arnaudb.json
- 13:40 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
- 13:39 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
- 13:39 samtar@deploy1002: samtar: Continuing with sync
- 13:38 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
- 13:38 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
- 13:38 samtar@deploy1002: samtar: Backport for Reduce initial payload of Phonos styles (T345414) synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
- 13:36 samtar@deploy1002: Started scap: Backport for Reduce initial payload of Phonos styles (T345414)
- 13:36 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
- 13:35 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
- 13:31 taavi@deploy1002: Finished scap: Backport for Enable Parsoid support for Kartographer on enwiki (T342871) (duration: 26m 05s)
- 13:29 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P52476 and previous config saved to /var/cache/conftool/dbconfig/20230912-132944-arnaudb.json
- 13:19 taavi@deploy1002: ihurbain and taavi: Continuing with sync
- 13:14 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P52475 and previous config saved to /var/cache/conftool/dbconfig/20230912-131438-arnaudb.json
- 13:10 moritzm: installing grub2 updates from Bullseye point release
- 13:06 taavi@deploy1002: ihurbain and taavi: Backport for Enable Parsoid support for Kartographer on enwiki (T342871) synced to the testservers mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
- 13:05 taavi@deploy1002: Started scap: Backport for Enable Parsoid support for Kartographer on enwiki (T342871)
- 12:59 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T343198)', diff saved to https://phabricator.wikimedia.org/P52474 and previous config saved to /var/cache/conftool/dbconfig/20230912-125932-arnaudb.json
- 12:40 brouberol@cumin1001: END (FAIL) - Cookbook sre.hadoop.roll-restart-masters (exit_code=99) restart masters for Hadoop analytics cluster: Restart of jvm daemons.
- 12:24 brouberol@cumin1001: START - Cookbook sre.hadoop.roll-restart-masters restart masters for Hadoop analytics cluster: Restart of jvm daemons.
- 12:15 jgiannelos@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifeeds: apply
- 12:15 jgiannelos@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifeeds: apply
- 12:15 jgiannelos@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifeeds: apply
- 12:14 jgiannelos@deploy1002: helmfile [codfw] START helmfile.d/services/wikifeeds: apply
- 12:12 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudservices1004.wikimedia.org
- 12:11 aborrero@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 12:11 aborrero@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudservices1004.wikimedia.org decommissioned, removing all IPs except the asset tag one - aborrero@cumin1001"
- 12:09 aborrero@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudservices1004.wikimedia.org decommissioned, removing all IPs except the asset tag one - aborrero@cumin1001"
- 12:07 aborrero@cumin1001: START - Cookbook sre.dns.netbox
- 11:59 aborrero@cumin1001: START - Cookbook sre.hosts.decommission for hosts cloudservices1004.wikimedia.org
- 11:57 godog: pool thanos[12]001 for thanos.w.o - T341999
- 11:47 marostegui@cumin1001: dbctl commit (dc=all): 'db2158 (re)pooling @ 100%: Repooling after cloning another host', diff saved to https://phabricator.wikimedia.org/P52473 and previous config saved to /var/cache/conftool/dbconfig/20230912-114711-root.json
- 11:43 godog: pool titan hosts alongside thanos-fe for thanos-query / thanos-web services - T341999
- 11:42 filippo@cumin1001: conftool action : set/pooled=no; selector: name=titan1001.eqiad.wmnet,service=thanos-web
- 11:42 filippo@cumin1001: conftool action : set/pooled=no; selector: name=titan1002.eqiad.wmnet,service=thanos-web
- 11:41 brouberol@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 7 hosts with reason: Mute initial failures of hadoop-hdfs-datanode.service
- 11:41 brouberol@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on 7 hosts with reason: Mute initial failures of hadoop-hdfs-datanode.service
- 11:40 filippo@cumin1001: conftool action : set/pooled=no; selector: name=titan1002.eqiad.wmnet,service=thanos-web
- 11:40 filippo@cumin1001: conftool action : set/pooled=yes; selector: name=titan1002.eqiad.wmnet,service=thanos-web
- 11:39 filippo@cumin1001: conftool action : set/pooled=no; selector: name=titan1001.eqiad.wmnet,service=thanos-web
- 11:37 filippo@cumin1001: conftool action : set/weight=100; selector: name=titan2002.codfw.wmnet
- 11:37 filippo@cumin1001: conftool action : set/weight=100; selector: name=titan2001.codfw.wmnet
- 11:37 filippo@cumin1001: conftool action : set/weight=100; selector: name=titan1002.eqiad.wmnet
- 11:37 filippo@cumin1001: conftool action : set/weight=100; selector: name=titan1001.eqiad.wmnet
- 11:36 filippo@cumin1001: conftool action : set/weight=100; selector: name=titan*
- 11:35 filippo@cumin1001: conftool action : set/weight=10; selector: name=titan2002.codfw.wmnet
- 11:35 filippo@cumin1001: conftool action : set/weight=10; selector: name=titan2001.codfw.wmnet
- 11:35 filippo@cumin1001: conftool action : set/weight=10; selector: name=titan1002.eqiad.wmnet
- 11:35 filippo@cumin1001: conftool action : set/weight=10; selector: name=titan1001.eqiad.wmnet
- 11:32 marostegui@cumin1001: dbctl commit (dc=all): 'db2158 (re)pooling @ 75%: Repooling after cloning another host', diff saved to https://phabricator.wikimedia.org/P52472 and previous config saved to /var/cache/conftool/dbconfig/20230912-113207-root.json
- 11:18 aborrero@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host cloudservices1004.wikimedia.org
- 11:17 marostegui@cumin1001: dbctl commit (dc=all): 'db2158 (re)pooling @ 50%: Repooling after cloning another host', diff saved to https://phabricator.wikimedia.org/P52471 and previous config saved to /var/cache/conftool/dbconfig/20230912-111702-root.json
- 11:03 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
- 11:03 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
- 11:03 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
- 11:02 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-web: apply
- 11:01 marostegui@cumin1001: dbctl commit (dc=all): 'db2158 (re)pooling @ 25%: Repooling after cloning another host', diff saved to https://phabricator.wikimedia.org/P52470 and previous config saved to /var/cache/conftool/dbconfig/20230912-110157-root.json
- 10:54 aborrero@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudservices1004.wikimedia.org
- 10:46 marostegui@cumin1001: dbctl commit (dc=all): 'db2158 (re)pooling @ 10%: Repooling after cloning another host', diff saved to https://phabricator.wikimedia.org/P52468 and previous config saved to /var/cache/conftool/dbconfig/20230912-104652-root.json
- 10:45 moritzm: rebalance Ganeti cluster in eqiad/C following node reboots
- 10:39 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1028.eqiad.wmnet
- 10:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1028.eqiad.wmnet
- 10:37 taavi@cumin1001: conftool action : set/pooled=yes:weight=10; selector: cluster=cloudweb
- 10:32 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1028.eqiad.wmnet
- 10:31 marostegui@cumin1001: dbctl commit (dc=all): 'db2158 (re)pooling @ 5%: Repooling after cloning another host', diff saved to https://phabricator.wikimedia.org/P52467 and previous config saved to /var/cache/conftool/dbconfig/20230912-103148-root.json
- 10:25 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1028.eqiad.wmnet
- 10:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host pki-root1002.eqiad.wmnet
- 10:21 jgiannelos@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifeeds: apply
- 10:21 jgiannelos@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifeeds: apply
- 10:16 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host pki-root1002.eqiad.wmnet
- 10:16 marostegui@cumin1001: dbctl commit (dc=all): 'db2158 (re)pooling @ 3%: Repooling after cloning another host', diff saved to https://phabricator.wikimedia.org/P52466 and previous config saved to /var/cache/conftool/dbconfig/20230912-101643-root.json
- 10:13 jmm@cumin2002: END (PASS) - Cookbook sre.pki.restart-reboot (exit_code=0) rolling reboot on A:pki
- 10:13 moritzm: disabled nginx/puppetdb/postgresql/microservice on puppetdb1002/2002 to ensure nothing hits the old endpoints anymore
- 10:09 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on puppetdb2002.codfw.wmnet with reason: Disable puppetdb/postgres/nginx on old nodes to ensure nothing hits them anyway
- 10:09 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on puppetdb2002.codfw.wmnet with reason: Disable puppetdb/postgres/nginx on old nodes to ensure nothing hits them anyway
- 10:09 jgiannelos@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifeeds: apply
- 10:08 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/wikifeeds: apply
- 10:05 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on puppetdb1002.eqiad.wmnet with reason: Disable puppetdb/postgres on old nodes to ensure nothing hits them anyway
- 10:05 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on puppetdb1002.eqiad.wmnet with reason: Disable puppetdb/postgres on old nodes to ensure nothing hits them anyway
- 10:02 hnowlan: enabling puppet on A:cp
- 10:01 marostegui@cumin1001: dbctl commit (dc=all): 'db2158 (re)pooling @ 1%: Repooling after cloning another host', diff saved to https://phabricator.wikimedia.org/P52465 and previous config saved to /var/cache/conftool/dbconfig/20230912-100138-root.json
- 09:59 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) pki.discovery.wmnet. on all recursors
- 09:59 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache pki.discovery.wmnet. on all recursors
- 09:53 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) pki.discovery.wmnet. on all recursors
- 09:52 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache pki.discovery.wmnet. on all recursors
- 09:52 jmm@cumin2002: START - Cookbook sre.pki.restart-reboot rolling reboot on A:pki
- 09:32 hnowlan: disabled puppet on A:cp
- 09:26 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3317 (T343198)', diff saved to https://phabricator.wikimedia.org/P52464 and previous config saved to /var/cache/conftool/dbconfig/20230912-092639-arnaudb.json
- 09:26 jmm@cumin2002: END (FAIL) - Cookbook sre.pki.restart-reboot (exit_code=99) rolling reboot on A:pki
- 09:26 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1170.eqiad.wmnet with reason: Maintenance
- 09:26 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1170.eqiad.wmnet with reason: Maintenance
- 09:26 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T343198)', diff saved to https://phabricator.wikimedia.org/P52463 and previous config saved to /var/cache/conftool/dbconfig/20230912-092618-arnaudb.json
- 09:26 jmm@cumin2002: START - Cookbook sre.pki.restart-reboot rolling reboot on A:pki
- 09:15 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2115.codfw.wmnet with reason: Maintenance
- 09:15 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2115.codfw.wmnet with reason: Maintenance
- 09:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1179.eqiad.wmnet with reason: Maintenance
- 09:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1179.eqiad.wmnet with reason: Maintenance
- 09:11 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P52461 and previous config saved to /var/cache/conftool/dbconfig/20230912-091112-arnaudb.json
- 08:58 claime: Running puppet on cp-text P:trafficserver::backend - T341780
- 08:58 claime: Sending 5% of global traffic to mw-on-k8s - T341780
- 08:56 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P52460 and previous config saved to /var/cache/conftool/dbconfig/20230912-085606-arnaudb.json
- 08:51 claime: mw-api-ext, mw-web: Raise total replicas to 14 - T341780
- 08:51 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
- 08:51 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
- 08:51 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
- 08:50 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-web: apply
- 08:50 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
- 08:50 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
- 08:50 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
- 08:50 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
- 08:49 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
- 08:49 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
- 08:44 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1028.eqiad.wmnet
- 08:41 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T343198)', diff saved to https://phabricator.wikimedia.org/P52459 and previous config saved to /var/cache/conftool/dbconfig/20230912-084059-arnaudb.json
- 08:39 jnuche@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.41.0-wmf.26 refs T343728
- 08:38 moritzm: rebalance Ganeti cluster in codfw/C following node replacement
- 08:24 oblivian@deploy1002: Finished scap: Backport for Replace calls to wfHostname with clusterconfig ones (duration: 09m 16s)
- 08:22 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1028.eqiad.wmnet
- 08:18 oblivian@deploy1002: oblivian: Continuing with sync
- 08:17 oblivian@deploy1002: oblivian: Backport for Replace calls to wfHostname with clusterconfig ones synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
- 08:15 oblivian@deploy1002: Started scap: Backport for Replace calls to wfHostname with clusterconfig ones
- 08:13 oblivian@deploy1002: Finished scap: Backport for ClusterConfig: also allow to return hostname, Enable PageNotice on enwiktionary beta (T61245) (duration: 45m 23s)
- 08:12 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti2014.codfw.wmnet to cluster codfw and group C
- 08:10 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2014.codfw.wmnet to cluster codfw and group C
- 08:06 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2014.codfw.wmnet
- 07:58 brouberol@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker1156.eqiad.wmnet
- 07:58 oblivian@deploy1002: tto and oblivian: Continuing with sync
- 07:58 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2014.codfw.wmnet
- 07:56 brouberol@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker1156.eqiad.wmnet
- 07:56 oblivian@deploy1002: tto and oblivian: Backport for ClusterConfig: also allow to return hostname, Enable PageNotice on enwiktionary beta (T61245) synced to the testservers mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
- 07:53 brouberol@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker1155.eqiad.wmnet
- 07:51 brouberol@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker1155.eqiad.wmnet
- 07:51 brouberol@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker1154.eqiad.wmnet
- 07:49 brouberol@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker1154.eqiad.wmnet
- 07:45 brouberol@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker1153.eqiad.wmnet
- 07:43 brouberol@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker1153.eqiad.wmnet
- 07:36 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host netmon1003.wikimedia.org
- 07:28 oblivian@deploy1002: Started scap: Backport for ClusterConfig: also allow to return hostname, Enable PageNotice on enwiktionary beta (T61245)
- 07:25 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netmon1003.wikimedia.org
- 07:23 oblivian@deploy1002: Finished scap: Backport for update noc README, Use ClusterConfig (duration: 13m 46s)
- 07:17 oblivian@deploy1002: oblivian: Continuing with sync
- 07:11 oblivian@deploy1002: oblivian: Backport for update noc README, Use ClusterConfig synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
- 07:09 oblivian@deploy1002: Started scap: Backport for update noc README, Use ClusterConfig
- 07:09 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host krb1001.eqiad.wmnet
- 06:23 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2177 (T343198)', diff saved to https://phabricator.wikimedia.org/P52456 and previous config saved to /var/cache/conftool/dbconfig/20230912-062353-arnaudb.json
- 06:23 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2177.codfw.wmnet with reason: Maintenance
- 06:23 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2177.codfw.wmnet with reason: Maintenance
- 06:23 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T343198)', diff saved to https://phabricator.wikimedia.org/P52455 and previous config saved to /var/cache/conftool/dbconfig/20230912-062332-arnaudb.json
- 06:08 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P52454 and previous config saved to /var/cache/conftool/dbconfig/20230912-060825-arnaudb.json
- 05:53 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P52453 and previous config saved to /var/cache/conftool/dbconfig/20230912-055319-arnaudb.json
- 05:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2014.codfw.wmnet with OS bullseye
- 05:38 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T343198)', diff saved to https://phabricator.wikimedia.org/P52452 and previous config saved to /var/cache/conftool/dbconfig/20230912-053813-arnaudb.json
- 05:32 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti2014.codfw.wmnet with reason: host reimage
- 05:27 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti2014.codfw.wmnet with reason: host reimage
- 05:17 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1119 with Debian Bookworm in s1 with just 10% T339185', diff saved to https://phabricator.wikimedia.org/P52450 and previous config saved to /var/cache/conftool/dbconfig/20230912-051753-marostegui.json
- 05:17 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2158', diff saved to https://phabricator.wikimedia.org/P52449 and previous config saved to /var/cache/conftool/dbconfig/20230912-051725-root.json
- 05:11 moritzm: installing aom security updates
- 05:08 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2014.codfw.wmnet with OS bullseye
- 05:00 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti2014.codfw.wmnet to cluster codfw and group C
- 05:00 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1158 (T343198)', diff saved to https://phabricator.wikimedia.org/P52448 and previous config saved to /var/cache/conftool/dbconfig/20230912-050033-arnaudb.json
- 05:00 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
- 05:00 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
- 05:00 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1158.eqiad.wmnet with reason: Maintenance
- 04:59 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1158.eqiad.wmnet with reason: Maintenance
- 04:59 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136 (T343198)', diff saved to https://phabricator.wikimedia.org/P52447 and previous config saved to /var/cache/conftool/dbconfig/20230912-045944-arnaudb.json
- 04:56 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2014.codfw.wmnet to cluster codfw and group C
- 04:44 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136', diff saved to https://phabricator.wikimedia.org/P52446 and previous config saved to /var/cache/conftool/dbconfig/20230912-044437-arnaudb.json
- 04:29 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136', diff saved to https://phabricator.wikimedia.org/P52445 and previous config saved to /var/cache/conftool/dbconfig/20230912-042931-arnaudb.json
- 04:14 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136 (T343198)', diff saved to https://phabricator.wikimedia.org/P52444 and previous config saved to /var/cache/conftool/dbconfig/20230912-041425-arnaudb.json
- 03:58 mwpresync@deploy1002: Pruned MediaWiki: 1.41.0-wmf.23, 1.41.0-wmf.24 (duration: 02m 30s)
- 03:56 mwpresync@deploy1002: Finished scap: testwikis wikis to 1.41.0-wmf.26 refs T343728 (duration: 53m 18s)
- 03:02 mwpresync@deploy1002: Started scap: testwikis wikis to 1.41.0-wmf.26 refs T343728
- 02:49 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host titan1002.eqiad.wmnet with OS bookworm
- 02:49 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 02:48 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 02:32 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on titan1002.eqiad.wmnet with reason: host reimage
- 02:28 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on titan1002.eqiad.wmnet with reason: host reimage
- 02:06 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host titan1002.eqiad.wmnet with OS bookworm
- 01:55 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host titan1001.eqiad.wmnet with OS bookworm
- 01:55 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 01:54 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 01:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on titan1001.eqiad.wmnet with reason: host reimage
- 01:34 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on titan1001.eqiad.wmnet with reason: host reimage
- 01:06 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host titan1001.eqiad.wmnet with OS bookworm
- 00:17 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1136 (T343198)', diff saved to https://phabricator.wikimedia.org/P52443 and previous config saved to /var/cache/conftool/dbconfig/20230912-001715-arnaudb.json
- 00:17 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1136.eqiad.wmnet with reason: Maintenance
- 00:16 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1136.eqiad.wmnet with reason: Maintenance
- 00:16 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 (T343198)', diff saved to https://phabricator.wikimedia.org/P52442 and previous config saved to /var/cache/conftool/dbconfig/20230912-001654-arnaudb.json
- 00:01 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P52441 and previous config saved to /var/cache/conftool/dbconfig/20230912-000148-arnaudb.json
2023-09-11
- 23:46 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P52440 and previous config saved to /var/cache/conftool/dbconfig/20230911-234641-arnaudb.json
- 23:31 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 (T343198)', diff saved to https://phabricator.wikimedia.org/P52439 and previous config saved to /var/cache/conftool/dbconfig/20230911-233135-arnaudb.json
- 23:11 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2156 (T343198)', diff saved to https://phabricator.wikimedia.org/P52438 and previous config saved to /var/cache/conftool/dbconfig/20230911-231131-arnaudb.json
- 23:11 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
- 23:11 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
- 23:11 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2156.codfw.wmnet with reason: Maintenance
- 23:11 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2156.codfw.wmnet with reason: Maintenance
- 23:10 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T343198)', diff saved to https://phabricator.wikimedia.org/P52437 and previous config saved to /var/cache/conftool/dbconfig/20230911-231054-arnaudb.json
- 22:55 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P52436 and previous config saved to /var/cache/conftool/dbconfig/20230911-225548-arnaudb.json
- 22:53 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host titan1002.eqiad.wmnet with OS bookworm
- 22:52 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host titan1001.eqiad.wmnet with OS bookworm
- 22:40 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P52435 and previous config saved to /var/cache/conftool/dbconfig/20230911-224042-arnaudb.json
- 22:25 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T343198)', diff saved to https://phabricator.wikimedia.org/P52434 and previous config saved to /var/cache/conftool/dbconfig/20230911-222536-arnaudb.json
- 21:33 cwhite: update grafana to 9.4.14 on grafana1002 T345362
- 21:32 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host titan1002.eqiad.wmnet with OS bookworm
- 21:32 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host titan1001.eqiad.wmnet with OS bookworm
- 21:19 sbassett: Deployed security fix for T345693
- 20:48 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host titan1002.mgmt.eqiad.wmnet with reboot policy FORCED
- 20:41 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['titan1001.eqiad.wmnet']
- 20:41 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['titan1001.eqiad.wmnet']
- 20:41 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['titan1001.eqiad.wmnet']
- 20:41 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['titan1001.eqiad.wmnet']
- 20:41 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['titan1001.eqiad.wmnet']
- 20:39 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['titan1001.eqiad.wmnet']
- 20:39 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host titan1001.mgmt.eqiad.wmnet with reboot policy FORCED
- 20:18 jclark@cumin1001: START - Cookbook sre.hosts.provision for host titan1002.mgmt.eqiad.wmnet with reboot policy FORCED
- 20:18 jclark@cumin1001: START - Cookbook sre.hosts.provision for host titan1001.mgmt.eqiad.wmnet with reboot policy FORCED
- 20:17 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host titan1001
- 20:17 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host titan1001
- 20:17 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host titan1002
- 20:14 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kubernetes1029.mgmt.eqiad.wmnet with reboot policy FORCED
- 20:13 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host titan1002
- 20:13 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host titan1001
- 20:13 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host titan1001
- 20:13 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 20:13 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt titan1001 - jclark@cumin1001"
- 20:12 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt titan1001 - jclark@cumin1001"
- 20:10 jclark@cumin1001: START - Cookbook sre.dns.netbox
- 20:10 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 20:09 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt titan1001 - jclark@cumin1001"
- 20:09 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt titan1001 - jclark@cumin1001"
- 20:05 jclark@cumin1001: START - Cookbook sre.dns.netbox
- 19:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2131 (T337310)', diff saved to https://phabricator.wikimedia.org/P52432 and previous config saved to /var/cache/conftool/dbconfig/20230911-194332-ladsgroup.json
- 19:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2131', diff saved to https://phabricator.wikimedia.org/P52431 and previous config saved to /var/cache/conftool/dbconfig/20230911-192826-ladsgroup.json
- 19:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2131', diff saved to https://phabricator.wikimedia.org/P52430 and previous config saved to /var/cache/conftool/dbconfig/20230911-191320-ladsgroup.json
- 18:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2131 (T337310)', diff saved to https://phabricator.wikimedia.org/P52429 and previous config saved to /var/cache/conftool/dbconfig/20230911-185813-ladsgroup.json
- 18:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2131 (T337310)', diff saved to https://phabricator.wikimedia.org/P52428 and previous config saved to /var/cache/conftool/dbconfig/20230911-184231-ladsgroup.json
- 18:42 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2131.codfw.wmnet with reason: Maintenance
- 18:42 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2131.codfw.wmnet with reason: Maintenance
- 18:33 denisse@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host netmon2002.wikimedia.org with OS bullseye
- 18:13 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2101.codfw.wmnet with reason: Maintenance
- 18:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2101.codfw.wmnet with reason: Maintenance
- 18:11 denisse@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on netmon2002.wikimedia.org with reason: host reimage
- 18:08 denisse@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on netmon2002.wikimedia.org with reason: host reimage
- 18:00 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase1030.eqiad.wmnet with OS bullseye
- 17:59 eevans@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - eevans@cumin1001"
- 17:58 eevans@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - eevans@cumin1001"
- 17:53 denisse@cumin1001: START - Cookbook sre.hosts.reimage for host netmon2002.wikimedia.org with OS bullseye
- 17:43 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2097.codfw.wmnet with reason: Maintenance
- 17:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2097.codfw.wmnet with reason: Maintenance
- 17:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2096 (T337310)', diff saved to https://phabricator.wikimedia.org/P52427 and previous config saved to /var/cache/conftool/dbconfig/20230911-174321-ladsgroup.json
- 17:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2096', diff saved to https://phabricator.wikimedia.org/P52426 and previous config saved to /var/cache/conftool/dbconfig/20230911-172815-ladsgroup.json
- 17:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2096', diff saved to https://phabricator.wikimedia.org/P52425 and previous config saved to /var/cache/conftool/dbconfig/20230911-171309-ladsgroup.json
- 17:09 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on restbase1030.eqiad.wmnet with reason: host reimage
- 17:06 eevans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on restbase1030.eqiad.wmnet with reason: host reimage
- 16:59 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kubernetes1027.mgmt.eqiad.wmnet with reboot policy FORCED
- 16:59 jclark@cumin1001: START - Cookbook sre.hosts.provision for host kubernetes1029.mgmt.eqiad.wmnet with reboot policy FORCED
- 16:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2096 (T337310)', diff saved to https://phabricator.wikimedia.org/P52424 and previous config saved to /var/cache/conftool/dbconfig/20230911-165802-ladsgroup.json
- 16:57 jclark@cumin1001: START - Cookbook sre.hosts.provision for host kubernetes1027.mgmt.eqiad.wmnet with reboot policy FORCED
- 16:48 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes1055.mgmt.eqiad.wmnet with reboot policy FORCED
- 16:48 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes1056.mgmt.eqiad.wmnet with reboot policy FORCED
- 16:48 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes1054.mgmt.eqiad.wmnet with reboot policy FORCED
- 16:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2096 (T337310)', diff saved to https://phabricator.wikimedia.org/P52423 and previous config saved to /var/cache/conftool/dbconfig/20230911-164249-ladsgroup.json
- 16:46 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2096.codfw.wmnet with reason: Maintenance
- 16:42 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2096.codfw.wmnet with reason: Maintenance
- 16:41 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host restbase1030.eqiad.wmnet with OS bullseye
- 16:32 fnegri@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudnet2005-dev.codfw.wmnet with reason: host reimage
- 16:31 denisse@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host netmon2002.wikimedia.org with OS bookworm
- 16:28 fnegri@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudnet2005-dev.codfw.wmnet with reason: host reimage
- 16:18 jclark@cumin1001: START - Cookbook sre.hosts.provision for host kubernetes1056.mgmt.eqiad.wmnet with reboot policy FORCED
- 16:17 jclark@cumin1001: START - Cookbook sre.hosts.provision for host kubernetes1054.mgmt.eqiad.wmnet with reboot policy FORCED
- 16:17 jclark@cumin1001: START - Cookbook sre.hosts.provision for host kubernetes1055.mgmt.eqiad.wmnet with reboot policy FORCED
- 16:16 eevans@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host restbase1030.eqiad.wmnet with OS bullseye
- 16:12 brouberol@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker1152.eqiad.wmnet
- 16:10 brouberol@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker1152.eqiad.wmnet
- 16:10 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host restbase1030.eqiad.wmnet with OS bullseye
- 16:08 fnegri@cumin1001: START - Cookbook sre.hosts.reimage for host cloudnet2005-dev.codfw.wmnet with OS bookworm
- 16:07 brouberol@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker1151.eqiad.wmnet
- 16:06 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes1047.mgmt.eqiad.wmnet with reboot policy FORCED
- 16:06 denisse@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on netmon2002.wikimedia.org with reason: host reimage
- 16:05 brouberol@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker1151.eqiad.wmnet
- 16:04 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes1050.mgmt.eqiad.wmnet with reboot policy FORCED
- 16:04 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes1052.mgmt.eqiad.wmnet with reboot policy FORCED
- 16:04 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes1051.mgmt.eqiad.wmnet with reboot policy FORCED
- 16:04 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes1049.mgmt.eqiad.wmnet with reboot policy FORCED
- 16:04 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes1046.mgmt.eqiad.wmnet with reboot policy FORCED
- 16:04 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes1053.mgmt.eqiad.wmnet with reboot policy FORCED
- 16:04 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes1048.mgmt.eqiad.wmnet with reboot policy FORCED
- 16:04 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes1045.mgmt.eqiad.wmnet with reboot policy FORCED
- 16:03 denisse@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on netmon2002.wikimedia.org with reason: host reimage
- 16:01 brouberol@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker1150.eqiad.wmnet
- 16:00 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
- 16:00 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
- 15:59 brouberol@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker1150.eqiad.wmnet
- 15:48 jclark@cumin1001: START - Cookbook sre.hosts.provision for host kubernetes1047.mgmt.eqiad.wmnet with reboot policy FORCED
- 15:48 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 15:48 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt kubernetes1047 - jclark@cumin1001"
- 15:47 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt kubernetes1047 - jclark@cumin1001"
- 15:45 jclark@cumin1001: START - Cookbook sre.dns.netbox
- 15:44 brouberol@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker1149.eqiad.wmnet
- 15:43 denisse@cumin1001: START - Cookbook sre.hosts.reimage for host netmon2002.wikimedia.org with OS bookworm
- 15:43 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2149 (T343198)', diff saved to https://phabricator.wikimedia.org/P52421 and previous config saved to /var/cache/conftool/dbconfig/20230911-154327-arnaudb.json
- 15:43 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2149.codfw.wmnet with reason: Maintenance
- 15:43 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2149.codfw.wmnet with reason: Maintenance
- 15:41 brouberol@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker1149.eqiad.wmnet
- 15:41 jclark@cumin1001: START - Cookbook sre.hosts.provision for host kubernetes1048.mgmt.eqiad.wmnet with reboot policy FORCED
- 15:41 jclark@cumin1001: START - Cookbook sre.hosts.provision for host kubernetes1046.mgmt.eqiad.wmnet with reboot policy FORCED
- 15:41 jclark@cumin1001: START - Cookbook sre.hosts.provision for host kubernetes1045.mgmt.eqiad.wmnet with reboot policy FORCED
- 15:41 jclark@cumin1001: START - Cookbook sre.hosts.provision for host kubernetes1050.mgmt.eqiad.wmnet with reboot policy FORCED
- 15:41 jclark@cumin1001: START - Cookbook sre.hosts.provision for host kubernetes1049.mgmt.eqiad.wmnet with reboot policy FORCED
- 15:41 jclark@cumin1001: START - Cookbook sre.hosts.provision for host kubernetes1053.mgmt.eqiad.wmnet with reboot policy FORCED
- 15:41 jclark@cumin1001: START - Cookbook sre.hosts.provision for host kubernetes1051.mgmt.eqiad.wmnet with reboot policy FORCED
- 15:40 jclark@cumin1001: START - Cookbook sre.hosts.provision for host kubernetes1052.mgmt.eqiad.wmnet with reboot policy FORCED
- 15:36 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes1040.mgmt.eqiad.wmnet with reboot policy FORCED
- 15:28 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes1043.mgmt.eqiad.wmnet with reboot policy FORCED
- 15:28 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes1037.mgmt.eqiad.wmnet with reboot policy FORCED
- 15:27 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes1044.mgmt.eqiad.wmnet with reboot policy FORCED
- 15:26 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes1039.mgmt.eqiad.wmnet with reboot policy FORCED
- 15:25 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes1041.mgmt.eqiad.wmnet with reboot policy FORCED
- 15:25 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes1042.mgmt.eqiad.wmnet with reboot policy FORCED
- 15:25 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes1038.mgmt.eqiad.wmnet with reboot policy FORCED
- 15:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1225.eqiad.wmnet with reason: Maintenance
- 15:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1225.eqiad.wmnet with reason: Maintenance
- 15:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1220 (T337310)', diff saved to https://phabricator.wikimedia.org/P52420 and previous config saved to /var/cache/conftool/dbconfig/20230911-152456-ladsgroup.json
- 15:23 fnegri@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudnet2005-dev.codfw.wmnet with OS bookworm
- 15:21 jnuche@deploy1002: Installation of scap version "4.60.0" completed for 595 hosts
- 15:20 jnuche@deploy1002: Installing scap version "4.60.0" for 595 hosts
- 15:19 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kubernetes1029.mgmt.eqiad.wmnet with reboot policy FORCED
- 15:18 jnuche@deploy1002: Installing scap version "4.60.0" for 595 hosts
- 15:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1220', diff saved to https://phabricator.wikimedia.org/P52419 and previous config saved to /var/cache/conftool/dbconfig/20230911-150950-ladsgroup.json
- 15:08 jclark@cumin1001: START - Cookbook sre.hosts.provision for host kubernetes1044.mgmt.eqiad.wmnet with reboot policy FORCED
- 15:08 jclark@cumin1001: START - Cookbook sre.hosts.provision for host kubernetes1043.mgmt.eqiad.wmnet with reboot policy FORCED
- 15:08 jclark@cumin1001: START - Cookbook sre.hosts.provision for host kubernetes1042.mgmt.eqiad.wmnet with reboot policy FORCED
- 15:08 jclark@cumin1001: START - Cookbook sre.hosts.provision for host kubernetes1041.mgmt.eqiad.wmnet with reboot policy FORCED
- 15:07 jclark@cumin1001: START - Cookbook sre.hosts.provision for host kubernetes1040.mgmt.eqiad.wmnet with reboot policy FORCED
- 15:07 jclark@cumin1001: START - Cookbook sre.hosts.provision for host kubernetes1039.mgmt.eqiad.wmnet with reboot policy FORCED
- 15:07 jclark@cumin1001: START - Cookbook sre.hosts.provision for host kubernetes1038.mgmt.eqiad.wmnet with reboot policy FORCED
- 15:07 jclark@cumin1001: START - Cookbook sre.hosts.provision for host kubernetes1029.mgmt.eqiad.wmnet with reboot policy FORCED
- 15:07 jclark@cumin1001: START - Cookbook sre.hosts.provision for host kubernetes1037.mgmt.eqiad.wmnet with reboot policy FORCED
- 15:05 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kubernetes1029.mgmt.eqiad.wmnet with reboot policy FORCED
- 15:04 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
- 15:03 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
- 15:02 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes1031.mgmt.eqiad.wmnet with reboot policy FORCED
- 14:56 brouberol@cumin1001: END (FAIL) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=99) for hosts an-worker1149.eqiad.wmnet
- 14:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1220', diff saved to https://phabricator.wikimedia.org/P52418 and previous config saved to /var/cache/conftool/dbconfig/20230911-145443-ladsgroup.json
- 14:54 brouberol@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker1149.eqiad.wmnet
- 14:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1220 (T337310)', diff saved to https://phabricator.wikimedia.org/P52417 and previous config saved to /var/cache/conftool/dbconfig/20230911-143937-ladsgroup.json
- 14:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1220 (T337310)', diff saved to https://phabricator.wikimedia.org/P52416 and previous config saved to /var/cache/conftool/dbconfig/20230911-143102-ladsgroup.json
- 14:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1220.eqiad.wmnet with reason: Maintenance
- 14:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1220.eqiad.wmnet with reason: Maintenance
- 14:19 fnegri@cumin1001: START - Cookbook sre.hosts.reimage for host cloudnet2005-dev.codfw.wmnet with OS bookworm
- 13:59 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-wf2002.codfw.wmnet
- 13:55 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc-wf2002.codfw.wmnet
- 13:55 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-wf1002.eqiad.wmnet
- 13:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1216.eqiad.wmnet with reason: Maintenance
- 13:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1216.eqiad.wmnet with reason: Maintenance
- 13:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1137 (T337310)', diff saved to https://phabricator.wikimedia.org/P52414 and previous config saved to /var/cache/conftool/dbconfig/20230911-135520-ladsgroup.json
- 13:49 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc-wf1002.eqiad.wmnet
- 13:49 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-wf1001.eqiad.wmnet
- 13:43 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc-wf1001.eqiad.wmnet
- 13:43 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2045.codfw.wmnet
- 13:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1137', diff saved to https://phabricator.wikimedia.org/P52413 and previous config saved to /var/cache/conftool/dbconfig/20230911-134013-ladsgroup.json
- 13:40 kartik@deploy1002: Finished scap: Backport for Enable MinT translation service in more wikis - rollout #3 (T341445) (duration: 11m 18s)
- 13:36 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2045.codfw.wmnet
- 13:33 kartik@deploy1002: kartik and abi: Continuing with sync
- 13:32 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-wf2001.codfw.wmnet
- 13:30 kartik@deploy1002: kartik and abi: Backport for Enable MinT translation service in more wikis - rollout #3 (T341445) synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
- 13:28 kartik@deploy1002: Started scap: Backport for Enable MinT translation service in more wikis - rollout #3 (T341445)
- 13:26 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for Revert "Reapply "Remove $wgNamespacesWithSubpages overrides on the MediaWiki namespace"" (duration: 08m 04s)
- 13:26 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc-wf2001.codfw.wmnet
- 13:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1137', diff saved to https://phabricator.wikimedia.org/P52412 and previous config saved to /var/cache/conftool/dbconfig/20230911-132507-ladsgroup.json
- 13:22 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1119', diff saved to https://phabricator.wikimedia.org/P52411 and previous config saved to /var/cache/conftool/dbconfig/20230911-132210-root.json
- 13:20 ayounsi@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host atlas3001.wikimedia.org
- 13:20 ayounsi@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM atlas3001.wikimedia.org - ayounsi@cumin1001"
- 13:20 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde: Continuing with sync
- 13:19 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde: Backport for Revert "Reapply "Remove $wgNamespacesWithSubpages overrides on the MediaWiki namespace"" synced to the testservers mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
- 13:19 ayounsi@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM atlas3001.wikimedia.org - ayounsi@cumin1001"
- 13:19 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) atlas3001.wikimedia.org on all recursors
- 13:19 ayounsi@cumin1001: START - Cookbook sre.dns.wipe-cache atlas3001.wikimedia.org on all recursors
- 13:19 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 13:19 ayounsi@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM atlas3001.wikimedia.org - ayounsi@cumin1001"
- 13:18 ayounsi@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM atlas3001.wikimedia.org - ayounsi@cumin1001"
- 13:18 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for Revert "Reapply "Remove $wgNamespacesWithSubpages overrides on the MediaWiki namespace""
- 13:16 lucaswerkmeister-wmde@deploy1002: Sync cancelled.
- 13:16 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
- 13:16 ayounsi@cumin1001: START - Cookbook sre.ganeti.makevm for new host atlas3001.wikimedia.org
- 13:11 lucaswerkmeister-wmde@deploy1002: func and lucaswerkmeister-wmde: Backport for Reapply "Remove $wgNamespacesWithSubpages overrides on the MediaWiki namespace" (T340697) synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
- 13:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1137 (T337310)', diff saved to https://phabricator.wikimedia.org/P52409 and previous config saved to /var/cache/conftool/dbconfig/20230911-131001-ladsgroup.json
- 13:09 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for Reapply "Remove $wgNamespacesWithSubpages overrides on the MediaWiki namespace" (T340697)
- 13:06 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1037.eqiad.wmnet
- 13:05 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2055.codfw.wmnet
- 12:59 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1037.eqiad.wmnet
- 12:59 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2055.codfw.wmnet
- 12:38 aborrero@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 12:37 aborrero@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudservices1006 - aborrero@cumin1001"
- 12:37 aborrero@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudservices1006 - aborrero@cumin1001"
- 12:30 aborrero@cumin1001: START - Cookbook sre.dns.netbox
- 12:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1137 (T337310)', diff saved to https://phabricator.wikimedia.org/P52408 and previous config saved to /var/cache/conftool/dbconfig/20230911-122535-ladsgroup.json
- 12:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1137.eqiad.wmnet with reason: Maintenance
- 12:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1137.eqiad.wmnet with reason: Maintenance
- 12:22 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netmon2002.wikimedia.org
- 12:21 moritzm: restarting apache/FPM on mediawiki canaries
- 12:18 moritzm: installing libssh2 security updates
- 12:18 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2054.codfw.wmnet
- 12:17 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1054.eqiad.wmnet
- 12:15 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netmon2002.wikimedia.org
- 12:11 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2054.codfw.wmnet
- 12:11 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1054.eqiad.wmnet
- 11:51 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti2014.codfw.wmnet to cluster codfw and group C
- 11:51 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2014.codfw.wmnet to cluster codfw and group C
- 11:45 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host poolcounter2003.codfw.wmnet
- 11:42 Amir1: setting binlog format to STATEMENT in x1 eqiad and codfw masters (T337310)
- 11:42 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2014.codfw.wmnet
- 11:42 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-single for host poolcounter2003.codfw.wmnet
- 11:41 claime: Rebooting poolcounter2003.codfw.wmnet
- 11:35 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2014.codfw.wmnet
- 11:32 isaranto@deploy1002: Finished scap: Backport for ores-extension: enable lw in enwiki and wikidata (T342115) (duration: 23m 46s)
- 11:30 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host poolcounter2004.codfw.wmnet
- 11:26 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-single for host poolcounter2004.codfw.wmnet
- 11:26 isaranto@deploy1002: isaranto: Continuing with sync
- 11:26 claime: Rebooting poolcounter2004.codfw.wmnet
- 11:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host seaborgium.wikimedia.org
- 11:20 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host seaborgium.wikimedia.org
- 11:10 isaranto@deploy1002: isaranto: Backport for ores-extension: enable lw in enwiki and wikidata (T342115) synced to the testservers mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
- 11:09 isaranto@deploy1002: Started scap: Backport for ores-extension: enable lw in enwiki and wikidata (T342115)
- 11:06 volans: installed spicerack v7.2.2 on both cumin hosts
- 10:59 volans: uploaded spicerack_7.2.2 to apt.wikimedia.org bullseye-wikimedia
- 10:39 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host gitlab1003.wikimedia.org with OS bullseye
- 10:27 aikochou@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
- 10:18 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on gitlab1003.wikimedia.org with reason: host reimage
- 10:16 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on gitlab1003.wikimedia.org with reason: host reimage
- 10:14 elukey@deploy1002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
- 10:03 jelto@cumin1001: START - Cookbook sre.hosts.reimage for host gitlab1003.wikimedia.org with OS bullseye
- 09:57 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader2003.wikimedia.org
- 09:53 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host urldownloader2003.wikimedia.org
- 09:53 elukey@deploy1002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
- 09:48 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader1004.wikimedia.org
- 09:43 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host urldownloader1004.wikimedia.org
- 09:32 claime: rearmed keyholder on deploy2002.codfw.wmnet
- 09:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 100%: Repooling after cloning another host', diff saved to https://phabricator.wikimedia.org/P52405 and previous config saved to /var/cache/conftool/dbconfig/20230911-092650-root.json
- 09:26 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host deploy2002.codfw.wmnet
- 09:25 elukey@deploy1002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
- 09:24 gehel@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on wdqs1009.eqiad.wmnet with reason: T342361 - testing blazegraph startup script refactor
- 09:24 gehel@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on wdqs1009.eqiad.wmnet with reason: T342361 - testing blazegraph startup script refactor
- 09:18 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-single for host deploy2002.codfw.wmnet
- 09:18 claime: rebooting deploy2002.codfw.wmnet
- 09:18 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance
- 09:18 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance
- 09:18 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127 (T343198)', diff saved to https://phabricator.wikimedia.org/P52404 and previous config saved to /var/cache/conftool/dbconfig/20230911-091817-arnaudb.json
- 09:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 75%: Repooling after cloning another host', diff saved to https://phabricator.wikimedia.org/P52403 and previous config saved to /var/cache/conftool/dbconfig/20230911-091145-root.json
- 09:08 elukey@deploy1002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
- 09:03 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127', diff saved to https://phabricator.wikimedia.org/P52402 and previous config saved to /var/cache/conftool/dbconfig/20230911-090310-arnaudb.json
- 08:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 50%: Repooling after cloning another host', diff saved to https://phabricator.wikimedia.org/P52401 and previous config saved to /var/cache/conftool/dbconfig/20230911-085640-root.json
- 08:52 urbanecm@deploy1002: Finished scap: Backport for Revert "Growth: Disable Add an image on all wikis" (T345188) (duration: 10m 27s)
- 08:51 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1127 (T343198)', diff saved to https://phabricator.wikimedia.org/P52400 and previous config saved to /var/cache/conftool/dbconfig/20230911-085129-arnaudb.json
- 08:51 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1127.eqiad.wmnet with reason: Maintenance
- 08:51 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1127.eqiad.wmnet with reason: Maintenance
- 08:48 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127', diff saved to https://phabricator.wikimedia.org/P52399 and previous config saved to /var/cache/conftool/dbconfig/20230911-084804-arnaudb.json
- 08:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1128 (re)pooling @ 100%: Repooling after being recloned T345509', diff saved to https://phabricator.wikimedia.org/P52398 and previous config saved to /var/cache/conftool/dbconfig/20230911-084647-root.json
- 08:46 urbanecm@deploy1002: urbanecm: Continuing with sync
- 08:45 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mwmaint2002.codfw.wmnet
- 08:44 urbanecm@deploy1002: urbanecm: Backport for Revert "Growth: Disable Add an image on all wikis" (T345188) synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
- 08:42 urbanecm@deploy1002: Started scap: Backport for Revert "Growth: Disable Add an image on all wikis" (T345188)
- 08:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 25%: Repooling after cloning another host', diff saved to https://phabricator.wikimedia.org/P52397 and previous config saved to /var/cache/conftool/dbconfig/20230911-084135-root.json
- 08:37 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-single for host mwmaint2002.codfw.wmnet
- 08:37 claime: rebooting mwmaint2002.codfw.wmnet
- 08:33 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mwdebug1001.eqiad.wmnet
- 08:33 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1119 with Debian Bookworm in s1 with just 1% T339185', diff saved to https://phabricator.wikimedia.org/P52396 and previous config saved to /var/cache/conftool/dbconfig/20230911-083346-marostegui.json
- 08:33 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127 (T343198)', diff saved to https://phabricator.wikimedia.org/P52395 and previous config saved to /var/cache/conftool/dbconfig/20230911-083258-arnaudb.json
- 08:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1128 (re)pooling @ 75%: Repooling after being recloned T345509', diff saved to https://phabricator.wikimedia.org/P52394 and previous config saved to /var/cache/conftool/dbconfig/20230911-083143-root.json
- 08:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 10%: Repooling after cloning another host', diff saved to https://phabricator.wikimedia.org/P52393 and previous config saved to /var/cache/conftool/dbconfig/20230911-082631-root.json
- 08:26 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-single for host mwdebug1001.eqiad.wmnet
- 08:26 claime: rebooting mwdebug1001.eqiad.wmnet
- 08:25 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mwdebug1002.eqiad.wmnet
- 08:20 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-single for host mwdebug1002.eqiad.wmnet
- 08:20 claime: rebooting mwdebug1002.eqiad.wmnet
- 08:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1128 (re)pooling @ 50%: Repooling after being recloned T345509', diff saved to https://phabricator.wikimedia.org/P52392 and previous config saved to /var/cache/conftool/dbconfig/20230911-081638-root.json
- 08:13 kostajh: UTC morning deploys done
- 08:13 kharlan@deploy1002: Finished scap: Backport for [beta] ReportIncident: Enable on kowiki beta (T339275), [beta] Enable ReportIncident for configured beta wikis (T339275), ReportIncident: Set default help page (T343382) (duration: 09m 44s)
- 08:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 5%: Repooling after cloning another host', diff saved to https://phabricator.wikimedia.org/P52391 and previous config saved to /var/cache/conftool/dbconfig/20230911-081126-root.json
- 08:08 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti3007.esams.wmnet
- 08:07 kharlan@deploy1002: kharlan: Continuing with sync
- 08:05 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti3007.esams.wmnet
- 08:05 kharlan@deploy1002: kharlan: Backport for [beta] ReportIncident: Enable on kowiki beta (T339275), [beta] Enable ReportIncident for configured beta wikis (T339275), ReportIncident: Set default help page (T343382) synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deplo
- 08:03 filippo@deploy1002: helmfile [codfw] DONE helmfile.d/services/citoid: apply
- 08:03 kharlan@deploy1002: Started scap: Backport for [beta] ReportIncident: Enable on kowiki beta (T339275), [beta] Enable ReportIncident for configured beta wikis (T339275), ReportIncident: Set default help page (T343382)
- 08:03 filippo@deploy1002: helmfile [codfw] START helmfile.d/services/citoid: apply
- 08:02 filippo@deploy1002: helmfile [eqiad] DONE helmfile.d/services/citoid: apply
- 08:02 filippo@deploy1002: helmfile [eqiad] START helmfile.d/services/citoid: apply
- 08:02 filippo@deploy1002: helmfile [staging] DONE helmfile.d/services/citoid: apply
- 08:01 filippo@deploy1002: helmfile [staging] START helmfile.d/services/citoid: apply
- 08:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1128 (re)pooling @ 25%: Repooling after being recloned T345509', diff saved to https://phabricator.wikimedia.org/P52390 and previous config saved to /var/cache/conftool/dbconfig/20230911-080133-root.json
- 08:00 filippo@deploy1002: helmfile [eqiad] DONE helmfile.d/services/citoid: apply
- 08:00 filippo@deploy1002: helmfile [eqiad] START helmfile.d/services/citoid: apply
- 08:00 kharlan@deploy1002: Finished scap: Backport for ReportIncident: Default deployment to false (T339275) (duration: 11m 15s)
- 08:00 filippo@deploy1002: helmfile [staging] DONE helmfile.d/services/citoid: apply
- 08:00 filippo@deploy1002: helmfile [staging] START helmfile.d/services/citoid: apply
- 07:59 filippo@deploy1002: helmfile [staging] DONE helmfile.d/services/citoid: apply
- 07:59 filippo@deploy1002: helmfile [staging] START helmfile.d/services/citoid: apply
- 07:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 3%: Repooling after cloning another host', diff saved to https://phabricator.wikimedia.org/P52389 and previous config saved to /var/cache/conftool/dbconfig/20230911-075621-root.json
- 07:53 kharlan@deploy1002: kharlan: Continuing with sync
- 07:50 kharlan@deploy1002: kharlan: Backport for ReportIncident: Default deployment to false (T339275) synced to the testservers mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
- 07:49 kharlan@deploy1002: Started scap: Backport for ReportIncident: Default deployment to false (T339275)
- 07:46 kharlan@deploy1002: Finished scap: Backport for Add ReportIncident extension (T339275) (duration: 22m 44s)
- 07:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1128 (re)pooling @ 10%: Repooling after being recloned T345509', diff saved to https://phabricator.wikimedia.org/P52388 and previous config saved to /var/cache/conftool/dbconfig/20230911-074629-root.json
- 07:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 1%: Repooling after cloning another host', diff saved to https://phabricator.wikimedia.org/P52387 and previous config saved to /var/cache/conftool/dbconfig/20230911-074116-root.json
- 07:36 kharlan@deploy1002: kharlan: Continuing with sync
- 07:35 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti3005.esams.wmnet
- 07:35 elukey@deploy1002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
- 07:33 kharlan@deploy1002: kharlan: Backport for Add ReportIncident extension (T339275) synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
- 07:32 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti3005.esams.wmnet
- 07:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1128 (re)pooling @ 5%: Repooling after being recloned T345509', diff saved to https://phabricator.wikimedia.org/P52386 and previous config saved to /var/cache/conftool/dbconfig/20230911-073124-root.json
- 07:23 kharlan@deploy1002: Started scap: Backport for Add ReportIncident extension (T339275)
- 07:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1128 (re)pooling @ 3%: Repooling after being recloned T345509', diff saved to https://phabricator.wikimedia.org/P52385 and previous config saved to /var/cache/conftool/dbconfig/20230911-071619-root.json
- 07:11 kharlan@deploy1002: Started scap: Backport for Add ReportIncident extension (T339275)
- 07:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1128 (re)pooling @ 1%: Repooling after being recloned T345509', diff saved to https://phabricator.wikimedia.org/P52384 and previous config saved to /var/cache/conftool/dbconfig/20230911-070114-root.json
- 06:57 marostegui@cumin1001: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db1134.eqiad.wmnet onto db1128.eqiad.wmnet
- 06:12 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 136065
- 06:11 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 136065
- 05:40 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1119 back to s1 depooled T339185', diff saved to https://phabricator.wikimedia.org/P52383 and previous config saved to /var/cache/conftool/dbconfig/20230911-054057-marostegui.json
- 05:00 marostegui@cumin1001: START - Cookbook sre.mysql.clone of db1134.eqiad.wmnet onto db1128.eqiad.wmnet
- 04:59 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1134', diff saved to https://phabricator.wikimedia.org/P52382 and previous config saved to /var/cache/conftool/dbconfig/20230911-045907-root.json
- 01:29 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2127 (T343198)', diff saved to https://phabricator.wikimedia.org/P52381 and previous config saved to /var/cache/conftool/dbconfig/20230911-012911-arnaudb.json
- 01:29 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2127.codfw.wmnet with reason: Maintenance
- 01:28 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2127.codfw.wmnet with reason: Maintenance
- 01:28 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109 (T343198)', diff saved to https://phabricator.wikimedia.org/P52380 and previous config saved to /var/cache/conftool/dbconfig/20230911-012850-arnaudb.json
- 01:13 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109', diff saved to https://phabricator.wikimedia.org/P52379 and previous config saved to /var/cache/conftool/dbconfig/20230911-011343-arnaudb.json
- 00:58 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109', diff saved to https://phabricator.wikimedia.org/P52378 and previous config saved to /var/cache/conftool/dbconfig/20230911-005837-arnaudb.json
- 00:43 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109 (T343198)', diff saved to https://phabricator.wikimedia.org/P52377 and previous config saved to /var/cache/conftool/dbconfig/20230911-004331-arnaudb.json
2023-09-10
- 17:35 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2109 (T343198)', diff saved to https://phabricator.wikimedia.org/P52375 and previous config saved to /var/cache/conftool/dbconfig/20230910-173502-arnaudb.json
- 17:34 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2109.codfw.wmnet with reason: Maintenance
- 17:34 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2109.codfw.wmnet with reason: Maintenance
- 11:19 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
- 11:19 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
- 11:19 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1223 (T343198)', diff saved to https://phabricator.wikimedia.org/P52374 and previous config saved to /var/cache/conftool/dbconfig/20230910-111941-arnaudb.json
- 11:04 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1223', diff saved to https://phabricator.wikimedia.org/P52373 and previous config saved to /var/cache/conftool/dbconfig/20230910-110435-arnaudb.json
- 10:49 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1223', diff saved to https://phabricator.wikimedia.org/P52372 and previous config saved to /var/cache/conftool/dbconfig/20230910-104929-arnaudb.json
- 10:34 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1223 (T343198)', diff saved to https://phabricator.wikimedia.org/P52371 and previous config saved to /var/cache/conftool/dbconfig/20230910-103422-arnaudb.json
- 04:23 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1223 (T343198)', diff saved to https://phabricator.wikimedia.org/P52370 and previous config saved to /var/cache/conftool/dbconfig/20230910-042338-arnaudb.json
- 04:23 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1223.eqiad.wmnet with reason: Maintenance
- 04:23 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1223.eqiad.wmnet with reason: Maintenance
- 04:23 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1212 (T343198)', diff saved to https://phabricator.wikimedia.org/P52369 and previous config saved to /var/cache/conftool/dbconfig/20230910-042317-arnaudb.json
- 04:08 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1212', diff saved to https://phabricator.wikimedia.org/P52368 and previous config saved to /var/cache/conftool/dbconfig/20230910-040811-arnaudb.json
- 03:53 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1212', diff saved to https://phabricator.wikimedia.org/P52367 and previous config saved to /var/cache/conftool/dbconfig/20230910-035304-arnaudb.json
- 03:38 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1212 (T343198)', diff saved to https://phabricator.wikimedia.org/P52366 and previous config saved to /var/cache/conftool/dbconfig/20230910-033758-arnaudb.json
- 01:38 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1212 (T343198)', diff saved to https://phabricator.wikimedia.org/P52365 and previous config saved to /var/cache/conftool/dbconfig/20230910-013823-arnaudb.json
- 01:38 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
- 01:38 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
- 01:38 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1212.eqiad.wmnet with reason: Maintenance
- 01:37 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1212.eqiad.wmnet with reason: Maintenance
- 01:37 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1198 (T343198)', diff saved to https://phabricator.wikimedia.org/P52364 and previous config saved to /var/cache/conftool/dbconfig/20230910-013745-arnaudb.json
- 01:22 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1198', diff saved to https://phabricator.wikimedia.org/P52363 and previous config saved to /var/cache/conftool/dbconfig/20230910-012239-arnaudb.json
- 01:07 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1198', diff saved to https://phabricator.wikimedia.org/P52362 and previous config saved to /var/cache/conftool/dbconfig/20230910-010733-arnaudb.json
- 00:52 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1198 (T343198)', diff saved to https://phabricator.wikimedia.org/P52361 and previous config saved to /var/cache/conftool/dbconfig/20230910-005226-arnaudb.json
2023-09-09
- 20:16 root@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcontrol2005-dev.codfw.wmnet with OS bookworm
- 19:38 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcontrol2005-dev.codfw.wmnet with reason: host reimage
- 19:35 root@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcontrol2005-dev.codfw.wmnet with reason: host reimage
- 19:14 root@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcontrol2005-dev.codfw.wmnet with OS bookworm
- 18:28 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1198 (T343198)', diff saved to https://phabricator.wikimedia.org/P52360 and previous config saved to /var/cache/conftool/dbconfig/20230909-182802-arnaudb.json
- 18:27 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1198.eqiad.wmnet with reason: Maintenance
- 18:27 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1198.eqiad.wmnet with reason: Maintenance
- 18:27 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1189 (T343198)', diff saved to https://phabricator.wikimedia.org/P52359 and previous config saved to /var/cache/conftool/dbconfig/20230909-182741-arnaudb.json
- 18:12 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P52358 and previous config saved to /var/cache/conftool/dbconfig/20230909-181234-arnaudb.json
- 17:57 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P52357 and previous config saved to /var/cache/conftool/dbconfig/20230909-175728-arnaudb.json
- 17:42 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1189 (T343198)', diff saved to https://phabricator.wikimedia.org/P52356 and previous config saved to /var/cache/conftool/dbconfig/20230909-174222-arnaudb.json
- 17:35 root@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcontrol2004-dev.codfw.wmnet with OS bookworm
- 16:54 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcontrol2004-dev.codfw.wmnet with reason: host reimage
- 16:51 root@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcontrol2004-dev.codfw.wmnet with reason: host reimage
- 16:33 root@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcontrol2004-dev.codfw.wmnet with OS bookworm
- 16:27 root@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcontrol2001-dev.codfw.wmnet with OS bookworm
- 15:44 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcontrol2001-dev.codfw.wmnet with reason: host reimage
- 15:41 root@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcontrol2001-dev.codfw.wmnet with reason: host reimage
- 15:22 root@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcontrol2001-dev.codfw.wmnet with OS bookworm
- 11:15 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1189 (T343198)', diff saved to https://phabricator.wikimedia.org/P52355 and previous config saved to /var/cache/conftool/dbconfig/20230909-111508-arnaudb.json
- 11:15 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1189.eqiad.wmnet with reason: Maintenance
- 11:14 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1189.eqiad.wmnet with reason: Maintenance
- 11:14 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T343198)', diff saved to https://phabricator.wikimedia.org/P52354 and previous config saved to /var/cache/conftool/dbconfig/20230909-111447-arnaudb.json
- 10:59 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P52353 and previous config saved to /var/cache/conftool/dbconfig/20230909-105941-arnaudb.json
- 10:44 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P52352 and previous config saved to /var/cache/conftool/dbconfig/20230909-104434-arnaudb.json
- 10:29 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T343198)', diff saved to https://phabricator.wikimedia.org/P52351 and previous config saved to /var/cache/conftool/dbconfig/20230909-102928-arnaudb.json
- 04:09 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1175 (T343198)', diff saved to https://phabricator.wikimedia.org/P52350 and previous config saved to /var/cache/conftool/dbconfig/20230909-040947-arnaudb.json
- 04:09 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1175.eqiad.wmnet with reason: Maintenance
- 04:09 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1175.eqiad.wmnet with reason: Maintenance
- 04:09 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T343198)', diff saved to https://phabricator.wikimedia.org/P52349 and previous config saved to /var/cache/conftool/dbconfig/20230909-040925-arnaudb.json
- 03:54 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P52348 and previous config saved to /var/cache/conftool/dbconfig/20230909-035419-arnaudb.json
- 03:39 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P52347 and previous config saved to /var/cache/conftool/dbconfig/20230909-033913-arnaudb.json
- 03:24 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T343198)', diff saved to https://phabricator.wikimedia.org/P52346 and previous config saved to /var/cache/conftool/dbconfig/20230909-032407-arnaudb.json
- 02:19 root@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcontrol2001-dev.codfw.wmnet with OS bullseye
- 01:38 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcontrol2001-dev.codfw.wmnet with reason: host reimage
- 01:35 root@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcontrol2001-dev.codfw.wmnet with reason: host reimage
- 01:18 root@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcontrol2001-dev.codfw.wmnet with OS bullseye
2023-09-08
- 21:44 jclark@cumin1001: START - Cookbook sre.hosts.provision for host kubernetes1029.mgmt.eqiad.wmnet with reboot policy FORCED
- 21:40 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kubernetes1029.mgmt.eqiad.wmnet with reboot policy FORCED
- 21:35 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes1035.mgmt.eqiad.wmnet with reboot policy FORCED
- 21:34 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes1036.mgmt.eqiad.wmnet with reboot policy FORCED
- 21:34 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes1034.mgmt.eqiad.wmnet with reboot policy FORCED
- 21:34 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes1033.mgmt.eqiad.wmnet with reboot policy FORCED
- 21:34 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes1032.mgmt.eqiad.wmnet with reboot policy FORCED
- 21:34 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes1030.mgmt.eqiad.wmnet with reboot policy FORCED
- 21:34 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes1028.mgmt.eqiad.wmnet with reboot policy FORCED
- 21:15 jclark@cumin1001: START - Cookbook sre.hosts.provision for host kubernetes1036.mgmt.eqiad.wmnet with reboot policy FORCED
- 21:15 jclark@cumin1001: START - Cookbook sre.hosts.provision for host kubernetes1035.mgmt.eqiad.wmnet with reboot policy FORCED
- 21:15 jclark@cumin1001: START - Cookbook sre.hosts.provision for host kubernetes1034.mgmt.eqiad.wmnet with reboot policy FORCED
- 21:15 jclark@cumin1001: START - Cookbook sre.hosts.provision for host kubernetes1033.mgmt.eqiad.wmnet with reboot policy FORCED
- 21:14 jclark@cumin1001: START - Cookbook sre.hosts.provision for host kubernetes1032.mgmt.eqiad.wmnet with reboot policy FORCED
- 21:14 jclark@cumin1001: START - Cookbook sre.hosts.provision for host kubernetes1031.mgmt.eqiad.wmnet with reboot policy FORCED
- 21:14 jclark@cumin1001: START - Cookbook sre.hosts.provision for host kubernetes1030.mgmt.eqiad.wmnet with reboot policy FORCED
- 21:14 jclark@cumin1001: START - Cookbook sre.hosts.provision for host kubernetes1029.mgmt.eqiad.wmnet with reboot policy FORCED
- 21:14 jclark@cumin1001: START - Cookbook sre.hosts.provision for host kubernetes1028.mgmt.eqiad.wmnet with reboot policy FORCED
- 21:10 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kubernetes1056
- 21:10 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kubernetes1055
- 21:10 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kubernetes1054
- 21:10 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host kubernetes1056
- 21:10 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host kubernetes1055
- 21:10 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host kubernetes1054
- 21:10 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kubernetes1053
- 21:10 ejegg: civicrm upgraded from de883cd5 to 771fcde3
- 21:10 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kubernetes1052
- 21:10 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host kubernetes1053
- 21:10 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host kubernetes1052
- 21:10 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kubernetes1046
- 21:10 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kubernetes1051
- 21:10 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kubernetes1050
- 21:10 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host kubernetes1050
- 21:09 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kubernetes1050
- 21:09 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host kubernetes1051
- 21:09 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host kubernetes1050
- 21:09 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kubernetes1049
- 21:09 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host kubernetes1049
- 21:09 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kubernetes1048
- 21:09 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host kubernetes1048
- 21:09 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kubernetes1047
- 21:09 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host kubernetes1047
- 21:09 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kubernetes1048
- 21:09 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host kubernetes1048
- 21:09 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kubernetes1047
- 21:09 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host kubernetes1047
- 21:09 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host kubernetes1046
- 21:08 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1166 (T343198)', diff saved to https://phabricator.wikimedia.org/P52345 and previous config saved to /var/cache/conftool/dbconfig/20230908-210844-arnaudb.json
- 21:08 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1166.eqiad.wmnet with reason: Maintenance
- 21:08 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1166.eqiad.wmnet with reason: Maintenance
- 21:08 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kubernetes1044
- 21:08 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kubernetes1045
- 21:07 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kubernetes1041
- 21:07 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host kubernetes1045
- 21:07 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host kubernetes1044
- 21:06 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kubernetes1043
- 21:06 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kubernetes1042
- 21:06 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host kubernetes1041
- 21:05 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host kubernetes1043
- 21:04 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host kubernetes1042
- 21:04 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kubernetes1038
- 21:04 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kubernetes1040
- 21:03 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host kubernetes1040
- 21:03 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kubernetes1039
- 21:03 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host kubernetes1038
- 21:02 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host kubernetes1039
- 21:02 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kubernetes1037
- 21:02 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host kubernetes1037
- 21:02 jclark@cumin1001: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host kubernetes1039
- 21:01 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kubernetes1038
- 21:00 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kubernetes1037
- 21:00 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host kubernetes1039
- 21:00 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kubernetes1035
- 21:00 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host kubernetes1038
- 21:00 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kubernetes1036
- 20:59 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host kubernetes1037
- 20:59 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host kubernetes1036
- 20:59 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host kubernetes1035
- 20:59 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kubernetes1032
- 20:59 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kubernetes1033
- 20:59 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kubernetes1034
- 20:58 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host kubernetes1034
- 20:58 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host kubernetes1033
- 20:58 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kubernetes1031
- 20:58 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kubernetes1030
- 20:58 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host kubernetes1032
- 20:58 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host kubernetes1031
- 20:58 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kubernetes1028
- 20:58 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kubernetes1029
- 20:57 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host kubernetes1028
- 20:57 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host kubernetes1030
- 20:57 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host kubernetes1029
- 20:53 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 20:53 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt kubernetes102 - jclark@cumin1001"
- 20:52 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt kubernetes102 - jclark@cumin1001"
- 20:49 jclark@cumin1001: START - Cookbook sre.dns.netbox
- 20:28 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 20:28 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt kubernetes102 - jclark@cumin1001"
- 20:27 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt kubernetes102 - jclark@cumin1001"
- 20:26 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes1027.mgmt.eqiad.wmnet with reboot policy FORCED
- 20:24 jclark@cumin1001: START - Cookbook sre.dns.netbox
- 17:24 fnegri@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcontrol2001-dev.codfw.wmnet with OS bookworm
- 17:20 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
- 17:19 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
- 17:13 taavi: reprepro copy bookworm-wikimedia bullseye-wikimedia prometheus-memcached-exporter # T345810
- 16:18 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
- 16:16 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
- 15:53 jclark@cumin1001: START - Cookbook sre.hosts.provision for host kubernetes1027.mgmt.eqiad.wmnet with reboot policy FORCED
- 15:52 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kubernetes1027
- 15:51 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host kubernetes1027
- 15:45 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 15:45 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt kubernetes1027 - jclark@cumin1001"
- 15:44 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt kubernetes1027 - jclark@cumin1001"
- 15:44 eevans@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host restbase1030.eqiad.wmnet with OS bullseye
- 15:42 jclark@cumin1001: START - Cookbook sre.dns.netbox
- 15:37 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host restbase1030.eqiad.wmnet with OS bullseye
- 15:27 sukhe: running authdns-update for CR 955943
- 15:22 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['stat1011.eqiad.wmne']
- 15:16 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['moss-be1003.eqiad.wmnet']
- 15:16 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['moss-be1003.eqiad.wmnet']
- 15:15 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host moss-be1003.mgmt.eqiad.wmnet with reboot policy FORCED
- 15:13 aikochou@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
- 15:13 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['stat1011.eqiad.wmne']
- 15:08 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host stat1011.mgmt.eqiad.wmnet with reboot policy FORCED
- 14:58 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance
- 14:58 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance
- 14:44 jclark@cumin1001: START - Cookbook sre.hosts.provision for host stat1011.mgmt.eqiad.wmnet with reboot policy FORCED
- 14:43 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host stat1011
- 14:43 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178 (T343198)', diff saved to https://phabricator.wikimedia.org/P52343 and previous config saved to /var/cache/conftool/dbconfig/20230908-144321-arnaudb.json
- 14:42 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host stat1011
- 14:42 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 14:42 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt stat1011 - jclark@cumin1001"
- 14:41 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt stat1011 - jclark@cumin1001"
- 14:39 jclark@cumin1001: START - Cookbook sre.dns.netbox
- 14:28 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P52342 and previous config saved to /var/cache/conftool/dbconfig/20230908-142815-arnaudb.json
- 14:28 jclark@cumin1001: START - Cookbook sre.hosts.provision for host moss-be1003.mgmt.eqiad.wmnet with reboot policy FORCED
- 14:28 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host moss-be1003
- 14:27 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host moss-be1003
- 14:27 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 14:27 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt moss-be1003 - jclark@cumin1001"
- 14:26 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt moss-be1003 - jclark@cumin1001"
- 14:24 jclark@cumin1001: START - Cookbook sre.dns.netbox
- 14:13 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P52341 and previous config saved to /var/cache/conftool/dbconfig/20230908-141309-arnaudb.json
- 13:58 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178 (T343198)', diff saved to https://phabricator.wikimedia.org/P52340 and previous config saved to /var/cache/conftool/dbconfig/20230908-135803-arnaudb.json
- 13:39 isaranto@deploy1002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: sync
- 13:39 isaranto@deploy1002: helmfile [eqiad] START helmfile.d/services/api-gateway: sync
- 13:39 isaranto@deploy1002: helmfile [codfw] DONE helmfile.d/services/api-gateway: sync
- 13:38 isaranto@deploy1002: helmfile [codfw] START helmfile.d/services/api-gateway: sync
- 13:37 isaranto@deploy1002: helmfile [staging] DONE helmfile.d/services/api-gateway: sync
- 13:37 isaranto@deploy1002: helmfile [staging] START helmfile.d/services/api-gateway: sync
- 13:34 aikochou@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
- 13:34 kevinbazira@deploy1002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
- 13:24 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
- 13:24 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
- 13:05 isaranto@deploy1002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: sync
- 13:05 isaranto@deploy1002: helmfile [eqiad] START helmfile.d/services/api-gateway: sync
- 13:01 isaranto@deploy1002: helmfile [codfw] DONE helmfile.d/services/api-gateway: sync
- 13:01 isaranto@deploy1002: helmfile [codfw] START helmfile.d/services/api-gateway: sync
- 13:00 isaranto@deploy1002: helmfile [staging] DONE helmfile.d/services/api-gateway: sync
- 12:59 isaranto@deploy1002: helmfile [staging] START helmfile.d/services/api-gateway: sync
- 12:53 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host ldap-replica1006.wikimedia.org
- 12:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ldap-replica1006.wikimedia.org with OS bookworm
- 12:52 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
- 12:51 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
- 12:51 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
- 12:51 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
- 12:51 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
- 12:50 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
- 12:49 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
- 12:49 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
- 12:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ldap-replica1006.wikimedia.org with reason: host reimage
- 12:36 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ldap-replica1006.wikimedia.org with reason: host reimage
- 12:35 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ldap-replica1005.wikimedia.org
- 12:31 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ldap-replica1005.wikimedia.org
- 12:29 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ldap-replica1006.wikimedia.org with OS bookworm
- 12:24 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM ldap-replica1006.wikimedia.org - jmm@cumin2002"
- 12:23 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM ldap-replica1006.wikimedia.org - jmm@cumin2002"
- 12:23 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ldap-replica1006.wikimedia.org on all recursors
- 12:23 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache ldap-replica1006.wikimedia.org on all recursors
- 12:23 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 12:23 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM ldap-replica1006.wikimedia.org - jmm@cumin2002"
- 12:22 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM ldap-replica1006.wikimedia.org - jmm@cumin2002"
- 12:18 jmm@cumin2002: START - Cookbook sre.dns.netbox
- 12:18 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host ldap-replica1006.wikimedia.org
- 12:18 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host ldap-replica1006.wikimedia.org
- 12:17 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
- 12:16 jmm@cumin2002: START - Cookbook sre.dns.netbox
- 12:16 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host ldap-replica1006.wikimedia.org
- 12:05 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host ldap-replica1006.wikimedia.org
- 12:05 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ldap-replica1006.wikimedia.org on all recursors
- 12:05 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache ldap-replica1006.wikimedia.org on all recursors
- 12:05 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 12:05 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM ldap-replica1006.wikimedia.org - jmm@cumin2002"
- 12:04 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM ldap-replica1006.wikimedia.org - jmm@cumin2002"
- 11:59 jmm@cumin2002: START - Cookbook sre.dns.netbox
- 11:59 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ldap-replica1006.wikimedia.org on all recursors
- 11:59 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache ldap-replica1006.wikimedia.org on all recursors
- 11:59 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 11:59 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM ldap-replica1006.wikimedia.org - jmm@cumin2002"
- 11:58 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM ldap-replica1006.wikimedia.org - jmm@cumin2002"
- 11:55 jmm@cumin2002: START - Cookbook sre.dns.netbox
- 11:55 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host ldap-replica1006.wikimedia.org
- 11:54 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host ldap-replica1005.wikimedia.org
- 11:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ldap-replica1005.wikimedia.org with OS bookworm
- 11:49 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2178 (T343198)', diff saved to https://phabricator.wikimedia.org/P52337 and previous config saved to /var/cache/conftool/dbconfig/20230908-114911-arnaudb.json
- 11:49 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2178.codfw.wmnet with reason: Maintenance
- 11:48 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2178.codfw.wmnet with reason: Maintenance
- 11:48 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315 (T343198)', diff saved to https://phabricator.wikimedia.org/P52336 and previous config saved to /var/cache/conftool/dbconfig/20230908-114850-arnaudb.json
- 11:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ldap-replica1005.wikimedia.org with reason: host reimage
- 11:36 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ldap-replica1005.wikimedia.org with reason: host reimage
- 11:33 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315', diff saved to https://phabricator.wikimedia.org/P52335 and previous config saved to /var/cache/conftool/dbconfig/20230908-113344-arnaudb.json
- 11:23 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ldap-replica1005.wikimedia.org with OS bookworm
- 11:22 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM ldap-replica1005.wikimedia.org - jmm@cumin2002"
- 11:21 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM ldap-replica1005.wikimedia.org - jmm@cumin2002"
- 11:21 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ldap-replica1005.wikimedia.org on all recursors
- 11:21 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache ldap-replica1005.wikimedia.org on all recursors
- 11:21 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 11:21 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM ldap-replica1005.wikimedia.org - jmm@cumin2002"
- 11:20 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM ldap-replica1005.wikimedia.org - jmm@cumin2002"
- 11:18 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315', diff saved to https://phabricator.wikimedia.org/P52334 and previous config saved to /var/cache/conftool/dbconfig/20230908-111838-arnaudb.json
- 11:17 jmm@cumin2002: START - Cookbook sre.dns.netbox
- 11:17 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host ldap-replica1005.wikimedia.org
- 11:14 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.resource-report (exit_code=0)
- 11:14 jmm@cumin2002: START - Cookbook sre.ganeti.resource-report
- 11:07 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
- 11:07 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
- 11:06 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ldap-rw2001.wikimedia.org with OS bookworm
- 11:05 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
- 11:04 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
- 11:03 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315 (T343198)', diff saved to https://phabricator.wikimedia.org/P52333 and previous config saved to /var/cache/conftool/dbconfig/20230908-110331-arnaudb.json
- 10:49 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ldap-rw2001.wikimedia.org with reason: host reimage
- 10:46 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ldap-rw2001.wikimedia.org with reason: host reimage
- 10:33 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
- 10:32 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
- 10:27 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ldap-rw2001.wikimedia.org with OS bookworm
- 10:24 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ldap-rw1001.wikimedia.org with OS bookworm
- 10:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ldap-rw1001.wikimedia.org with reason: host reimage
- 10:07 jbond@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 10:06 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ldap-rw1001.wikimedia.org with reason: host reimage
- 10:05 jbond@cumin1001: START - Cookbook sre.dns.netbox
- 09:55 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ldap-rw1001.wikimedia.org with OS bookworm
- 09:46 vgutierrez: restart fifo-log-demux@notpurge.service in cp4052
- 09:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host serpens.wikimedia.org
- 09:34 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host serpens.wikimedia.org
- 09:31 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts furud.codfw.wmnet
- 09:31 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 09:31 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: furud.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
- 09:29 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: furud.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
- 09:26 jmm@cumin2002: START - Cookbook sre.dns.netbox
- 09:22 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts furud.codfw.wmnet
- 09:17 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
- 09:16 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
- 09:15 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
- 09:13 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
- 09:11 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
- 09:11 jmm@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cumin2002.codfw.wmnet
- 09:10 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
- 09:00 jmm@cumin1001: START - Cookbook sre.hosts.reboot-single for host cumin2002.codfw.wmnet
- 08:23 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1140.eqiad.wmnet with reason: Maintenance
- 08:23 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1140.eqiad.wmnet with reason: Maintenance
- 08:09 elukey@deploy1002: helmfile [ml-serve-eqiad] 'sync' command on namespace 'ores-legacy' for release 'main' .
- 08:06 elukey@deploy1002: helmfile [ml-serve-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
- 08:01 elukey@deploy1002: helmfile [ml-staging-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
- 07:59 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2171:3315 (T343198)', diff saved to https://phabricator.wikimedia.org/P52328 and previous config saved to /var/cache/conftool/dbconfig/20230908-075901-arnaudb.json
- 07:58 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2171.codfw.wmnet with reason: Maintenance
- 07:58 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2171.codfw.wmnet with reason: Maintenance
- 07:58 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157 (T343198)', diff saved to https://phabricator.wikimedia.org/P52327 and previous config saved to /var/cache/conftool/dbconfig/20230908-075840-arnaudb.json
- 07:43 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157', diff saved to https://phabricator.wikimedia.org/P52326 and previous config saved to /var/cache/conftool/dbconfig/20230908-074334-arnaudb.json
- 07:28 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157', diff saved to https://phabricator.wikimedia.org/P52325 and previous config saved to /var/cache/conftool/dbconfig/20230908-072828-arnaudb.json
- 07:27 jayme@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 07:26 jayme@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
- 07:26 jayme@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 07:25 jayme@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
- 07:25 jayme@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
- 07:25 jayme@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
- 07:24 jayme@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
- 07:24 jayme@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
- 07:24 jayme@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
- 07:23 jayme@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
- 07:23 jayme@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
- 07:23 jayme@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
- 07:22 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
- 07:22 jayme@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
- 07:22 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
- 07:21 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
- 07:21 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
- 07:21 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
- 07:21 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
- 07:20 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
- 07:13 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157 (T343198)', diff saved to https://phabricator.wikimedia.org/P52324 and previous config saved to /var/cache/conftool/dbconfig/20230908-071322-arnaudb.json
- 07:02 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
- 05:24 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
- 05:23 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs1016.eqiad.wmnet with OS bullseye
- 04:57 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1016.eqiad.wmnet with reason: host reimage
- 04:54 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1016.eqiad.wmnet with reason: host reimage
- 04:29 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs1016.eqiad.wmnet with OS bullseye
- 04:28 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2157 (T343198)', diff saved to https://phabricator.wikimedia.org/P52323 and previous config saved to /var/cache/conftool/dbconfig/20230908-042821-arnaudb.json
- 04:28 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2157.codfw.wmnet with reason: Maintenance
- 04:28 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2157.codfw.wmnet with reason: Maintenance
- 04:28 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315 (T343198)', diff saved to https://phabricator.wikimedia.org/P52322 and previous config saved to /var/cache/conftool/dbconfig/20230908-042800-arnaudb.json
- 04:12 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315', diff saved to https://phabricator.wikimedia.org/P52321 and previous config saved to /var/cache/conftool/dbconfig/20230908-041254-arnaudb.json
- 03:57 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315', diff saved to https://phabricator.wikimedia.org/P52320 and previous config saved to /var/cache/conftool/dbconfig/20230908-035747-arnaudb.json
- 03:42 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315 (T343198)', diff saved to https://phabricator.wikimedia.org/P52319 and previous config saved to /var/cache/conftool/dbconfig/20230908-034241-arnaudb.json
- 00:53 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2137:3315 (T343198)', diff saved to https://phabricator.wikimedia.org/P52318 and previous config saved to /var/cache/conftool/dbconfig/20230908-005323-arnaudb.json
- 00:53 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2137.codfw.wmnet with reason: Maintenance
- 00:53 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2137.codfw.wmnet with reason: Maintenance
- 00:53 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128 (T343198)', diff saved to https://phabricator.wikimedia.org/P52317 and previous config saved to /var/cache/conftool/dbconfig/20230908-005301-arnaudb.json
- 00:37 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128', diff saved to https://phabricator.wikimedia.org/P52316 and previous config saved to /var/cache/conftool/dbconfig/20230908-003755-arnaudb.json
- 00:23 eileen: civicrm upgraded from e81ed4e9 to de883cd5
- 00:22 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128', diff saved to https://phabricator.wikimedia.org/P52315 and previous config saved to /var/cache/conftool/dbconfig/20230908-002248-arnaudb.json
- 00:07 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128 (T343198)', diff saved to https://phabricator.wikimedia.org/P52314 and previous config saved to /var/cache/conftool/dbconfig/20230908-000742-arnaudb.json
- 00:03 eileen: civicrm upgraded from 5a432b1e to e81ed4e9
2023-09-07
- 23:12 ejegg: payments-wiki upgraded from 639a8d6a to c524f53f
- 22:45 jhuneidi@deploy1002: Installation of scap version "4.59.0" completed for 594 hosts
- 22:44 jhuneidi@deploy1002: Installing scap version "4.59.0" for 594 hosts
- 22:30 jhuneidi@deploy1002: Installing scap version "4.59.0" for 595 hosts
- 22:29 jeena: installing scap v4.59.0
- 22:24 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
- 21:47 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2128 (T343198)', diff saved to https://phabricator.wikimedia.org/P52313 and previous config saved to /var/cache/conftool/dbconfig/20230907-214717-arnaudb.json
- 21:47 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
- 21:46 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
- 21:46 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2128.codfw.wmnet with reason: Maintenance
- 21:46 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2128.codfw.wmnet with reason: Maintenance
- 21:46 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2123 (T343198)', diff saved to https://phabricator.wikimedia.org/P52312 and previous config saved to /var/cache/conftool/dbconfig/20230907-214640-arnaudb.json
- 21:31 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2123', diff saved to https://phabricator.wikimedia.org/P52311 and previous config saved to /var/cache/conftool/dbconfig/20230907-213134-arnaudb.json
- 21:16 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2123', diff saved to https://phabricator.wikimedia.org/P52310 and previous config saved to /var/cache/conftool/dbconfig/20230907-211628-arnaudb.json
- 21:01 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2123 (T343198)', diff saved to https://phabricator.wikimedia.org/P52309 and previous config saved to /var/cache/conftool/dbconfig/20230907-210122-arnaudb.json
- 20:56 thcipriani@deploy1002: Finished scap: Backport for Preserve Gadget prefs when they can't be enabled (T341421), Fix settings button not working on reference previews (T345829) (duration: 11m 12s)
- 20:50 thcipriani@deploy1002: jdlrobson and thcipriani: Continuing with sync
- 20:49 taavi@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2444.codfw.wmnet
- 20:46 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
- 20:46 thcipriani@deploy1002: jdlrobson and thcipriani: Backport for Preserve Gadget prefs when they can't be enabled (T341421), Fix settings button not working on reference previews (T345829) synced to the testservers mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD o
- 20:46 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
- 20:46 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
- 20:45 thcipriani@deploy1002: Started scap: Backport for Preserve Gadget prefs when they can't be enabled (T341421), Fix settings button not working on reference previews (T345829)
- 20:41 thcipriani@deploy1002: Finished scap: Backport for Pre-deploy Reader Demographics 2 pilot survey (T344393) (duration: 10m 59s)
- 20:33 thcipriani@deploy1002: dani and thcipriani: Continuing with sync
- 20:31 thcipriani@deploy1002: dani and thcipriani: Backport for Pre-deploy Reader Demographics 2 pilot survey (T344393) synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
- 20:30 thcipriani@deploy1002: Started scap: Backport for Pre-deploy Reader Demographics 2 pilot survey (T344393)
- 20:23 thcipriani@deploy1002: Finished scap: Backport for Undeploy Campaigns Event Discovery survey (T345158) (duration: 17m 58s)
- 20:23 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs1016.eqiad.wmnet with OS bullseye
- 20:11 thcipriani@deploy1002: thcipriani and dani: Continuing with sync
- 20:07 thcipriani@deploy1002: thcipriani and dani: Backport for Undeploy Campaigns Event Discovery survey (T345158) synced to the testservers mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
- 20:05 thcipriani@deploy1002: Started scap: Backport for Undeploy Campaigns Event Discovery survey (T345158)
- 19:41 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1016.eqiad.wmnet with reason: host reimage
- 19:38 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1016.eqiad.wmnet with reason: host reimage
- 19:37 eevans@cumin1001: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies (exit_code=0) rolling restart_daemons on A:thanos-fe
- 19:33 eevans@cumin1001: START - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies rolling restart_daemons on A:thanos-fe
- 19:13 bking@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs1016.eqiad.wmnet with OS bullseye
- 18:50 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1010.eqiad.wmnet with reason: T342361
- 18:49 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1010.eqiad.wmnet with reason: T342361
- 18:31 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2123 (T343198)', diff saved to https://phabricator.wikimedia.org/P52308 and previous config saved to /var/cache/conftool/dbconfig/20230907-183153-arnaudb.json
- 18:31 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2123.codfw.wmnet with reason: Maintenance
- 18:31 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2123.codfw.wmnet with reason: Maintenance
- 18:31 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111 (T343198)', diff saved to https://phabricator.wikimedia.org/P52307 and previous config saved to /var/cache/conftool/dbconfig/20230907-183132-arnaudb.json
- 18:16 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111', diff saved to https://phabricator.wikimedia.org/P52306 and previous config saved to /var/cache/conftool/dbconfig/20230907-181626-arnaudb.json
- 18:01 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111', diff saved to https://phabricator.wikimedia.org/P52305 and previous config saved to /var/cache/conftool/dbconfig/20230907-180120-arnaudb.json
- 17:46 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111 (T343198)', diff saved to https://phabricator.wikimedia.org/P52304 and previous config saved to /var/cache/conftool/dbconfig/20230907-174613-arnaudb.json
- 17:43 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2111 (T343198)', diff saved to https://phabricator.wikimedia.org/P52303 and previous config saved to /var/cache/conftool/dbconfig/20230907-174351-arnaudb.json
- 17:43 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2111.codfw.wmnet with reason: Maintenance
- 17:43 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2111.codfw.wmnet with reason: Maintenance
- 16:45 Amir1: running moveToExternal on all wikis
- 15:58 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['lists1004.eqiad.wmnet']
- 15:49 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['lists1004.eqiad.wmnet']
- 15:47 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lists1004.mgmt.eqiad.wmnet with reboot policy FORCED
- 15:47 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2053.codfw.wmnet
- 15:47 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1053.eqiad.wmnet
- 15:41 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1053.eqiad.wmnet
- 15:41 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2053.codfw.wmnet
- 15:37 jclark@cumin1001: START - Cookbook sre.hosts.provision for host lists1004.mgmt.eqiad.wmnet with reboot policy FORCED
- 15:33 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host lists1004
- 15:32 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host lists1004
- 15:28 jclark@cumin1001: START - Cookbook sre.dns.netbox
- 15:20 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
- 15:19 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
- 15:18 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
- 15:17 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
- 15:13 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
- 15:13 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
- 15:11 filippo@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
- 15:11 filippo@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
- 14:54 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2101.codfw.wmnet with reason: Maintenance
- 14:53 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2101.codfw.wmnet with reason: Maintenance
- 14:48 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2052.codfw.wmnet
- 14:46 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1052.eqiad.wmnet
- 14:42 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1052.eqiad.wmnet
- 14:41 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2052.codfw.wmnet
- 14:37 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1136.eqiad.wmnet with OS bullseye
- 14:30 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['pki1002.eqiad.wmnet']
- 14:30 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['pki1002.eqiad.wmnet']
- 14:30 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['pki1002.eqiad.wmnet']
- 14:30 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['pki1002.eqiad.wmnet']
- 14:30 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['pki1002.eqiad.wmnet']
- 14:30 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['pki1002.eqiad.wmnet']
- 14:28 bking@deploy1002: helmfile [eqiad] DONE helmfile.d/services/rdf-streaming-updater: apply
- 14:27 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-cluster (exit_code=0)
- 14:27 bking@deploy1002: helmfile [eqiad] START helmfile.d/services/rdf-streaming-updater: apply
- 14:26 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dbstore1009.mgmt.eqiad.wmnet with reboot policy FORCED
- 14:26 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dbstore1008.mgmt.eqiad.wmnet with reboot policy FORCED
- 14:24 bking@deploy1002: helmfile [codfw] DONE helmfile.d/services/rdf-streaming-updater: apply
- 14:24 bking@deploy1002: helmfile [codfw] START helmfile.d/services/rdf-streaming-updater: apply
- 14:23 bking@deploy1002: helmfile [codfw] START helmfile.d/services/rdf-streaming-updater: apply
- 14:23 bking@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
- 14:22 bking@deploy1002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
- 14:20 bking@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
- 14:19 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1135.eqiad.wmnet with OS bullseye
- 14:16 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1051.eqiad.wmnet
- 14:16 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2051.codfw.wmnet
- 14:15 elukey@deploy1002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: sync
- 14:14 elukey@deploy1002: helmfile [eqiad] START helmfile.d/services/api-gateway: sync
- 14:13 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1136.eqiad.wmnet with reason: host reimage
- 14:12 elukey@deploy1002: helmfile [codfw] DONE helmfile.d/services/api-gateway: sync
- 14:11 elukey@deploy1002: helmfile [codfw] START helmfile.d/services/api-gateway: sync
- 14:10 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/api-gateway: sync
- 14:10 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2051.codfw.wmnet
- 14:10 elukey@deploy1002: helmfile [staging] START helmfile.d/services/api-gateway: sync
- 14:10 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1051.eqiad.wmnet
- 14:10 stevemunene@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1136.eqiad.wmnet with reason: host reimage
- 14:03 bking@deploy1002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
- 14:02 bking@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
- 13:58 bking@deploy1002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
- 13:58 bking@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
- 13:58 bking@deploy1002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
- 13:58 bking@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
- 13:58 bking@deploy1002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
- 13:57 bking@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
- 13:56 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1135.eqiad.wmnet with reason: host reimage
- 13:56 stevemunene@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1136.eqiad.wmnet with OS bullseye
- 13:53 stevemunene@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1135.eqiad.wmnet with reason: host reimage
- 13:52 jclark@cumin1001: START - Cookbook sre.hosts.provision for host dbstore1009.mgmt.eqiad.wmnet with reboot policy FORCED
- 13:52 jclark@cumin1001: START - Cookbook sre.hosts.provision for host dbstore1008.mgmt.eqiad.wmnet with reboot policy FORCED
- 13:51 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dbstore1009
- 13:51 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dbstore1008
- 13:51 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host dbstore1009
- 13:51 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host dbstore1008
- 13:50 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 13:50 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt dbstore100{8..9} - jclark@cumin1001"
- 13:50 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt dbstore100{8..9} - jclark@cumin1001"
- 13:47 jclark@cumin1001: START - Cookbook sre.dns.netbox
- 13:43 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['pki1002.eqiad.wmnet']
- 13:40 XioNoX: trunk sandbox vlan to ganeti nodes in esams BY27 - T307021
- 13:40 stevemunene@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1135.eqiad.wmnet with OS bullseye
- 13:38 taavi: taavi@mwmaint1002 ~ $ mwscript extensions/OATHAuth/maintenance/UpdateForMultipleDevicesSupport.php --wiki=labswiki | tee oathauth-multiple-labswiki.log # T242031
- 13:38 taavi@deploy1002: Finished scap: Backport for Set OATHAuth multiple devices READ_NEW for all fishbowls, privates (T242031), Set OATHAuth multiple devices WRITE_BOTH for wikitech (T242031) (duration: 08m 52s)
- 13:35 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['pki1002.eqiad.wmnet']
- 13:33 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pki1002.mgmt.eqiad.wmnet with reboot policy FORCED
- 13:31 taavi@deploy1002: taavi: Continuing with sync
- 13:30 taavi@deploy1002: taavi: Backport for Set OATHAuth multiple devices READ_NEW for all fishbowls, privates (T242031), Set OATHAuth multiple devices WRITE_BOTH for wikitech (T242031) synced to the testservers mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
- 13:29 taavi@deploy1002: Started scap: Backport for Set OATHAuth multiple devices READ_NEW for all fishbowls, privates (T242031), Set OATHAuth multiple devices WRITE_BOTH for wikitech (T242031)
- 13:27 taavi@deploy1002: Finished scap: Backport for Edit check: Turn on when ecenable=1 is set (T345297) (duration: 09m 46s)
- 13:23 jclark@cumin1001: START - Cookbook sre.hosts.provision for host pki1002.mgmt.eqiad.wmnet with reboot policy FORCED
- 13:22 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host pki1002
- 13:21 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host pki1002
- 13:21 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 13:21 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt pki1002 - jclark@cumin1001"
- 13:20 taavi@deploy1002: taavi and kemayo: Continuing with sync
- 13:20 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt pki1002 - jclark@cumin1001"
- 13:18 taavi@deploy1002: taavi and kemayo: Backport for Edit check: Turn on when ecenable=1 is set (T345297) synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
- 13:18 jclark@cumin1001: START - Cookbook sre.dns.netbox
- 13:17 ayounsi@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts atlas2001.wikimedia.org
- 13:17 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 13:17 ayounsi@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: atlas2001.wikimedia.org decommissioned, removing all IPs except the asset tag one - ayounsi@cumin1001"
- 13:17 taavi@deploy1002: Started scap: Backport for Edit check: Turn on when ecenable=1 is set (T345297)
- 13:14 ayounsi@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: atlas2001.wikimedia.org decommissioned, removing all IPs except the asset tag one - ayounsi@cumin1001"
- 13:12 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
- 13:08 ayounsi@cumin1001: START - Cookbook sre.hosts.decommission for hosts atlas2001.wikimedia.org
- 12:35 filippo@deploy1002: helmfile [staging] DONE helmfile.d/services/opentelemetry-collector: apply
- 12:34 filippo@deploy1002: helmfile [staging] START helmfile.d/services/opentelemetry-collector: apply
- 12:23 filippo@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
- 12:23 filippo@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
- 12:04 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-cluster
- 12:04 claime: Starting eqiad jobrunner reboots
- 12:02 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
- 12:02 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
- 11:59 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-cluster (exit_code=0)
- 11:29 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2050.codfw.wmnet
- 11:29 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1050.eqiad.wmnet
- 11:23 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1050.eqiad.wmnet
- 11:22 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2050.codfw.wmnet
- 11:13 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-cluster
- 11:11 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2065.codfw.wmnet
- 11:10 cgoubert@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
- 11:09 cgoubert@deploy1002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
- 11:05 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2065.codfw.wmnet
- 11:04 cgoubert@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: sync
- 11:03 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2065.codfw.wmnet
- 10:56 urbanecm: mwmaint1002: `/usr/local/bin/mw-cli-wrapper /usr/local/bin/foreachwikiindblist /srv/mediawiki/dblists/growthexperiments.dblist extensions/GrowthExperiments/maintenance/refreshUserImpactData.php --registeredWithin=1year --editedWithin=2week --hasEditsAtLeast=3 --ignoreIfUpdatedWithin=1second --verbose --use-job-queue` (T344428, testing with r955319 deployed)
- 10:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2065.codfw.wmnet
- 10:54 cgoubert@deploy1002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: sync
- 10:51 ladsgroup@deploy1002: Finished scap: Backport for Pin pagelinks normalization stage to old in production (T345732) (duration: 09m 05s)
- 10:46 urbanecm@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
- 10:45 ladsgroup@deploy1002: ladsgroup: Continuing with sync
- 10:44 ladsgroup@deploy1002: ladsgroup: Backport for Pin pagelinks normalization stage to old in production (T345732) synced to the testservers mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
- 10:42 ladsgroup@deploy1002: Started scap: Backport for Pin pagelinks normalization stage to old in production (T345732)
- 10:36 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mw[1441-1442,1451].eqiad.wmnet
- 10:35 urbanecm@deploy1002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
- 10:35 cgoubert@cumin1001: START - Cookbook sre.hosts.remove-downtime for mw[1441-1442,1451].eqiad.wmnet
- 10:35 urbanecm@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
- 10:33 cgoubert@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-cluster (exit_code=1)
- 10:32 fnegri@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcontrol2001-dev.codfw.wmnet with reason: host reimage
- 10:29 fnegri@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcontrol2001-dev.codfw.wmnet with reason: host reimage
- 10:24 urbanecm@deploy1002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
- 10:24 urbanecm@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
- 10:23 urbanecm@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
- 10:23 urbanecm@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
- 10:21 urbanecm@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
- 10:10 fnegri@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcontrol2001-dev.codfw.wmnet with OS bookworm
- 10:08 jbond@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host puppetserver1002.eqiad.wmnet with OS bookworm
- 10:05 hashar@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.41.0-wmf.25 refs T343727
- 10:03 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1134.eqiad.wmnet with OS bullseye
- 09:58 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on puppetserver1002.eqiad.wmnet with reason: host reimage
- 09:55 jbond@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on puppetserver1002.eqiad.wmnet with reason: host reimage
- 09:54 hashar@deploy1002: Finished scap: Backport for RevisionReviewForm: allow setting `null` tag (T345804) (duration: 07m 54s)
- 09:48 hashar@deploy1002: ladsgroup and hashar: Continuing with sync
- 09:47 hashar@deploy1002: ladsgroup and hashar: Backport for RevisionReviewForm: allow setting `null` tag (T345804) synced to the testservers mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
- 09:46 hashar@deploy1002: Started scap: Backport for RevisionReviewForm: allow setting `null` tag (T345804)
- 09:41 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1134.eqiad.wmnet with reason: host reimage
- 09:39 jbond@cumin1001: START - Cookbook sre.hosts.reimage for host puppetserver1002.eqiad.wmnet with OS bookworm
- 09:38 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1133.eqiad.wmnet with OS bullseye
- 09:38 stevemunene@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1134.eqiad.wmnet with reason: host reimage
- 09:37 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance
- 09:37 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance
- 09:37 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1213:3315 (T343198)', diff saved to https://phabricator.wikimedia.org/P52300 and previous config saved to /var/cache/conftool/dbconfig/20230907-093718-arnaudb.json
- 09:24 stevemunene@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1134.eqiad.wmnet with OS bullseye
- 09:22 moritzm: installing grub2 updates from Bullseye point release
- 09:22 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1213:3315', diff saved to https://phabricator.wikimedia.org/P52299 and previous config saved to /var/cache/conftool/dbconfig/20230907-092212-arnaudb.json
- 09:15 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1133.eqiad.wmnet with reason: host reimage
- 09:14 taavi: foreachwikiindblist private extensions/OATHAuth/maintenance/UpdateForMultipleDevicesSupport.php | tee oathauth-multiple-private.log # T242031
- 09:12 stevemunene@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1133.eqiad.wmnet with reason: host reimage
- 09:07 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1213:3315', diff saved to https://phabricator.wikimedia.org/P52298 and previous config saved to /var/cache/conftool/dbconfig/20230907-090706-arnaudb.json
- 08:59 stevemunene@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1133.eqiad.wmnet with OS bullseye
- 08:52 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1213:3315 (T343198)', diff saved to https://phabricator.wikimedia.org/P52297 and previous config saved to /var/cache/conftool/dbconfig/20230907-085159-arnaudb.json
- 08:51 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-cluster
- 08:46 hashar@deploy1002: rebuilt and synchronized wikiversions files: Revert group2 wikis to 1.41.0-wmf.24 - T343727
- 08:38 moritzm: installing librsvg security updates
- 08:23 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on mc2040.codfw.wmnet with reason: T345802 - hw troubleshooting
- 08:23 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on mc2040.codfw.wmnet with reason: T345802 - hw troubleshooting
- 08:22 hashar@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.41.0-wmf.25 refs T343727
- 07:57 moritzm: installing grub2 updates from Bullseye point release
- 07:40 moritzm: installing file/libmagic security updates
- 07:16 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netboxdb1002.eqiad.wmnet
- 07:12 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netboxdb1002.eqiad.wmnet
- 07:08 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netboxdb2002.codfw.wmnet
- 07:04 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netboxdb2002.codfw.wmnet
- 06:59 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host krb2002.codfw.wmnet
- 06:52 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host krb2002.codfw.wmnet
- 06:29 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1213:3315 (T343198)', diff saved to https://phabricator.wikimedia.org/P52296 and previous config saved to /var/cache/conftool/dbconfig/20230907-062900-arnaudb.json
- 06:28 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1213.eqiad.wmnet with reason: Maintenance
- 06:28 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on wdqs[1003-1004].eqiad.wmnet with reason: reboot
- 06:28 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on wdqs[1003-1004].eqiad.wmnet with reason: reboot
- 06:28 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1213.eqiad.wmnet with reason: Maintenance
- 06:28 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1210 (T343198)', diff saved to https://phabricator.wikimedia.org/P52295 and previous config saved to /var/cache/conftool/dbconfig/20230907-062838-arnaudb.json
- 06:13 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1210', diff saved to https://phabricator.wikimedia.org/P52294 and previous config saved to /var/cache/conftool/dbconfig/20230907-061332-arnaudb.json
- 05:58 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1210', diff saved to https://phabricator.wikimedia.org/P52293 and previous config saved to /var/cache/conftool/dbconfig/20230907-055826-arnaudb.json
- 05:43 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1210 (T343198)', diff saved to https://phabricator.wikimedia.org/P52292 and previous config saved to /var/cache/conftool/dbconfig/20230907-054320-arnaudb.json
- 05:26 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
- 05:24 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
- 03:40 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
- 03:39 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
- 03:26 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
- 03:24 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
- 03:23 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1210 (T343198)', diff saved to https://phabricator.wikimedia.org/P52291 and previous config saved to /var/cache/conftool/dbconfig/20230907-032306-arnaudb.json
- 03:23 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1210.eqiad.wmnet with reason: Maintenance
- 03:22 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1210.eqiad.wmnet with reason: Maintenance
- 03:22 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1200 (T343198)', diff saved to https://phabricator.wikimedia.org/P52290 and previous config saved to /var/cache/conftool/dbconfig/20230907-032245-arnaudb.json
- 03:07 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1200', diff saved to https://phabricator.wikimedia.org/P52289 and previous config saved to /var/cache/conftool/dbconfig/20230907-030739-arnaudb.json
- 02:52 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1200', diff saved to https://phabricator.wikimedia.org/P52288 and previous config saved to /var/cache/conftool/dbconfig/20230907-025233-arnaudb.json
- 02:37 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1200 (T343198)', diff saved to https://phabricator.wikimedia.org/P52287 and previous config saved to /var/cache/conftool/dbconfig/20230907-023727-arnaudb.json
- 01:10 tstarling@deploy1002: Synchronized php-1.41.0-wmf.25/extensions/Phonos/extension.json: fix breakage of Phonos on parser-cached pages T345414 (duration: 06m 59s)
- 00:30 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1200 (T343198)', diff saved to https://phabricator.wikimedia.org/P52286 and previous config saved to /var/cache/conftool/dbconfig/20230907-003038-arnaudb.json
- 00:30 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1200.eqiad.wmnet with reason: Maintenance
- 00:30 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1200.eqiad.wmnet with reason: Maintenance
- 00:30 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1185 (T343198)', diff saved to https://phabricator.wikimedia.org/P52285 and previous config saved to /var/cache/conftool/dbconfig/20230907-003017-arnaudb.json
- 00:15 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1185', diff saved to https://phabricator.wikimedia.org/P52284 and previous config saved to /var/cache/conftool/dbconfig/20230907-001510-arnaudb.json
- 00:00 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1185', diff saved to https://phabricator.wikimedia.org/P52283 and previous config saved to /var/cache/conftool/dbconfig/20230907-000004-arnaudb.json
2023-09-06
- 23:45 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1185 (T343198)', diff saved to https://phabricator.wikimedia.org/P52282 and previous config saved to /var/cache/conftool/dbconfig/20230906-234458-arnaudb.json
- 22:10 bking@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host flink-zk2003.codfw.wmnet
- 22:10 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host flink-zk2003.codfw.wmnet with OS bookworm
- 21:56 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on flink-zk2003.codfw.wmnet with reason: host reimage
- 21:53 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on flink-zk2003.codfw.wmnet with reason: host reimage
- 21:44 eevans@cumin1001: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies (exit_code=0) rolling restart_daemons on A:thanos-fe
- 21:42 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1185 (T343198)', diff saved to https://phabricator.wikimedia.org/P52281 and previous config saved to /var/cache/conftool/dbconfig/20230906-214205-arnaudb.json
- 21:42 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1185.eqiad.wmnet with reason: Maintenance
- 21:41 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1185.eqiad.wmnet with reason: Maintenance
- 21:41 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T343198)', diff saved to https://phabricator.wikimedia.org/P52280 and previous config saved to /var/cache/conftool/dbconfig/20230906-214145-arnaudb.json
- 21:40 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs1007.eqiad.wmnet with OS bullseye
- 21:39 eevans@cumin1001: START - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies rolling restart_daemons on A:thanos-fe
- 21:38 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs1006.eqiad.wmnet with OS bullseye
- 21:32 bking@cumin1001: START - Cookbook sre.hosts.reimage for host flink-zk2003.codfw.wmnet with OS bookworm
- 21:26 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P52279 and previous config saved to /var/cache/conftool/dbconfig/20230906-212638-arnaudb.json
- 21:23 bking@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM flink-zk2003.codfw.wmnet - bking@cumin1001"
- 21:22 bking@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM flink-zk2003.codfw.wmnet - bking@cumin1001"
- 21:22 bking@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) flink-zk2003.codfw.wmnet on all recursors
- 21:22 bking@cumin1001: START - Cookbook sre.dns.wipe-cache flink-zk2003.codfw.wmnet on all recursors
- 21:22 bking@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 21:22 bking@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM flink-zk2003.codfw.wmnet - bking@cumin1001"
- 21:21 bking@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM flink-zk2003.codfw.wmnet - bking@cumin1001"
- 21:20 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.reboot (exit_code=0)
- 21:18 bking@cumin1001: START - Cookbook sre.dns.netbox
- 21:18 bking@cumin1001: START - Cookbook sre.ganeti.makevm for new host flink-zk2003.codfw.wmnet
- 21:15 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1007.eqiad.wmnet with reason: host reimage
- 21:12 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1006.eqiad.wmnet with reason: host reimage
- 21:11 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P52278 and previous config saved to /var/cache/conftool/dbconfig/20230906-211132-arnaudb.json
- 21:10 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1007.eqiad.wmnet with reason: host reimage
- 21:10 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1006.eqiad.wmnet with reason: host reimage
- 20:58 bking@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs1007.eqiad.wmnet with OS bullseye
- 20:58 bking@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs1006.eqiad.wmnet with OS bullseye
- 20:56 bking@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host flink-zk2002.codfw.wmnet
- 20:56 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host flink-zk2002.codfw.wmnet with OS bookworm
- 20:56 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T343198)', diff saved to https://phabricator.wikimedia.org/P52277 and previous config saved to /var/cache/conftool/dbconfig/20230906-205626-arnaudb.json
- 20:45 ryankemper@cumin1001: START - Cookbook sre.wdqs.reboot
- 20:42 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on flink-zk2002.codfw.wmnet with reason: host reimage
- 20:40 taavi@deploy1002: Finished scap: Backport for Article: Check permissions before showing link to view deleted revision (T264765), Article: Check permissions before showing link to view deleted revision (T264765), TopicSubscriptionsPager: Handle invalid titles (T345648), TopicSubscriptionsPager: Handle invalid titles (T345648) (duration: 09m 42s)
- 20:39 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on flink-zk2002.codfw.wmnet with reason: host reimage
- 20:34 taavi@deploy1002: matmarex and taavi: Continuing with sync
- 20:32 taavi@deploy1002: matmarex and taavi: Backport for Article: Check permissions before showing link to view deleted revision (T264765), Article: Check permissions before showing link to view deleted revision (T264765), TopicSubscriptionsPager: Handle invalid titles (T345648), TopicSubscriptionsPager: Handle invalid titles (T345648) synced to the tes
- 20:30 taavi@deploy1002: Started scap: Backport for Article: Check permissions before showing link to view deleted revision (T264765), Article: Check permissions before showing link to view deleted revision (T264765), TopicSubscriptionsPager: Handle invalid titles (T345648), TopicSubscriptionsPager: Handle invalid titles (T345648)
- 20:30 taavi@deploy1002: Finished scap: Backport for Add wikispecies logo (T341252), Disable wordmark on Gothic Wikipedia (T341253), Wikimania logos and taglines (T341254) (duration: 14m 25s)
- 20:24 taavi@deploy1002: jdlrobson and taavi: Continuing with sync
- 20:17 taavi@deploy1002: jdlrobson and taavi: Backport for Add wikispecies logo (T341252), Disable wordmark on Gothic Wikipedia (T341253), Wikimania logos and taglines (T341254) synced to the testservers mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XW
- 20:15 taavi@deploy1002: Started scap: Backport for Add wikispecies logo (T341252), Disable wordmark on Gothic Wikipedia (T341253), Wikimania logos and taglines (T341254)
- 20:14 bking@cumin1001: START - Cookbook sre.hosts.reimage for host flink-zk2002.codfw.wmnet with OS bookworm
- 20:14 bking@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM flink-zk2002.codfw.wmnet - bking@cumin1001"
- 20:14 bking@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM flink-zk2002.codfw.wmnet - bking@cumin1001"
- 20:13 bking@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) flink-zk2002.codfw.wmnet on all recursors
- 20:13 bking@cumin1001: START - Cookbook sre.dns.wipe-cache flink-zk2002.codfw.wmnet on all recursors
- 20:13 bking@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 20:13 bking@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM flink-zk2002.codfw.wmnet - bking@cumin1001"
- 20:13 taavi@deploy1002: Finished scap: Backport for GrowthExperiments: enable add a link in 12 and 13th round of wikis (T308137 T308138) (duration: 10m 16s)
- 20:12 bking@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM flink-zk2002.codfw.wmnet - bking@cumin1001"
- 20:10 bking@cumin1001: START - Cookbook sre.dns.netbox
- 20:10 bking@cumin1001: START - Cookbook sre.ganeti.makevm for new host flink-zk2002.codfw.wmnet
- 20:07 taavi@deploy1002: taavi and sgimeno: Continuing with sync
- 20:04 taavi@deploy1002: taavi and sgimeno: Backport for GrowthExperiments: enable add a link in 12 and 13th round of wikis (T308137 T308138) synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
- 20:03 taavi@deploy1002: Started scap: Backport for GrowthExperiments: enable add a link in 12 and 13th round of wikis (T308137 T308138)
- 19:18 hmonroy@deploy1002: Finished scap: Backport for Delay loading ext.phonos module until user clicks (T345414) (duration: 07m 58s)
- 19:12 hmonroy@deploy1002: hmonroy and musikanimal: Continuing with sync
- 19:12 hmonroy@deploy1002: hmonroy and musikanimal: Backport for Delay loading ext.phonos module until user clicks (T345414) synced to the testservers mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
- 19:10 hmonroy@deploy1002: Started scap: Backport for Delay loading ext.phonos module until user clicks (T345414)
- 18:16 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1161 (T343198)', diff saved to https://phabricator.wikimedia.org/P52276 and previous config saved to /var/cache/conftool/dbconfig/20230906-181602-arnaudb.json
- 18:15 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
- 18:15 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
- 18:15 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1161.eqiad.wmnet with reason: Maintenance
- 18:15 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1161.eqiad.wmnet with reason: Maintenance
- 18:00 cmooney@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['restbase1030']
- 18:00 cmooney@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['restbase1030']
- 18:00 cmooney@cumin1001: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts ['restbase1030']
- 18:00 cmooney@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['restbase1030']
- 17:58 cmooney@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['restbase1030']
- 17:58 cmooney@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['restbase1030']
- 17:55 cmooney@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host restbase1030.eqiad.wmnet with OS bullseye
- 17:48 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1132.eqiad.wmnet with OS bullseye
- 17:25 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1132.eqiad.wmnet with reason: host reimage
- 17:22 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1132.eqiad.wmnet with reason: host reimage
- 17:05 brett: Upload libvmod-re2_1.5.3-5_amd64 to bookworm-wikimedia
- 16:51 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host restbase1030.eqiad.wmnet with OS bullseye
- 16:43 eevans@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host restbase1030.eqiad.wmnet with OS bullseye
- 16:42 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 16:42 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove entries for cloudweb2002-dev - cmooney@cumin1001"
- 16:40 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1132.eqiad.wmnet with OS bullseye
- 16:25 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove entries for cloudweb2002-dev - cmooney@cumin1001"
- 16:14 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host restbase1030.eqiad.wmnet with OS bullseye
- 15:50 cmooney@cumin1001: START - Cookbook sre.dns.netbox
- 15:42 cgoubert@cumin1001: conftool action : set/pooled=yes; selector: dc=eqiad,cluster=appserver,service=nginx,name=mw1387.eqiad.wmnet
- 15:42 cgoubert@cumin1001: conftool action : set/pooled=yes; selector: dc=eqiad,cluster=appserver,service=nginx,name=mw1385.eqiad.wmnet
- 15:42 cgoubert@cumin1001: conftool action : set/pooled=yes; selector: dc=eqiad,cluster=appserver,service=nginx,name=mw1373.eqiad.wmnet
- 15:42 cgoubert@cumin1001: conftool action : set/pooled=yes; selector: dc=eqiad,cluster=appserver,service=nginx,name=mw1364.eqiad.wmnet
- 15:42 cgoubert@cumin1001: conftool action : set/pooled=yes; selector: dc=eqiad,cluster=appserver,service=nginx,name=mw1384.eqiad.wmnet
- 15:41 cgoubert@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-cluster (exit_code=97)
- 15:40 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1145.eqiad.wmnet with reason: Maintenance
- 15:40 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1145.eqiad.wmnet with reason: Maintenance
- 15:39 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 (T343198)', diff saved to https://phabricator.wikimedia.org/P52275 and previous config saved to /var/cache/conftool/dbconfig/20230906-153957-arnaudb.json
- 15:39 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2049.codfw.wmnet
- 15:38 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1049.eqiad.wmnet
- 15:38 akosiaris: sudo ethtool -G eno1 rx 1000 on conf2005, conf2006 to test out the theory. T345738
- 15:33 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2049.codfw.wmnet
- 15:32 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1049.eqiad.wmnet
- 15:24 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P52274 and previous config saved to /var/cache/conftool/dbconfig/20230906-152451-arnaudb.json
- 15:09 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P52273 and previous config saved to /var/cache/conftool/dbconfig/20230906-150945-arnaudb.json
- 15:07 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['moss-be2003']
- 15:06 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['moss-be2003']
- 15:05 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1048.eqiad.wmnet
- 15:05 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2048.codfw.wmnet
- 14:59 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2048.codfw.wmnet
- 14:59 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1048.eqiad.wmnet
- 14:54 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 (T343198)', diff saved to https://phabricator.wikimedia.org/P52272 and previous config saved to /var/cache/conftool/dbconfig/20230906-145439-arnaudb.json
- 14:52 bking@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host flink-zk2001.codfw.wmnet
- 14:52 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host flink-zk2001.codfw.wmnet with OS bookworm
- 14:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host moss-be2003.mgmt.codfw.wmnet with reboot policy GRACEFUL
- 14:43 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host moss-be2003.mgmt.codfw.wmnet with reboot policy GRACEFUL
- 14:31 claime: Repooling mw1349.eqiad.wmnet - T345741
- 14:22 claime: Leaving mw1349.eqiad.wmnet pooled=invalid until management interface investigation - T345741
- 14:18 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-cluster
- 14:18 claime: Restarting appserver reboots
- 13:59 claime: repooling mw1351.eqiad.wmnet
- 13:57 claime: powercycling mw1349.eqiad.wmnet
- 13:54 claime: powercycling mw1351.eqiad.wmnet
- 13:53 cgoubert@cumin1001: conftool action : set/pooled=inactive; selector: dc=eqiad,name=mw1351.eqiad.wmnet
- 13:53 cgoubert@cumin1001: conftool action : set/pooled=inactive; selector: dc=eqiad,name=mw1349.eqiad.wmnet
- 13:42 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1047.eqiad.wmnet
- 13:40 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2047.codfw.wmnet
- 13:38 akosiaris: sudo ethtool -G eno1 rx 1000 on conf2004 T345738
- 13:38 cgoubert@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-cluster (exit_code=1)
- 13:36 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2047.codfw.wmnet
- 13:36 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1047.eqiad.wmnet
- 13:33 bking@cumin1001: START - Cookbook sre.hosts.reimage for host flink-zk2001.codfw.wmnet with OS bookworm
- 13:21 sukhe: homer "asw1-b*27-esams*" commit "add durum300[34]"
- 13:21 taavi: taavi@mwmaint1002 ~ $ cat logos-to-purge.txt | mwscript purgeList.php --wiki enwiki # T345666
- 13:21 taavi@deploy1002: Finished scap: Backport for bnwikisource: update legacy vector logo (T345666) (duration: 17m 35s)
- 13:20 bking@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM flink-zk2001.codfw.wmnet - bking@cumin1001"
- 13:20 bking@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM flink-zk2001.codfw.wmnet - bking@cumin1001"
- 13:19 bking@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) flink-zk2001.codfw.wmnet on all recursors
- 13:19 bking@cumin1001: START - Cookbook sre.dns.wipe-cache flink-zk2001.codfw.wmnet on all recursors
- 13:19 bking@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 13:19 bking@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM flink-zk2001.codfw.wmnet - bking@cumin1001"
- 13:18 bking@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM flink-zk2001.codfw.wmnet - bking@cumin1001"
- 13:16 bking@cumin1001: START - Cookbook sre.dns.netbox
- 13:16 bking@cumin1001: START - Cookbook sre.ganeti.makevm for new host flink-zk2001.codfw.wmnet
- 13:14 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host moss-be2003.codfw.wmnet with OS bullseye
- 13:14 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host moss-be2003.codfw.wmnet with OS bullseye
- 13:07 taavi@deploy1002: taavi and anzx: Continuing with sync
- 13:05 taavi@deploy1002: taavi and anzx: Backport for bnwikisource: update legacy vector logo (T345666) synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
- 13:03 taavi@deploy1002: Started scap: Backport for bnwikisource: update legacy vector logo (T345666)
- 12:14 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-cluster
- 12:12 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 12:11 cmooney@cumin1001: START - Cookbook sre.dns.netbox
- 12:11 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-cluster (exit_code=0)
- 12:10 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2046.codfw.wmnet
- 12:08 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1046.eqiad.wmnet
- 12:04 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1144:3315 (T343198)', diff saved to https://phabricator.wikimedia.org/P52270 and previous config saved to /var/cache/conftool/dbconfig/20230906-120448-arnaudb.json
- 12:04 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1144.eqiad.wmnet with reason: Maintenance
- 12:04 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1144.eqiad.wmnet with reason: Maintenance
- 12:04 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1130 (T343198)', diff saved to https://phabricator.wikimedia.org/P52269 and previous config saved to /var/cache/conftool/dbconfig/20230906-120427-arnaudb.json
- 12:03 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1046.eqiad.wmnet
- 12:03 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2046.codfw.wmnet
- 11:49 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1130', diff saved to https://phabricator.wikimedia.org/P52268 and previous config saved to /var/cache/conftool/dbconfig/20230906-114921-arnaudb.json
- 11:34 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1130', diff saved to https://phabricator.wikimedia.org/P52267 and previous config saved to /var/cache/conftool/dbconfig/20230906-113414-arnaudb.json
- 11:33 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1045.eqiad.wmnet
- 11:32 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2044.codfw.wmnet
- 11:27 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2044.codfw.wmnet
- 11:27 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1045.eqiad.wmnet
- 11:19 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1130 (T343198)', diff saved to https://phabricator.wikimedia.org/P52266 and previous config saved to /var/cache/conftool/dbconfig/20230906-111908-arnaudb.json
- 11:05 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2044.codfw.wmnet
- 11:05 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1044.eqiad.wmnet
- 10:58 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2044.codfw.wmnet
- 10:58 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1044.eqiad.wmnet
- 10:35 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2043.codfw.wmnet
- 10:34 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1043.eqiad.wmnet
- 10:28 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2043.codfw.wmnet
- 10:28 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1043.eqiad.wmnet
- 10:27 topranks: Resetting PIC 1/1 on cr2-codfw to enable et-1/1/5 at 100G (T345583)
- 10:15 topranks: shut cr2-codfw xe-1/1/1:3 interface to cr1-codfw ahead of card 1/1 reset (T345583)
- 10:08 topranks: Draining cr2-codfw transport cct's to eqdfw and eqiad prior to card 1/1 reset (T345583)
- 09:59 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
- 09:59 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
- 09:57 topranks: de-activating peering sessions at DE-CIX Dallas on cr2-codfw prior to card 1/1 reset (T345583)
- 09:57 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
- 09:57 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
- 09:51 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ganeti-test01.svc.eqiad.wmnet on all recursors
- 09:51 ayounsi@cumin1001: START - Cookbook sre.dns.wipe-cache ganeti-test01.svc.eqiad.wmnet on all recursors
- 09:49 topranks: Making cr1-codfw VRRP primary for connections to row C and D prior to card 1/1 reset (T345583)
- 09:49 jbond: enable puppet post switch puppetdbs gerrit:954622
- 09:28 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
- 09:27 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
- 09:26 jbond: disable puppet to switch puppetdbs gerrit:954622
- 09:23 filippo@deploy1002: helmfile [codfw] DONE helmfile.d/services/opentelemetry-collector: apply
- 09:23 filippo@deploy1002: helmfile [codfw] START helmfile.d/services/opentelemetry-collector: apply
- 09:23 filippo@deploy1002: helmfile [eqiad] DONE helmfile.d/services/opentelemetry-collector: apply
- 09:23 topranks: Resetting PIC 1/1 on cr1-codfw to enable port et-1/1/5 at 100G (T345583)
- 09:23 filippo@deploy1002: helmfile [eqiad] START helmfile.d/services/opentelemetry-collector: apply
- 09:15 topranks: Shutting cr1-codfw port xe-1/1/1:1 to cr2-codfw before card 1/1 reset (T345583)
- 09:05 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1130 (T343198)', diff saved to https://phabricator.wikimedia.org/P52265 and previous config saved to /var/cache/conftool/dbconfig/20230906-090541-arnaudb.json
- 09:05 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1130.eqiad.wmnet with reason: Maintenance
- 09:05 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1130.eqiad.wmnet with reason: Maintenance
- 09:05 topranks: Draining transport circuits landing on cr1-codfw card 1/1 prior to reset (T345583)
- 08:58 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-cluster
- 08:52 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host apt2001.wikimedia.org
- 08:48 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host apt2001.wikimedia.org
- 08:46 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host irc2001.wikimedia.org
- 08:42 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host irc2001.wikimedia.org
- 08:25 hashar@deploy1002: Synchronized php: group1 wikis to 1.41.0-wmf.25 refs T343727 (duration: 06m 31s)
- 08:18 hashar@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.41.0-wmf.25 refs T343727
- 07:51 filippo@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/aus-k8s-eqiad-services/jaeger: apply
- 07:51 filippo@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/aus-k8s-eqiad-services/jaeger: apply
- 07:21 kartik@deploy1002: Finished scap: Backport for Enable MinT translation service in more wikis - rollout #2 (T341445) (duration: 11m 05s)
- 07:15 kartik@deploy1002: abi and kartik: Continuing with sync
- 07:11 kartik@deploy1002: abi and kartik: Backport for Enable MinT translation service in more wikis - rollout #2 (T341445) synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
- 07:10 kartik@deploy1002: Started scap: Backport for Enable MinT translation service in more wikis - rollout #2 (T341445)
- 05:28 tstarling@deploy1002: Synchronized php-1.41.0-wmf.25/extensions/Phonos: Fix UBN client-side error from malformed Phonos tags T345672 (duration: 06m 51s)
- 04:07 eileen: civicrm upgraded from a6fd7d6b to 5a432b1e
2023-09-05
- 23:44 bking@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts flink-zk2001.codfw.wmnet
- 23:44 bking@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 23:44 bking@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: flink-zk2001.codfw.wmnet decommissioned, removing all IPs except the asset tag one - bking@cumin1001"
- 23:37 bking@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: flink-zk2001.codfw.wmnet decommissioned, removing all IPs except the asset tag one - bking@cumin1001"
- 23:34 bking@cumin1001: START - Cookbook sre.dns.netbox
- 23:30 bking@cumin1001: START - Cookbook sre.hosts.decommission for hosts flink-zk2001.codfw.wmnet
- 22:59 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 22:59 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update DNS entries for kubernetes2029 and 2030 - pt1979@cumin2002"
- 22:57 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update DNS entries for kubernetes2029 and 2030 - pt1979@cumin2002"
- 22:55 pt1979@cumin2002: START - Cookbook sre.dns.netbox
- 22:22 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host flink-zk2001.codfw.wmnet with OS bookworm
- 22:11 urbanecm: mwmaint1002: `/usr/local/bin/foreachwikiindblist /srv/mediawiki/dblists/growthexperiments.dblist extensions/GrowthExperiments/maintenance/refreshUserImpactData.php --batch-size=20 --registeredWithin=1year --editedWithin=2week --hasEditsAtLeast=3 --ignoreIfUpdatedWithin=6hour --verbose --use-job-queue ` (debugging T344428, lowered batch size [100 -> 20])
- 21:38 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.41.0-wmf.25 refs T343727
- 21:38 urbanecm: mwmaint1002: `/usr/local/bin/foreachwikiindblist /srv/mediawiki/dblists/growthexperiments.dblist extensions/GrowthExperiments/maintenance/refreshUserImpactData.php --registeredWithin=1year --editedWithin=2week --hasEditsAtLeast=3 --ignoreIfUpdatedWithin=6hour --verbose --use-job-queue` (trying to reproduce T344428)
- 21:34 eevans@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host restbase1030.eqiad.wmnet with OS bullseye
- 21:28 sbassett: Deployed updated security mitigation for T336027
- 21:28 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host restbase1030.eqiad.wmnet with OS bullseye
- 21:21 cjming@deploy1002: Finished scap: Backport for Fix unseen notifications icon (T345483) (duration: 13m 46s)
- 21:16 eevans@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host restbase1030.eqiad.wmnet with OS bullseye
- 21:16 cjming: end of UTC late backport window
- 21:15 cjming@deploy1002: jdlrobson and cjming: Continuing with sync
- 21:12 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host restbase1030.eqiad.wmnet with OS bullseye
- 21:09 cjming@deploy1002: jdlrobson and cjming: Backport for Fix unseen notifications icon (T345483) synced to the testservers mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
- 21:09 eileen: config revision changed from c2f91f49 to e1c3b7fd
- 21:08 bking@cumin1001: START - Cookbook sre.hosts.reimage for host flink-zk2001.codfw.wmnet with OS bookworm
- 21:07 cjming@deploy1002: Started scap: Backport for Fix unseen notifications icon (T345483)
- 20:49 cjming@deploy1002: Finished scap: Backport for Fix unseen notifications icon (T345483) (duration: 16m 45s)
- 20:43 cjming@deploy1002: cjming and jdlrobson: Continuing with sync
- 20:34 cjming@deploy1002: cjming and jdlrobson: Backport for Fix unseen notifications icon (T345483) synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
- 20:33 cjming@deploy1002: Started scap: Backport for Fix unseen notifications icon (T345483)
- 20:32 cjming@deploy1002: Finished scap: Backport for Fix temp user popup appearing on every new page creation (T345569) (duration: 11m 37s)
- 20:26 cjming@deploy1002: cjming and matmarex: Continuing with sync
- 20:22 cjming@deploy1002: cjming and matmarex: Backport for Fix temp user popup appearing on every new page creation (T345569) synced to the testservers mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
- 20:20 cjming@deploy1002: Started scap: Backport for Fix temp user popup appearing on every new page creation (T345569)
- 20:17 cjming@deploy1002: Finished scap: Backport for Deploy Campaigns Event Discovery survey (T345158) (duration: 10m 27s)
- 20:11 cjming@deploy1002: cjming and dani: Continuing with sync
- 20:09 fab@deploy1002: Finished deploy [airflow-dags/research@90f280e]: (no justification provided) (duration: 00m 17s)
- 20:09 fab@deploy1002: Started deploy [airflow-dags/research@90f280e]: (no justification provided)
- 20:08 cjming@deploy1002: cjming and dani: Backport for Deploy Campaigns Event Discovery survey (T345158) synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
- 20:07 cjming@deploy1002: Started scap: Backport for Deploy Campaigns Event Discovery survey (T345158)
- 19:32 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host doh1001.wikimedia.org with OS bookworm
- 19:29 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
- 19:27 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
- 19:18 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
- 19:18 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
- 19:05 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on doh1001.wikimedia.org with reason: host reimage
- 19:01 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on doh1001.wikimedia.org with reason: host reimage
- 18:59 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
- 18:59 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
- 18:52 brett@cumin2002: START - Cookbook sre.hosts.reimage for host doh1001.wikimedia.org with OS bookworm
- 18:39 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 18:39 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
- 18:36 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 18:36 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
- 18:23 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kubernetes2029.mgmt.codfw.wmnet with reboot policy GRACEFUL
- 18:18 topranks: Running authdns-update to add includes for newly assigned codfw subnets
- 18:03 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host kubernetes2029.mgmt.codfw.wmnet with reboot policy GRACEFUL
- 17:57 dcausse: T345545: triggered a manual dag run to import analytics_platform_eng.image_suggestions_search_index_full/snapshot=2023-08-21
- 17:56 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes2039.codfw.wmnet with OS bullseye
- 17:56 pt1979@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
- 17:55 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes2038.codfw.wmnet with OS bullseye
- 17:55 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
- 17:52 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
- 17:50 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
- 17:47 dcausse@deploy1002: Finished deploy [airflow-dags/search@b3d43bb]: T345545: search: generalize image_suggestions_manual (duration: 00m 26s)
- 17:47 dcausse@deploy1002: Started deploy [airflow-dags/search@b3d43bb]: T345545: search: generalize image_suggestions_manual
- 17:44 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host doh1002.wikimedia.org with OS bookworm
- 17:36 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2039.codfw.wmnet with reason: host reimage
- 17:34 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2038.codfw.wmnet with reason: host reimage
- 17:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P52263 and previous config saved to /var/cache/conftool/dbconfig/20230905-173132-ladsgroup.json
- 17:31 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2039.codfw.wmnet with reason: host reimage
- 17:31 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2038.codfw.wmnet with reason: host reimage
- 17:21 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 17:21 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes2037.codfw.wmnet with OS bullseye
- 17:21 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
- 17:18 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_eqiad: apply security updates - bking@cumin1001 - T344587
- 17:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P52262 and previous config saved to /var/cache/conftool/dbconfig/20230905-171627-ladsgroup.json
- 17:12 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 17:12 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
- 17:12 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 17:11 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 17:11 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
- 17:10 pt1979@cumin2002: START - Cookbook sre.dns.netbox
- 17:08 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2039.codfw.wmnet with OS bullseye
- 17:08 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes2036.codfw.wmnet with OS bullseye
- 17:08 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
- 17:08 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2038.codfw.wmnet with OS bullseye
- 17:07 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on doh1002.wikimedia.org with reason: host reimage
- 17:07 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
- 17:05 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2037.codfw.wmnet with reason: host reimage
- 17:04 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes2038.codfw.wmnet with OS bullseye
- 17:04 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2038.codfw.wmnet with OS bullseye
- 17:04 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes2038.codfw.wmnet with OS bullseye
- 17:04 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2038.codfw.wmnet with OS bullseye
- 17:03 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes2035.codfw.wmnet with OS bullseye
- 17:03 pt1979@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
- 17:02 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes2034.codfw.wmnet with OS bullseye
- 17:02 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
- 17:02 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on doh1002.wikimedia.org with reason: host reimage
- 17:02 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2037.codfw.wmnet with reason: host reimage
- 17:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P52260 and previous config saved to /var/cache/conftool/dbconfig/20230905-170122-ladsgroup.json
- 16:54 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host mc2042.codfw.wmnet
- 16:52 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
- 16:50 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
- 16:49 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2036.codfw.wmnet with reason: host reimage
- 16:49 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1042.eqiad.wmnet
- 16:47 brett@cumin2002: START - Cookbook sre.hosts.reimage for host doh1002.wikimedia.org with OS bookworm
- 16:46 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2036.codfw.wmnet with reason: host reimage
- 16:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P52259 and previous config saved to /var/cache/conftool/dbconfig/20230905-164618-ladsgroup.json
- 16:43 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1042.eqiad.wmnet
- 16:43 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2042.codfw.wmnet
- 16:39 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2037.codfw.wmnet with OS bullseye
- 16:38 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes2033.codfw.wmnet with OS bullseye
- 16:38 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
- 16:37 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
- 16:35 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2035.codfw.wmnet with reason: host reimage
- 16:33 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2034.codfw.wmnet with reason: host reimage
- 16:31 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1041.eqiad.wmnet
- 16:31 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2041.codfw.wmnet
- 16:30 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2035.codfw.wmnet with reason: host reimage
- 16:30 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2034.codfw.wmnet with reason: host reimage
- 16:29 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2036.codfw.wmnet with OS bullseye
- 16:28 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes2030.codfw.wmnet with OS bullseye
- 16:28 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
- 16:27 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
- 16:25 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2041.codfw.wmnet
- 16:25 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1041.eqiad.wmnet
- 16:21 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2033.codfw.wmnet with reason: host reimage
- 16:18 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2033.codfw.wmnet with reason: host reimage
- 16:11 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2030.codfw.wmnet with reason: host reimage
- 16:08 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2035.codfw.wmnet with OS bullseye
- 16:08 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2034.codfw.wmnet with OS bullseye
- 16:07 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2030.codfw.wmnet with reason: host reimage
- 16:07 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes2031.codfw.wmnet with OS bullseye
- 16:07 pt1979@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
- 16:06 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes2032.codfw.wmnet with OS bullseye
- 16:06 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
- 16:03 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
- 16:00 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
- 16:00 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
- 16:00 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
- 15:57 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2033.codfw.wmnet with OS bullseye
- 15:49 claime: Repooled mw2448.codfw.wmnet - T345597
- 15:47 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2032.codfw.wmnet with reason: host reimage
- 15:45 claime: Repooling mw2448.codfw.wmnet
- 15:44 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2031.codfw.wmnet with reason: host reimage
- 15:42 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2032.codfw.wmnet with reason: host reimage
- 15:41 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2031.codfw.wmnet with reason: host reimage
- 15:36 kamila_: Datacenter switchover live test completed (T345588)
- 15:35 kamila@deploy1002: Unlocked for deployment [ALL REPOSITORIES]: Datacenter Switchover Live Test - T345588 (duration: 30m 45s)
- 15:34 kamila@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.09-run-puppet-on-db-masters (exit_code=0)
- 15:28 kamila@cumin1001: START - Cookbook sre.switchdc.mediawiki.09-run-puppet-on-db-masters
- 15:28 kamila@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.09-restore-ttl (exit_code=0)
- 15:27 kamila@cumin1001: START - Cookbook sre.switchdc.mediawiki.09-restore-ttl
- 15:27 kamila@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-start-maintenance (exit_code=0)
- 15:25 kamila@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-start-maintenance
- 15:25 kamila@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-restart-envoy-on-jobrunners (exit_code=0)
- 15:25 kamila@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-restart-envoy-on-jobrunners
- 15:25 kamila@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.07-set-readwrite (exit_code=0)
- 15:25 kamila@cumin1001: [DRY-RUN] MediaWiki read-only period ends at: 2023-09-05 15:25:15.979250
- 15:25 kamila@cumin1001: START - Cookbook sre.switchdc.mediawiki.07-set-readwrite
- 15:24 kamila@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.06-set-db-readwrite (exit_code=0)
- 15:24 kamila@cumin1001: START - Cookbook sre.switchdc.mediawiki.06-set-db-readwrite
- 15:24 kamila@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.04-switch-mediawiki (exit_code=0)
- 15:24 kamila@cumin1001: START - Cookbook sre.switchdc.mediawiki.04-switch-mediawiki
- 15:21 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2032.codfw.wmnet with OS bullseye
- 15:21 kamila@cumin1001: END (FAIL) - Cookbook sre.switchdc.mediawiki.03-set-db-readonly (exit_code=99)
- 15:20 kamila@cumin1001: START - Cookbook sre.switchdc.mediawiki.03-set-db-readonly
- 15:20 kamila@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.02-set-readonly (exit_code=0)
- 15:19 kamila@cumin1001: [DRY-RUN] MediaWiki read-only period starts at: 2023-09-05 15:19:50.101327
- 15:19 kamila@cumin1001: START - Cookbook sre.switchdc.mediawiki.02-set-readonly
- 15:19 kamila@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.01-stop-maintenance (exit_code=0)
- 15:19 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2031.codfw.wmnet with OS bullseye
- 15:19 kamila@cumin1001: START - Cookbook sre.switchdc.mediawiki.01-stop-maintenance
- 15:19 kamila@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-reduce-ttl (exit_code=0)
- 15:18 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2030.codfw.wmnet with OS bullseye
- 15:13 kamila@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-reduce-ttl
- 15:13 kamila@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-downtime-db-readonly-checks (exit_code=0)
- 15:13 kamila@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-downtime-db-readonly-checks
- 15:13 kamila@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-disable-puppet (exit_code=0)
- 15:13 kamila@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-disable-puppet
- 15:10 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2040.codfw.wmnet
- 15:04 kamila@deploy1002: Locking from deployment [ALL REPOSITORIES]: Datacenter Switchover Live Test - T345588
- 14:58 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host durum3004.esams.wmnet with OS bookworm
- 14:50 kamila@cumin1001: END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0) pool all active/active services in codfw: Datacenter Switchover Live test - T345588
- 14:33 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes2032.codfw.wmnet with OS bullseye
- 14:32 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_eqiad: apply security updates - bking@cumin1001 - T344587
- 14:31 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host testreduce1002.eqiad.wmnet with OS bookworm
- 14:29 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 14:29 ayounsi@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: eqiad ganeti-test vip - ayounsi@cumin1001"
- 14:28 ayounsi@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: eqiad ganeti-test vip - ayounsi@cumin1001"
- 14:26 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
- 14:26 kamila@cumin1001: START - Cookbook sre.discovery.datacenter pool all active/active services in codfw: Datacenter Switchover Live test - T345588
- 14:26 btullis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventstreams: apply
- 14:25 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes2031.codfw.wmnet with OS bullseye
- 14:25 kamila@cumin1001: END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0) depool all services in codfw: Datacenter Switchover Live test - T345588
- 14:25 btullis@deploy1002: helmfile [eqiad] START helmfile.d/services/eventstreams: apply
- 14:24 btullis@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventstreams: apply
- 14:24 btullis@deploy1002: helmfile [codfw] START helmfile.d/services/eventstreams: apply
- 14:23 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/eventstreams: apply
- 14:23 btullis@deploy1002: helmfile [staging] START helmfile.d/services/eventstreams: apply
- 14:21 btullis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventstreams-internal: apply
- 14:21 btullis@deploy1002: helmfile [eqiad] START helmfile.d/services/eventstreams-internal: apply
- 14:17 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes2030.codfw.wmnet with OS bullseye
- 14:16 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1040.eqiad.wmnet
- 14:16 btullis@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventstreams-internal: apply
- 14:16 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on durum3004.esams.wmnet with reason: host reimage
- 14:15 btullis@deploy1002: helmfile [codfw] START helmfile.d/services/eventstreams-internal: apply
- 14:15 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/eventstreams-internal: apply
- 14:13 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes2029.codfw.wmnet with OS bullseye
- 14:13 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
- 14:13 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on durum3004.esams.wmnet with reason: host reimage
- 14:13 btullis@deploy1002: helmfile [staging] START helmfile.d/services/eventstreams-internal: apply
- 14:10 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2040.codfw.wmnet
- 14:10 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1040.eqiad.wmnet
- 14:09 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on testreduce1002.eqiad.wmnet with reason: host reimage
- 14:06 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on testreduce1002.eqiad.wmnet with reason: host reimage
- 14:06 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on durum3003.esams.wmnet with reason: host reimage
- 14:02 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on durum3003.esams.wmnet with reason: host reimage
- 14:01 kamila@cumin1001: START - Cookbook sre.discovery.datacenter depool all services in codfw: Datacenter Switchover Live test - T345588
- 13:57 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
- 13:54 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host testreduce1002.eqiad.wmnet with OS bookworm
- 13:52 ladsgroup@deploy1002: Finished scap: Backport for ores-extension: enable lift wing for most wikis (T342115) (duration: 18m 33s)
- 13:46 ladsgroup@deploy1002: ladsgroup and isaranto: Continuing with sync
- 13:45 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2032.codfw.wmnet with OS bullseye
- 13:43 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2029.codfw.wmnet with reason: host reimage
- 13:43 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host durum3004.esams.wmnet with OS bookworm
- 13:42 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host durum3003.esams.wmnet with OS bookworm
- 13:40 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2029.codfw.wmnet with reason: host reimage
- 13:37 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2031.codfw.wmnet with OS bullseye
- 13:36 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes2026.codfw.wmnet with OS bullseye
- 13:36 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
- 13:35 ladsgroup@deploy1002: ladsgroup and isaranto: Backport for ores-extension: enable lift wing for most wikis (T342115) synced to the testservers mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
- 13:34 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
- 13:33 ladsgroup@deploy1002: Started scap: Backport for ores-extension: enable lift wing for most wikis (T342115)
- 13:30 stevemunene@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1132.eqiad.wmnet with OS bullseye
- 13:30 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111 (T343198)', diff saved to https://phabricator.wikimedia.org/P52258 and previous config saved to /var/cache/conftool/dbconfig/20230905-133046-arnaudb.json
- 13:25 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes2028.codfw.wmnet with OS bullseye
- 13:25 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
- 13:24 taavi@deploy1002: Finished scap: Backport for tlywiki: add metanamespace, timezone, sitename (T345316), tlywiki: Add logos (T345316) (duration: 10m 18s)
- 13:24 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2030.codfw.wmnet with OS bullseye
- 13:23 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
- 13:22 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes2027.codfw.wmnet with OS bullseye
- 13:22 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
- 13:21 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
- 13:21 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2029.codfw.wmnet with OS bullseye
- 13:20 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes2025.codfw.wmnet with OS bullseye
- 13:20 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
- 13:18 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2026.codfw.wmnet with reason: host reimage
- 13:18 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
- 13:18 taavi@deploy1002: taavi and anzx: Continuing with sync
- 13:16 ayounsi@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti-test1002.eqiad.wmnet with OS bullseye
- 13:16 ayounsi@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - ayounsi@cumin1001"
- 13:16 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2026.codfw.wmnet with reason: host reimage
- 13:15 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2028.codfw.wmnet with reason: host reimage
- 13:15 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111', diff saved to https://phabricator.wikimedia.org/P52257 and previous config saved to /var/cache/conftool/dbconfig/20230905-131540-arnaudb.json
- 13:15 taavi@deploy1002: taavi and anzx: Backport for tlywiki: add metanamespace, timezone, sitename (T345316), tlywiki: Add logos (T345316) synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
- 13:13 taavi@deploy1002: Started scap: Backport for tlywiki: add metanamespace, timezone, sitename (T345316), tlywiki: Add logos (T345316)
- 13:13 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2028.codfw.wmnet with reason: host reimage
- 13:12 taavi@deploy1002: Finished scap: Backport for Disable EchoMail and EchoInteraction instruments (T344167) (duration: 10m 14s)
- 13:12 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2028.codfw.wmnet with OS bullseye
- 13:11 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2027.codfw.wmnet with reason: host reimage
- 13:09 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2025.codfw.wmnet with reason: host reimage
- 13:09 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2027.codfw.wmnet with reason: host reimage
- 13:08 ayounsi@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - ayounsi@cumin1001"
- 13:07 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2027.codfw.wmnet with OS bullseye
- 13:07 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1039.eqiad.wmnet
- 13:07 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2039.codfw.wmnet
- 13:07 taavi@deploy1002: taavi and phuedx: Continuing with sync
- 13:06 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2025.codfw.wmnet with reason: host reimage
- 13:06 filippo@deploy1002: helmfile [codfw] DONE helmfile.d/services/opentelemetry-collector: apply
- 13:06 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2026.codfw.wmnet with OS bullseye
- 13:06 filippo@deploy1002: helmfile [codfw] START helmfile.d/services/opentelemetry-collector: apply
- 13:05 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2025.codfw.wmnet with OS bullseye
- 13:04 taavi@deploy1002: taavi and phuedx: Backport for Disable EchoMail and EchoInteraction instruments (T344167) synced to the testservers mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
- 13:03 ayounsi@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti-test1001.eqiad.wmnet with OS bullseye
- 13:03 ayounsi@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - ayounsi@cumin1001"
- 13:02 taavi@deploy1002: Started scap: Backport for Disable EchoMail and EchoInteraction instruments (T344167)
- 13:01 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1039.eqiad.wmnet
- 13:01 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2039.codfw.wmnet
- 13:00 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111', diff saved to https://phabricator.wikimedia.org/P52254 and previous config saved to /var/cache/conftool/dbconfig/20230905-130034-arnaudb.json
- 12:55 filippo@deploy1002: helmfile [eqiad] DONE helmfile.d/services/opentelemetry-collector: apply
- 12:55 filippo@deploy1002: helmfile [eqiad] START helmfile.d/services/opentelemetry-collector: apply
- 12:55 filippo@deploy1002: helmfile [codfw] DONE helmfile.d/services/opentelemetry-collector: apply
- 12:54 filippo@deploy1002: helmfile [codfw] START helmfile.d/services/opentelemetry-collector: apply
- 12:47 ayounsi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti-test1002.eqiad.wmnet with reason: host reimage
- 12:45 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111 (T343198)', diff saved to https://phabricator.wikimedia.org/P52252 and previous config saved to /var/cache/conftool/dbconfig/20230905-124528-arnaudb.json
- 12:44 stevemunene@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1132.eqiad.wmnet with OS bullseye
- 12:43 ayounsi@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti-test1002.eqiad.wmnet with reason: host reimage
- 12:43 elukey@deploy1002: Finished scap: Backport for Add new OAuth Rate Limiter tier for Wiki Education (T345394) (duration: 07m 49s)
- 12:37 elukey@deploy1002: elukey: Continuing with sync
- 12:37 elukey@deploy1002: elukey: Backport for Add new OAuth Rate Limiter tier for Wiki Education (T345394) synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
- 12:37 ayounsi@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - ayounsi@cumin1001"
- 12:35 elukey@deploy1002: Started scap: Backport for Add new OAuth Rate Limiter tier for Wiki Education (T345394)
- 12:20 ayounsi@cumin1001: START - Cookbook sre.hosts.reimage for host ganeti-test1002.eqiad.wmnet with OS bullseye
- 12:18 filippo@deploy1002: helmfile [codfw] DONE helmfile.d/services/opentelemetry-collector: apply
- 12:18 filippo@deploy1002: helmfile [codfw] START helmfile.d/services/opentelemetry-collector: apply
- 12:17 ayounsi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti-test1001.eqiad.wmnet with reason: host reimage
- 12:17 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1038.eqiad.wmnet
- 12:17 filippo@deploy1002: helmfile [eqiad] DONE helmfile.d/services/opentelemetry-collector: apply
- 12:16 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2038.codfw.wmnet
- 12:16 filippo@deploy1002: helmfile [eqiad] START helmfile.d/services/opentelemetry-collector: apply
- 12:14 ayounsi@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti-test1001.eqiad.wmnet with reason: host reimage
- 12:10 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1038.eqiad.wmnet
- 12:10 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2038.codfw.wmnet
- 11:52 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp1003.eqiad.wmnet
- 11:52 ayounsi@cumin1001: START - Cookbook sre.hosts.reimage for host ganeti-test1001.eqiad.wmnet with OS bullseye
- 11:51 ayounsi@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ganeti-test1001.eqiad.wmnet with OS bullseye
- 11:47 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-cluster (exit_code=0)
- 11:45 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc-gp1003.eqiad.wmnet
- 11:45 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp1002.eqiad.wmnet
- 11:40 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc-gp1002.eqiad.wmnet
- 11:40 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp1001.eqiad.wmnet
- 11:36 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp2003.codfw.wmnet
- 11:33 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc-gp1001.eqiad.wmnet
- 11:30 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc-gp2003.codfw.wmnet
- 11:29 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp2002.codfw.wmnet
- 11:24 jbond@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=puppetboard-next
- 11:23 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc-gp2002.codfw.wmnet
- 11:22 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp2001.codfw.wmnet
- 11:18 ayounsi@cumin1001: START - Cookbook sre.hosts.reimage for host ganeti-test1001.eqiad.wmnet with OS bullseye
- 11:15 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc-gp2001.codfw.wmnet
- 11:09 kamila@cumin1001: END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0) status all services in all: None - None
- 11:09 kamila@cumin1001: START - Cookbook sre.discovery.datacenter status all services in all: None - None
- 11:08 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti3005.esams.wmnet
- 11:08 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti3005.esams.wmnet
- 11:00 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti3005.esams.wmnet
- 10:56 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti3005.esams.wmnet
- 10:41 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/device-analytics: apply
- 10:41 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/device-analytics: apply
- 10:36 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/device-analytics: apply
- 10:36 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/device-analytics: apply
- 10:34 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/device-analytics: apply
- 10:33 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/device-analytics: apply
- 09:52 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2111 (T343198)', diff saved to https://phabricator.wikimedia.org/P52247 and previous config saved to /var/cache/conftool/dbconfig/20230905-095254-arnaudb.json
- 09:52 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2111.codfw.wmnet with reason: Maintenance
- 09:52 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2111.codfw.wmnet with reason: Maintenance
- 09:49 moritzm: failover ganeti master in esams/BY27 to ganeti3007
- 09:43 ayounsi@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['ganeti-test1001']
- 09:43 ayounsi@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti-test1001']
- 09:41 ayounsi@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti-test1001.mgmt.eqiad.wmnet with reboot policy FORCED
- 09:26 ayounsi@cumin1001: START - Cookbook sre.hosts.provision for host ganeti-test1001.mgmt.eqiad.wmnet with reboot policy FORCED
- 09:25 ayounsi@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host ganeti-test1002
- 09:21 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti-test1001
- 09:20 ayounsi@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host ganeti-test1001
- 09:17 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 09:17 ayounsi@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: eqiad ganeti-test - ayounsi@cumin1001"
- 09:16 ayounsi@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: eqiad ganeti-test - ayounsi@cumin1001"
- 09:14 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
- 09:05 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti3006.esams.wmnet
- 09:05 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti3006.esams.wmnet
- 09:04 claime: powercycle mw1356.eqiad.wmnet
- 08:57 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti3006.esams.wmnet
- 08:51 jnuche@deploy1002: sync-world aborted: testwikis wikis to 1.41.0-wmf.25 refs T343727 (duration: 20m 37s)
- 08:48 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti3006.esams.wmnet
- 08:40 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-cluster
- 08:31 jnuche@deploy1002: Started scap: testwikis wikis to 1.41.0-wmf.25 refs T343727
- 08:12 kartik@deploy1002: Finished scap: Backport for Add "editautopatrolprotected" and "editpatrolprotected" protection levels on shwiki (T344306) (duration: 10m 47s)
- 08:06 kartik@deploy1002: aleksandar and kartik: Continuing with sync
- 08:03 kartik@deploy1002: aleksandar and kartik: Backport for Add "editautopatrolprotected" and "editpatrolprotected" protection levels on shwiki (T344306) synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
- 08:01 kartik@deploy1002: Started scap: Backport for Add "editautopatrolprotected" and "editpatrolprotected" protection levels on shwiki (T344306)
- 07:56 kartik@deploy1002: Finished scap: Backport for Enable AbuseFilter blocks on shwiki (T345513) (duration: 19m 29s)
- 07:46 moritzm: depool mw2448 (unreachable)
- 07:45 stevemunene@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1132.eqiad.wmnet with OS bullseye
- 07:42 kartik@deploy1002: kartik and aleksandar: Continuing with sync
- 07:38 kartik@deploy1002: kartik and aleksandar: Backport for Enable AbuseFilter blocks on shwiki (T345513) synced to the testservers mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
- 07:36 kartik@deploy1002: Started scap: Backport for Enable AbuseFilter blocks on shwiki (T345513)
- 07:32 kartik@deploy1002: Finished scap: Backport for Enable Section and Content Translation in 7 Wikipedias (T343211) (duration: 15m 45s)
- 07:23 moritzm: failover ganeti masters in esams to ganeti3007/ganeti3008
- 07:22 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti3008.esams.wmnet
- 07:22 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti3008.esams.wmnet
- 07:20 kartik@deploy1002: kartik: Continuing with sync
- 07:18 kartik@deploy1002: kartik: Backport for Enable Section and Content Translation in 7 Wikipedias (T343211) synced to the testservers mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
- 07:16 kartik@deploy1002: Started scap: Backport for Enable Section and Content Translation in 7 Wikipedias (T343211)
- 07:14 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti3008.esams.wmnet
- 07:08 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1131.eqiad.wmnet with OS bullseye
- 07:07 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti3008.esams.wmnet
- 07:03 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti3007.esams.wmnet
- 07:03 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti3007.esams.wmnet
- 06:59 stevemunene@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1132.eqiad.wmnet with OS bullseye
- 06:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti3007.esams.wmnet
- 06:51 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1130.eqiad.wmnet with OS bullseye
- 06:49 tstarling@deploy1002: Synchronized wmf-config/CommonSettings-labs.php: Labs only change, just avoiding undeployed changes (duration: 09m 25s)
- 06:46 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1131.eqiad.wmnet with reason: host reimage
- 06:45 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti3007.esams.wmnet
- 06:43 stevemunene@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1131.eqiad.wmnet with reason: host reimage
- 06:29 stevemunene@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1131.eqiad.wmnet with OS bullseye
- 06:26 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1130.eqiad.wmnet with reason: host reimage
- 06:24 stevemunene@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1130.eqiad.wmnet with reason: host reimage
- 06:10 stevemunene@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1130.eqiad.wmnet with OS bullseye
- 06:06 kart_: Updated cxserver to 2023-08-29-191442-production (T345170, T343450)
- 06:04 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
- 06:04 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
- 06:01 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
- 06:00 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply
- 05:58 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
- 05:57 kartik@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
- 05:55 kart_: Updated MinT to 2023-09-04-051105-production (T336683)
- 05:46 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply
- 05:41 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply
- 05:36 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply
- 05:30 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply
- 05:25 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
- 05:22 kartik@deploy1002: helmfile [staging] START helmfile.d/services/machinetranslation: apply
- 03:59 mwpresync@deploy1002: Finished scap: testwikis wikis to 1.41.0-wmf.25 refs T343727 (duration: 56m 29s)
- 03:02 mwpresync@deploy1002: Started scap: testwikis wikis to 1.41.0-wmf.25 refs T343727
2023-09-04
- 16:14 aikochou@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
- 16:07 topranks: setting port 1/1/5 to speed 100G on cr2-codfw
- 16:06 topranks: setting port 1/1/5 to speed 100G on cr1-codfw
- 16:05 aikochou@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
- 15:46 jdrewniak@deploy1002: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 06m 14s)
- 15:40 jdrewniak@deploy1002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 07m 01s)
- 14:57 moritzm: installing json-c security updates
- 14:53 ayounsi@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox
- 14:47 ayounsi@cumin1001: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
- 14:45 ayounsi@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary
- 14:44 ayounsi@cumin1001: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary
- 14:31 godog: bounce prometheus@k8s-aux
- 14:29 aikochou@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
- 13:58 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
- 13:58 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
- 13:58 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
- 13:56 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
- 13:55 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
- 13:54 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
- 13:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host irc1001.wikimedia.org
- 13:52 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
- 13:50 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 13:50 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS entries for codfw cr<-> ssw links. - cmooney@cumin1001"
- 13:50 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
- 13:49 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
- 13:49 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS entries for codfw cr<-> ssw links. - cmooney@cumin1001"
- 13:49 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host irc1001.wikimedia.org
- 13:48 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
- 13:48 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
- 13:48 cmooney@cumin1001: START - Cookbook sre.dns.netbox
- 13:47 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
- 13:46 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
- 13:45 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 13:45 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS entries for codfw cr<-> ssw links. - cmooney@cumin1001"
- 13:41 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
- 13:24 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS entries for codfw cr<-> ssw links. - cmooney@cumin1001"
- 13:18 cmooney@cumin1001: START - Cookbook sre.dns.netbox
- 13:15 cmooney@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
- 13:12 cmooney@cumin1001: START - Cookbook sre.dns.netbox
- 12:52 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-presto1002.eqiad.wmnet with OS bullseye
- 12:46 hnowlan: staggered restarting restbase service on A:restbase
- 12:30 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 149665
- 12:30 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 149665
- 12:29 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 138884
- 12:28 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 138884
- 12:26 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 136065
- 12:23 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 136065
- 12:23 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 27381
- 12:18 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host netbox1002.eqiad.wmnet
- 12:08 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netbox1002.eqiad.wmnet
- 12:06 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netbox2002.codfw.wmnet
- 12:03 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 27381
- 12:02 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netbox2002.codfw.wmnet
- 11:59 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netbox-dev2002.codfw.wmnet
- 11:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netbox-dev2002.codfw.wmnet
- 11:53 hnowlan@deploy1002: Finished deploy [restbase/deploy@26bc1a5]: Add new wikis T343543 T343549 T345171 (duration: 14m 32s)
- 11:51 moritzm: installing grub2 updates from Bullseye point release
- 11:49 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1002.eqiad.wmnet with reason: host reimage
- 11:46 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1002.eqiad.wmnet with reason: host reimage
- 11:38 hnowlan@deploy1002: Started deploy [restbase/deploy@26bc1a5]: Add new wikis T343543 T343549 T345171
- 11:35 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host an-presto1002.eqiad.wmnet with OS bullseye
- 11:08 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: " - jbond@cumin1001 - T342534"
- 11:08 jbond@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: " - jbond@cumin1001 - T342534"
- 11:05 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti6004.drmrs.wmnet
- 11:05 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti6004.drmrs.wmnet
- 10:59 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti6004.drmrs.wmnet
- 10:51 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti6004.drmrs.wmnet
- 10:29 jbond: enable-puppet fleet wide post "deploy confd change gerrit:954007"
- 10:25 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti6003.drmrs.wmnet
- 10:24 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti6003.drmrs.wmnet
- 10:22 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti6003.drmrs.wmnet
- 10:13 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti6003.drmrs.wmnet
- 09:50 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/mathoid: apply
- 09:49 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/mathoid: apply
- 09:49 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mathoid: apply
- 09:49 akosiaris: T345290. Update mathoid to 2023-05-13-192519-production
- 09:48 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/mathoid: apply
- 09:48 aborrero@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 09:48 aborrero@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudservices1006 - aborrero@cumin1001"
- 09:48 akosiaris@deploy1002: helmfile [staging] DONE helmfile.d/services/mathoid: apply
- 09:48 akosiaris@deploy1002: helmfile [staging] START helmfile.d/services/mathoid: apply
- 09:47 jbond: disable-puppet fleet wide "deploy confd change gerrit:954007"
- 09:47 aborrero@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudservices1006 - aborrero@cumin1001"
- 09:45 ladsgroup@deploy1002: Synchronized private/PrivateSettings.php: Add CP secret (duration: 15m 47s)
- 09:44 aborrero@cumin1001: START - Cookbook sre.dns.netbox
- 09:43 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/mathoid: apply
- 09:43 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/mathoid: apply
- 09:42 stevemunene@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1129.eqiad.wmnet with OS bullseye
- 09:42 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mathoid: apply
- 09:42 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/mathoid: apply
- 09:39 ladsgroup@deploy1002: ladsgroup: Continuing with sync
- 09:38 ladsgroup@deploy1002: ladsgroup: Add CP secret synced to the testservers mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
- 09:34 akosiaris@deploy1002: helmfile [staging] DONE helmfile.d/services/mathoid: apply
- 09:34 jayme@cumin1001: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on A:wikikube-staging-worker-eqiad
- 09:30 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/media-analytics: apply
- 09:29 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/media-analytics: apply
- 09:29 stevemunene@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on an-worker1129.eqiad.wmnet with reason: host reimage
- 09:29 akosiaris@deploy1002: helmfile [staging] START helmfile.d/services/mathoid: apply
- 09:28 akosiaris: deploying mathoid to bump service mesh envoy version to 1.23.10-2-s2. No changes to the app.
- 09:27 stevemunene@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1129.eqiad.wmnet with reason: host reimage
- 09:27 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/geo-analytics: apply
- 09:27 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/geo-analytics: apply
- 09:14 stevemunene@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1129.eqiad.wmnet with OS bullseye
- 09:13 jayme@cumin1001: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:wikikube-staging-worker-eqiad
- 09:10 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM orespoolcounter1003.eqiad.wmnet
- 09:09 elukey: rename "ens5" to "ens13" on orespoolcounter1003's /etc/network/interfaces after a VM reboot
- 09:04 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM orespoolcounter1003.eqiad.wmnet
- 09:04 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM orespoolcounter1004.eqiad.wmnet
- 08:57 elukey: rename "ens5" to "ens13" on orespoolcounter1004's /etc/network/interfaces after a VM reboot
- 08:56 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host registry2004.codfw.wmnet
- 08:51 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host registry2004.codfw.wmnet
- 08:51 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host registry2003.codfw.wmnet
- 08:46 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM orespoolcounter1004.eqiad.wmnet
- 08:46 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host registry2003.codfw.wmnet
- 08:45 jayme@cumin1001: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on A:wikikube-staging-worker-codfw
- 08:45 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts bast6002.wikimedia.org
- 08:45 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 08:45 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: bast6002.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
- 08:44 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: bast6002.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
- 08:43 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host registry1004.eqiad.wmnet
- 08:41 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM orespoolcounter2003.codfw.wmnet
- 08:41 jmm@cumin2002: START - Cookbook sre.dns.netbox
- 08:39 jayme@cumin1001: END (FAIL) - Cookbook sre.k8s.reboot-nodes (exit_code=1) rolling reboot on A:wikikube-staging-worker-eqiad
- 08:38 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host registry1004.eqiad.wmnet
- 08:38 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host registry1003.eqiad.wmnet
- 08:37 jayme@cumin1001: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:wikikube-staging-worker-eqiad
- 08:36 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts bast6002.wikimedia.org
- 08:34 elukey: rename "ens5" to "ens13" on orespoolcounter2003's /etc/network/interfaces after a VM reboot
- 08:33 jayme@cumin1001: END (FAIL) - Cookbook sre.k8s.reboot-nodes (exit_code=1) rolling reboot on A:wikikube-staging-worker-eqiad
- 08:33 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host registry1003.eqiad.wmnet
- 08:31 jayme@cumin1001: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:wikikube-staging-worker-eqiad
- 08:31 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts bast5003.wikimedia.org
- 08:31 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 08:31 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: bast5003.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
- 08:28 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host kubestagemaster2002.codfw.wmnet
- 08:27 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: bast5003.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
- 08:25 jayme@cumin1001: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:wikikube-staging-worker-codfw
- 08:23 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dragonfly-supernode2001.codfw.wmnet
- 08:23 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host kubestagemaster1002.eqiad.wmnet
- 08:22 jmm@cumin2002: START - Cookbook sre.dns.netbox
- 08:19 jayme@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=helm-charts,name=eqiad
- 08:18 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host chartmuseum1001.eqiad.wmnet
- 08:18 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts bast5003.wikimedia.org
- 08:18 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host dragonfly-supernode2001.codfw.wmnet
- 08:17 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dragonfly-supernode1001.eqiad.wmnet
- 08:17 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts bast4004.wikimedia.org
- 08:17 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 08:17 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: bast4004.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
- 08:15 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubestagemaster2002.codfw.wmnet
- 08:15 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: bast4004.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
- 08:15 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubestagemaster2001.codfw.wmnet
- 08:14 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM orespoolcounter2003.codfw.wmnet
- 08:14 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host chartmuseum1001.eqiad.wmnet
- 08:14 jayme@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=helm-charts,name=eqiad
- 08:14 jayme@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=helm-charts,name=codfw
- 08:14 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host chartmuseum2001.codfw.wmnet
- 08:13 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host dragonfly-supernode1001.eqiad.wmnet
- 08:13 jmm@cumin2002: START - Cookbook sre.dns.netbox
- 08:11 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubestagemaster1002.eqiad.wmnet
- 08:10 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubestagemaster1001.eqiad.wmnet
- 08:10 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host chartmuseum2001.codfw.wmnet
- 08:09 jayme@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=helm-charts,name=codfw
- 08:08 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts bast4004.wikimedia.org
- 08:03 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubestagemaster1001.eqiad.wmnet
- 08:03 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubestagemaster2001.codfw.wmnet
- 08:00 elukey: restart kubelet on ml-serve1002 to check if stale prometheus metrics are the cause of the stop_container alert
- 08:00 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM orespoolcounter2004.codfw.wmnet
- 07:59 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM orespoolcounter2004.codfw.wmnet
- 07:35 Emperor: restart tcpircbot-logmsgbot on alert1001
- 07:22 moritzm: failover ganeti masters in drmrs to ganeti6001/ganeti6002
- 06:12 XioNoX: push new pfw policies - T345288
2023-09-02
- 15:51 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db1128.eqiad.wmnet with reason: depooled after replica lag page, two days
- 15:50 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db1128.eqiad.wmnet with reason: depooled after replica lag page, two days
- 15:49 sukhe@cumin2002: dbctl commit (dc=all): 'Depool db1128', diff saved to https://phabricator.wikimedia.org/P52244 and previous config saved to /var/cache/conftool/dbconfig/20230902-154903-sukhe.json
- 05:45 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dumpsdata1003.eqiad.wmnet
- 05:39 ariel@cumin1001: START - Cookbook sre.hosts.reboot-single for host dumpsdata1003.eqiad.wmnet
- 05:38 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host snapshot1008.eqiad.wmnet
- 05:32 ariel@cumin1001: START - Cookbook sre.hosts.reboot-single for host snapshot1008.eqiad.wmnet
- 00:06 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 00:06 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS entries for codfw spine switch overlay IRBs. - cmooney@cumin1001"
- 00:05 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS entries for codfw spine switch overlay IRBs. - cmooney@cumin1001"
- 00:02 cmooney@cumin1001: START - Cookbook sre.dns.netbox
2023-09-01
- 23:55 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 23:55 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS entries for codfw spine switch overlay loopbacks. - cmooney@cumin1001"
- 23:54 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS entries for codfw spine switch overlay loopbacks. - cmooney@cumin1001"
- 23:52 cmooney@cumin1001: START - Cookbook sre.dns.netbox
- 23:01 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.reboot (exit_code=0)
- 22:46 cmooney@cumin1001: END (FAIL) - Cookbook sre.network.provision (exit_code=93) for device lsw1-a4-codfw.mgmt.codfw.wmnet
- 22:45 cmooney@cumin1001: END (FAIL) - Cookbook sre.network.provision (exit_code=93) for device lsw1-a5-codfw.mgmt.codfw.wmnet
- 22:45 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes2027.codfw.wmnet with OS bullseye
- 22:45 cmooney@cumin1001: END (FAIL) - Cookbook sre.network.provision (exit_code=93) for device lsw1-a8-codfw.mgmt.codfw.wmnet
- 22:45 cmooney@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
- 22:45 cmooney@cumin1001: END (FAIL) - Cookbook sre.network.provision (exit_code=93) for device lsw1-b4-codfw.mgmt.codfw.wmnet
- 22:45 cmooney@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
- 22:45 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on kubernetes2027.codfw.wmnet with reason: host reimage
- 22:45 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2027.codfw.wmnet with reason: host reimage
- 22:44 cmooney@cumin1001: END (FAIL) - Cookbook sre.network.provision (exit_code=93) for device lsw1-b5-codfw.mgmt.codfw.wmnet
- 22:44 cmooney@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
- 22:44 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2027.codfw.wmnet with OS bullseye
- 22:22 brett@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for doh4002.wikimedia.org
- 22:22 brett@cumin2002: START - Cookbook sre.hosts.remove-downtime for doh4002.wikimedia.org
- 22:02 brett@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host doh4002.wikimedia.org with OS bookworm
- 21:57 cmooney@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b8-codfw
- 21:57 cmooney@cumin1001: START - Cookbook sre.network.tls for network device lsw1-b8-codfw
- 21:57 cmooney@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b7-codfw
- 21:56 cmooney@cumin1001: START - Cookbook sre.network.tls for network device lsw1-b7-codfw
- 21:56 cmooney@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b6-codfw
- 21:56 cmooney@cumin1001: START - Cookbook sre.network.tls for network device lsw1-b6-codfw
- 21:56 cmooney@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b5-codfw
- 21:56 cmooney@cumin1001: START - Cookbook sre.network.tls for network device lsw1-b5-codfw
- 21:56 cmooney@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b4-codfw
- 21:56 cmooney@cumin1001: START - Cookbook sre.network.tls for network device lsw1-b4-codfw
- 21:56 cmooney@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b3-codfw
- 21:56 cmooney@cumin1001: START - Cookbook sre.network.tls for network device lsw1-b3-codfw
- 21:56 cmooney@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b2-codfw
- 21:55 cmooney@cumin1001: START - Cookbook sre.network.tls for network device lsw1-b2-codfw
- 21:55 cmooney@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a8-codfw
- 21:55 cmooney@cumin1001: START - Cookbook sre.network.tls for network device lsw1-a8-codfw
- 21:55 cmooney@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a7-codfw
- 21:55 cmooney@cumin1001: START - Cookbook sre.network.tls for network device lsw1-a7-codfw
- 21:55 cmooney@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a6-codfw
- 21:55 cmooney@cumin1001: START - Cookbook sre.network.tls for network device lsw1-a6-codfw
- 21:55 cmooney@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a5-codfw
- 21:55 cmooney@cumin1001: START - Cookbook sre.network.tls for network device lsw1-a5-codfw
- 21:55 cmooney@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a4-codfw
- 21:55 cmooney@cumin1001: START - Cookbook sre.network.tls for network device lsw1-a4-codfw
- 21:54 cmooney@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a3-codfw
- 21:54 cmooney@cumin1001: START - Cookbook sre.network.tls for network device lsw1-a3-codfw
- 21:54 cmooney@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a2-codfw
- 21:54 cmooney@cumin1001: START - Cookbook sre.network.tls for network device lsw1-a2-codfw
- 21:54 cmooney@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a1-codfw
- 21:54 cmooney@cumin1001: START - Cookbook sre.network.tls for network device lsw1-a1-codfw
- 21:52 brett@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on doh4002.wikimedia.org with reason: host reimage
- 21:52 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on doh4002.wikimedia.org with reason: host reimage
- 21:40 cmooney@cumin1001: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device lsw1-b8-codfw.mgmt.codfw.wmnet
- 21:36 cmooney@cumin1001: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device lsw1-b7-codfw.mgmt.codfw.wmnet
- 21:32 cmooney@cumin1001: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device lsw1-b6-codfw.mgmt.codfw.wmnet
- 21:32 brett@cumin2002: START - Cookbook sre.hosts.reimage for host doh4002.wikimedia.org with OS bookworm
- 21:29 cmooney@cumin1001: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device lsw1-b5-codfw.mgmt.codfw.wmnet
- 21:21 ryankemper@cumin1001: START - Cookbook sre.wdqs.reboot
- 21:18 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.reboot (exit_code=0)
- 21:11 ryankemper@cumin1001: START - Cookbook sre.wdqs.reboot
- 21:10 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.reboot (exit_code=0)
- 21:08 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 21:08 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for lsw1-b8-codfw - cmooney@cumin1001"
- 21:08 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for lsw1-b8-codfw - cmooney@cumin1001"
- 21:05 cmooney@cumin1001: START - Cookbook sre.dns.netbox
- 21:05 cmooney@cumin1001: START - Cookbook sre.network.provision for device lsw1-b8-codfw.mgmt.codfw.wmnet
- 21:04 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 21:04 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for lsw1-b7-codfw - cmooney@cumin1001"
- 21:04 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for lsw1-b7-codfw - cmooney@cumin1001"
- 21:01 cmooney@cumin1001: START - Cookbook sre.dns.netbox
- 21:01 cmooney@cumin1001: START - Cookbook sre.network.provision for device lsw1-b7-codfw.mgmt.codfw.wmnet
- 21:01 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 21:01 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for lsw1-b6-codfw - cmooney@cumin1001"
- 21:00 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for lsw1-b6-codfw - cmooney@cumin1001"
- 20:58 cmooney@cumin1001: START - Cookbook sre.dns.netbox
- 20:58 cmooney@cumin1001: START - Cookbook sre.network.provision for device lsw1-b6-codfw.mgmt.codfw.wmnet
- 20:57 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 20:57 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for lsw1-b5-codfw - cmooney@cumin1001"
- 20:56 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for lsw1-b5-codfw - cmooney@cumin1001"
- 20:49 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes2025.codfw.wmnet with OS bullseye
- 20:26 robh@cumin1001: END (PASS) - Cookbook sre.network.debug (exit_code=0) for Netbox circuit ID 173
- 20:25 robh@cumin1001: START - Cookbook sre.network.debug for Netbox circuit ID 173
- 20:11 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for doh3003.wikimedia.org
- 20:11 sukhe@cumin2002: START - Cookbook sre.hosts.remove-downtime for doh3003.wikimedia.org
- 20:04 cmooney@cumin1001: START - Cookbook sre.dns.netbox
- 20:04 cmooney@cumin1001: START - Cookbook sre.network.provision for device lsw1-b5-codfw.mgmt.codfw.wmnet
- 20:03 cmooney@cumin1001: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device lsw1-b4-codfw.mgmt.codfw.wmnet
- 20:00 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 20:00 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add entries for new switch links codfw - cmooney@cumin1001"
- 19:59 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add entries for new switch links codfw - cmooney@cumin1001"
- 19:56 cmooney@cumin1001: START - Cookbook sre.dns.netbox
- 19:56 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 19:56 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add entries for new switch links codfw - cmooney@cumin1001"
- 19:51 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add entries for new switch links codfw - cmooney@cumin1001"
- 19:48 cmooney@cumin1001: START - Cookbook sre.dns.netbox
- 19:47 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2025.codfw.wmnet with OS bullseye
- 19:32 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 19:32 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for lsw1-b4-codfw - cmooney@cumin1001"
- 19:30 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on doh3003.wikimedia.org with reason: host reimage
- 19:30 bblack@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on doh3003.wikimedia.org with reason: host reimage
- 19:28 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for lsw1-b4-codfw - cmooney@cumin1001"
- 19:26 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_codfw: apply security updates - bking@cumin1001 - T344587
- 19:23 brett@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host doh3003.wikimedia.org with OS bookworm
- 19:23 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes2026.codfw.wmnet with OS bullseye
- 19:20 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes2025.codfw.wmnet with OS bullseye
- 19:12 brett@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on doh3003.wikimedia.org with reason: host reimage
- 19:12 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on doh3003.wikimedia.org with reason: host reimage
- 19:11 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes2027.codfw.wmnet with OS bullseye
- 19:10 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on kubernetes2027.codfw.wmnet with reason: host reimage
- 19:10 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2027.codfw.wmnet with reason: host reimage
- 19:09 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2027.codfw.wmnet with OS bullseye
- 18:57 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes2029.codfw.wmnet with OS bullseye
- 18:53 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes2028.codfw.wmnet with OS bullseye
- 18:51 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on kubernetes2029.codfw.wmnet with reason: host reimage
- 18:51 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2029.codfw.wmnet with reason: host reimage
- 18:48 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes2027.codfw.wmnet with OS bullseye
- 18:48 brett@cumin2002: START - Cookbook sre.hosts.reimage for host doh3003.wikimedia.org with OS bookworm
- 18:46 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on kubernetes2028.codfw.wmnet with reason: host reimage
- 18:46 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2028.codfw.wmnet with reason: host reimage
- 18:42 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on kubernetes2027.codfw.wmnet with reason: host reimage
- 18:42 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2027.codfw.wmnet with reason: host reimage
- 18:39 cmooney@cumin1001: START - Cookbook sre.dns.netbox
- 18:39 cmooney@cumin1001: START - Cookbook sre.network.provision for device lsw1-b4-codfw.mgmt.codfw.wmnet
- 18:35 aokoth@cumin1001: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1004.wikimedia.org with reason: Security Release
- 18:31 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2029.codfw.wmnet with OS bullseye
- 18:30 ryankemper@cumin1001: START - Cookbook sre.wdqs.reboot
- 18:23 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2028.codfw.wmnet with OS bullseye
- 18:22 pt1979@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudservices1006.eqiad.wmnet with OS bullseye
- 18:22 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cloudservices1006.eqiad.wmnet with OS bullseye
- 18:21 cmooney@cumin1001: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device lsw1-b3-codfw.mgmt.codfw.wmnet
- 18:19 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2027.codfw.wmnet with OS bullseye
- 18:16 brett@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for doh3004.wikimedia.org
- 18:16 brett@cumin2002: START - Cookbook sre.hosts.remove-downtime for doh3004.wikimedia.org
- 18:05 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2026.codfw.wmnet with OS bullseye
- 18:04 brett@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host doh3004.wikimedia.org with OS bookworm
- 17:58 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2025.codfw.wmnet with OS bullseye
- 17:53 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on doh3004.wikimedia.org with reason: host reimage
- 17:53 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on doh3004.wikimedia.org with reason: host reimage
- 17:53 brett@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on doh3004.wikimedia.org with reason: host reimage
- 17:53 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on doh3004.wikimedia.org with reason: host reimage
- 17:49 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 17:49 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for lsw1-b3-codfw - cmooney@cumin1001"
- 17:48 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for lsw1-b3-codfw - cmooney@cumin1001"
- 17:46 cmooney@cumin1001: START - Cookbook sre.dns.netbox
- 17:46 cmooney@cumin1001: START - Cookbook sre.network.provision for device lsw1-b3-codfw.mgmt.codfw.wmnet
- 17:31 brett@cumin2002: START - Cookbook sre.hosts.reimage for host doh3004.wikimedia.org with OS bookworm
- 17:30 cmooney@cumin1001: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device lsw1-b2-codfw.mgmt.codfw.wmnet
- 17:19 brett@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for doh5001.wikimedia.org
- 17:19 brett@cumin2002: START - Cookbook sre.hosts.remove-downtime for doh5001.wikimedia.org
- 17:18 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['kubernetes2036']
- 17:18 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes2036']
- 17:18 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['kubernetes2036']
- 17:18 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes2036']
- 17:13 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 17:13 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add entries for new spine links. - cmooney@cumin1001"
- 17:11 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add entries for new spine links. - cmooney@cumin1001"
- 17:11 aokoth@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: Security Release
- 17:07 cmooney@cumin1001: START - Cookbook sre.dns.netbox
- 17:06 brett@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host doh5001.wikimedia.org with OS bookworm
- 16:59 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on doh5001.wikimedia.org with reason: host reimage
- 16:59 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on doh5001.wikimedia.org with reason: host reimage
- 16:58 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 16:58 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for lsw1-b2-codfw - cmooney@cumin1001"
- 16:58 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for lsw1-b2-codfw - cmooney@cumin1001"
- 16:55 cmooney@cumin1001: START - Cookbook sre.dns.netbox
- 16:55 cmooney@cumin1001: START - Cookbook sre.network.provision for device lsw1-b2-codfw.mgmt.codfw.wmnet
- 16:54 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_codfw: apply security updates - bking@cumin1001 - T344587
- 16:53 brett@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on doh5001.wikimedia.org with reason: host reimage
- 16:53 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on doh5001.wikimedia.org with reason: host reimage
- 16:50 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['restbase1030.eqiad.wmnet']
- 16:50 brett@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on doh5001.wikimedia.org with reason: host reimage
- 16:50 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on doh5001.wikimedia.org with reason: host reimage
- 16:49 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['restbase1030.eqiad.wmnet']
- 16:23 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes2036.codfw.wmnet with OS bullseye
- 16:22 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on kubernetes2036.codfw.wmnet with reason: host reimage
- 16:22 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2036.codfw.wmnet with reason: host reimage
- 16:21 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2036.codfw.wmnet with OS bullseye
- 16:21 aokoth@cumin1001: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab2002.wikimedia.org with reason: Security Release
- 16:19 pmiazga: T343983 Ran mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=amwiki --logwiki=metawiki Jean-Mahmood User92259453
- 16:18 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2037.codfw.wmnet with OS bullseye
- 15:57 bking@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host flink-zk2001.codfw.wmnet
- 15:57 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host flink-zk2001.codfw.wmnet with OS bookworm
- 15:55 brett@cumin2002: START - Cookbook sre.hosts.reimage for host doh5001.wikimedia.org with OS bookworm
- 15:43 cmooney@cumin1001: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device lsw1-a8-codfw.mgmt.codfw.wmnet
- 15:23 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be2004.codfw.wmnet
- 15:15 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host thanos-be2004.codfw.wmnet
- 15:15 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be2003.codfw.wmnet
- 15:11 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 15:11 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for lsw1-a8-codfw - cmooney@cumin1001"
- 15:08 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host thanos-be2003.codfw.wmnet
- 15:05 aokoth@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: Security Release
- 14:59 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.REBOOT (1 nodes at a time) for ElasticSearch cluster cloudelastic: apply security updates - bking@cumin1001 - T344587
- 14:52 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for lsw1-a8-codfw - cmooney@cumin1001"
- 14:49 cmooney@cumin1001: START - Cookbook sre.dns.netbox
- 14:49 cmooney@cumin1001: START - Cookbook sre.network.provision for device lsw1-a8-codfw.mgmt.codfw.wmnet
- 14:42 bking@cumin1001: START - Cookbook sre.hosts.reimage for host flink-zk2001.codfw.wmnet with OS bookworm
- 14:40 bking@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM flink-zk2001.codfw.wmnet - bking@cumin1001"
- 14:39 bking@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM flink-zk2001.codfw.wmnet - bking@cumin1001"
- 14:39 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes2036.codfw.wmnet with OS bullseye
- 14:39 bking@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) flink-zk2001.codfw.wmnet on all recursors
- 14:38 bking@cumin1001: START - Cookbook sre.dns.wipe-cache flink-zk2001.codfw.wmnet on all recursors
- 14:38 bking@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 14:38 bking@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM flink-zk2001.codfw.wmnet - bking@cumin1001"
- 14:34 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
- 14:33 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
- 14:32 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on kubernetes2036.codfw.wmnet with reason: host reimage
- 14:32 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2036.codfw.wmnet with reason: host reimage
- 14:31 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
- 14:30 bking@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM flink-zk2001.codfw.wmnet - bking@cumin1001"
- 14:30 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
- 14:29 jayme@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/aus-k8s-eqiad-services/jaeger: sync
- 14:29 jayme@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/aus-k8s-eqiad-services/jaeger: sync
- 14:28 bking@cumin1001: START - Cookbook sre.dns.netbox
- 14:28 bking@cumin1001: START - Cookbook sre.ganeti.makevm for new host flink-zk2001.codfw.wmnet
- 14:25 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be2002.codfw.wmnet
- 14:24 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
- 14:23 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
- 14:23 jayme@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/aus-k8s-eqiad-services/jaeger: sync
- 14:23 jayme@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/aus-k8s-eqiad-services/jaeger: sync
- 14:23 lsobanski@cumin1001: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1003.wikimedia.org with reason: Security release
- 14:21 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
- 14:21 jayme@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/aus-k8s-eqiad-services/jaeger: sync
- 14:21 jayme@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/aus-k8s-eqiad-services/jaeger: sync
- 14:21 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
- 14:20 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
- 14:18 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
- 14:17 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
- 14:17 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host thanos-be2002.codfw.wmnet
- 14:16 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be2001.codfw.wmnet
- 14:12 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2036.codfw.wmnet with OS bullseye
- 14:02 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host thanos-be2001.codfw.wmnet
- 14:01 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be1004.eqiad.wmnet
- 13:53 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host thanos-be1004.eqiad.wmnet
- 13:53 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be1003.eqiad.wmnet
- 13:50 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
- 13:49 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
- 13:47 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
- 13:46 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host thanos-be1003.eqiad.wmnet
- 13:46 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be1002.eqiad.wmnet
- 13:46 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1075.eqiad.wmnet
- 13:40 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1075.eqiad.wmnet
- 13:40 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host thanos-be1002.eqiad.wmnet
- 13:40 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1074.eqiad.wmnet
- 13:39 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be1001.eqiad.wmnet
- 13:33 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
- 13:33 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1074.eqiad.wmnet
- 13:33 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host thanos-be1001.eqiad.wmnet
- 13:33 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1073.eqiad.wmnet
- 13:32 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2073.codfw.wmnet
- 13:31 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
- 13:31 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
- 13:26 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.REBOOT (1 nodes at a time) for ElasticSearch cluster cloudelastic: apply security updates - bking@cumin1001 - T344587
- 13:25 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1073.eqiad.wmnet
- 13:25 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2073.codfw.wmnet
- 13:25 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2072.codfw.wmnet
- 13:24 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1072.eqiad.wmnet
- 13:24 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
- 13:24 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
- 13:23 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
- 13:22 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
- 13:19 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
- 13:18 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2072.codfw.wmnet
- 13:18 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1072.eqiad.wmnet
- 13:16 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
- 13:16 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
- 13:13 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-cluster (exit_code=0)
- 13:10 cmooney@cumin1001: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device lsw1-a7-codfw.mgmt.codfw.wmnet
- 13:07 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1071.eqiad.wmnet
- 13:02 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2071.codfw.wmnet
- 13:00 lsobanski@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: Security release
- 12:58 akosiaris@deploy1002: helmfile [staging] DONE helmfile.d/services/ipoid: apply
- 12:58 akosiaris@deploy1002: helmfile [staging] START helmfile.d/services/ipoid: apply
- 12:55 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2071.codfw.wmnet
- 12:54 hashar: Built /releng/operations-puppet:0.9.0 image and updated the CI Job operations-puppet-tests-buster-docker to use tox 4.8.0 # T345152
- 12:54 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2070.codfw.wmnet
- 12:51 aborrero@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 12:51 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1071.eqiad.wmnet
- 12:50 aborrero@cumin1001: START - Cookbook sre.dns.netbox
- 12:48 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1070.eqiad.wmnet
- 12:45 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2070.codfw.wmnet
- 12:44 hashar: Updated CI Job operations-puppet-tests-buster-docker to use tox 4.8.0 # T345152
- 12:40 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2069.codfw.wmnet
- 12:40 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1070.eqiad.wmnet
- 12:39 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 12:39 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for lsw1-a7-codfw - cmooney@cumin1001"
- 12:38 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for lsw1-a7-codfw - cmooney@cumin1001"
- 12:38 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1069.eqiad.wmnet
- 12:32 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2069.codfw.wmnet
- 12:32 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1069.eqiad.wmnet
- 12:32 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/toolhub: apply
- 12:31 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/toolhub: apply
- 12:31 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/toolhub: apply
- 12:31 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/toolhub: apply
- 12:31 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1068.eqiad.wmnet
- 12:31 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2068.codfw.wmnet
- 12:25 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/toolhub: apply
- 12:25 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/toolhub: apply
- 12:24 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/toolhub: apply
- 12:24 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/toolhub: apply
- 12:24 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2068.codfw.wmnet
- 12:23 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2067.codfw.wmnet
- 12:23 akosiaris@deploy1002: helmfile [staging] DONE helmfile.d/services/toolhub: apply
- 12:23 akosiaris@deploy1002: helmfile [staging] START helmfile.d/services/toolhub: apply
- 12:18 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1068.eqiad.wmnet
- 12:17 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1067.eqiad.wmnet
- 12:07 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2067.codfw.wmnet
- 12:07 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2066.codfw.wmnet
- 12:03 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1067.eqiad.wmnet
- 12:03 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1066.eqiad.wmnet
- 12:00 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2066.codfw.wmnet
- 12:00 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2065.codfw.wmnet
- 11:55 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1066.eqiad.wmnet
- 11:55 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1065.eqiad.wmnet
- 11:53 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2065.codfw.wmnet
- 11:53 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2064.codfw.wmnet
- 11:47 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1065.eqiad.wmnet
- 11:47 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2064.codfw.wmnet
- 11:44 cmooney@cumin1001: START - Cookbook sre.dns.netbox
- 11:44 cmooney@cumin1001: START - Cookbook sre.network.provision for device lsw1-a7-codfw.mgmt.codfw.wmnet
- 11:08 mvernon@cumin1001: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling restart_daemons on A:swift-fe-eqiad
- 11:02 cmooney@cumin1001: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device lsw1-a6-codfw.mgmt.codfw.wmnet
- 11:02 mvernon@cumin1001: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on A:swift-fe-eqiad
- 11:01 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
- 11:01 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2063.codfw.wmnet
- 11:01 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
- 11:01 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
- 11:00 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
- 10:59 mvernon@cumin1001: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling restart_daemons on A:swift-fe-codfw
- 10:58 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
- 10:58 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
- 10:58 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
- 10:57 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
- 10:56 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
- 10:56 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
- 10:55 mvernon@cumin1001: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on A:swift-fe-codfw
- 10:54 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1064.eqiad.wmnet
- 10:53 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
- 10:53 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
- 10:51 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
- 10:51 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
- 10:46 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1064.eqiad.wmnet
- 10:44 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1063.eqiad.wmnet
- 10:43 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
- 10:42 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2063.codfw.wmnet
- 10:42 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
- 10:42 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2062.codfw.wmnet
- 10:35 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1063.eqiad.wmnet
- 10:35 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2062.codfw.wmnet
- 10:35 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
- 10:34 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
- 10:34 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
- 10:33 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1062.eqiad.wmnet
- 10:32 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2061.codfw.wmnet
- 10:32 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
- 10:31 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 10:31 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for lsw1-a6-codfw - cmooney@cumin1001"
- 10:30 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for lsw1-a6-codfw - cmooney@cumin1001"
- 10:28 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-cluster
- 10:25 cmooney@cumin1001: START - Cookbook sre.dns.netbox
- 10:25 cmooney@cumin1001: START - Cookbook sre.network.provision for device lsw1-a6-codfw.mgmt.codfw.wmnet
- 10:24 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1062.eqiad.wmnet
- 10:24 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2061.codfw.wmnet
- 10:24 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mwdebug2002.codfw.wmnet
- 10:21 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1061.eqiad.wmnet
- 10:21 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2060.codfw.wmnet
- 10:19 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-single for host mwdebug2002.codfw.wmnet
- 10:18 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mwdebug2001.codfw.wmnet
- 10:14 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-single for host mwdebug2001.codfw.wmnet
- 10:13 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2060.codfw.wmnet
- 10:12 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1061.eqiad.wmnet
- 10:12 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2059.codfw.wmnet
- 10:12 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
- 10:11 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
- 10:11 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1060.eqiad.wmnet
- 10:10 cmooney@cumin1001: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device lsw1-a5-codfw.mgmt.codfw.wmnet
- 10:07 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
- 10:06 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
- 10:05 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
- 10:05 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
- 10:04 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
- 10:03 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1060.eqiad.wmnet
- 10:03 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
- 10:03 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2059.codfw.wmnet
- 10:03 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
- 10:02 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
- 09:42 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rpki2002.codfw.wmnet
- 09:40 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
- 09:39 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
- 09:38 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 09:38 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for lsw1-a5-codfw - cmooney@cumin1001"
- 09:38 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host rpki2002.codfw.wmnet
- 09:38 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for lsw1-a5-codfw - cmooney@cumin1001"
- 09:37 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
- 09:36 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1059.eqiad.wmnet
- 09:36 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
- 09:35 cmooney@cumin1001: START - Cookbook sre.dns.netbox
- 09:35 cmooney@cumin1001: START - Cookbook sre.network.provision for device lsw1-a5-codfw.mgmt.codfw.wmnet
- 09:35 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
- 09:34 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rpki1001.eqiad.wmnet
- 09:34 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
- 09:31 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host rpki1001.eqiad.wmnet
- 09:29 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
- 09:29 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
- 09:26 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2058.codfw.wmnet
- 09:18 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ping1003.eqiad.wmnet
- 09:14 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ping1003.eqiad.wmnet
- 09:14 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1059.eqiad.wmnet
- 09:14 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2058.codfw.wmnet
- 09:08 cmooney@cumin1001: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device lsw1-a4-codfw.mgmt.codfw.wmnet
- 09:04 claime: Running puppet on 'A:cp-text and P{P:trafficserver::backend}' - T341780
- 09:02 claime: Push 4% of global traffic to mw-on-k8s - T341780
- 09:00 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
- 08:59 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
- 08:54 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ping2003.codfw.wmnet
- 08:51 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ping2003.codfw.wmnet
- 08:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1003.eqiad.wmnet
- 08:49 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on testreduce1002.eqiad.wmnet with reason: WIP
- 08:49 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on testreduce1002.eqiad.wmnet with reason: WIP
- 08:45 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1003.eqiad.wmnet
- 08:40 claime: Raised mw-web and mw-api-ext capacity by ~30% - T341780
- 08:39 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
- 08:39 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
- 08:38 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
- 08:38 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-web: apply
- 08:37 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 08:37 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for lsw1-a4-codfw - cmooney@cumin1001"
- 08:37 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
- 08:36 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for lsw1-a4-codfw - cmooney@cumin1001"
- 08:36 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
- 08:35 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
- 08:34 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
- 08:34 cmooney@cumin1001: START - Cookbook sre.dns.netbox
- 08:34 cmooney@cumin1001: START - Cookbook sre.network.provision for device lsw1-a4-codfw.mgmt.codfw.wmnet
- 08:33 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti5007.eqsin.wmnet
- 08:33 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti5007.eqsin.wmnet
- 08:30 cmooney@cumin1001: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device lsw1-a3-codfw.mgmt.codfw.wmnet
- 08:26 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti5007.eqsin.wmnet
- 08:21 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti5007.eqsin.wmnet
- 07:58 cmooney@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
- 07:58 cmooney@cumin1001: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for lsw1-a3-codfw - cmooney@cumin1001"
- 07:34 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.debug (exit_code=0) for Netbox circuit ID 173
- 07:34 ayounsi@cumin1001: START - Cookbook sre.network.debug for Netbox circuit ID 173
- 07:16 moritzm: failover Ganeti master in eqsin to ganeti5004
- 07:09 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host irc2002.wikimedia.org
- 07:05 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host irc2002.wikimedia.org
- 07:05 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti5006.eqsin.wmnet
- 07:05 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti5006.eqsin.wmnet
- 07:04 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host irc1002.wikimedia.org
- 07:00 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host irc1002.wikimedia.org
- 06:58 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti5006.eqsin.wmnet
- 06:46 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti5006.eqsin.wmnet
- 06:38 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti5005.eqsin.wmnet
- 06:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti5005.eqsin.wmnet
- 06:31 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti5005.eqsin.wmnet
- 06:21 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti5005.eqsin.wmnet
- 06:20 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti5004.eqsin.wmnet
- 06:20 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti5004.eqsin.wmnet
- 06:13 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti5004.eqsin.wmnet
- 05:38 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti5004.eqsin.wmnet
- 05:31 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti4008.ulsfo.wmnet
- 05:31 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4008.ulsfo.wmnet
- 05:24 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti4008.ulsfo.wmnet
- 05:17 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti4008.ulsfo.wmnet
- 00:43 tstarling@deploy1002: Synchronized php-1.41.0-wmf.24/extensions/LoginNotify/includes/Hooks.php: fix production error T345373 (duration: 06m 13s)
- 00:09 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.restart (exit_code=0)
- 00:03 tstarling@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Raise LoginNotify minimum log level to info T174200 (duration: 06m 51s)