Server Admin Log/Archive 71

2023-09-30

19:44 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2172 (T343198)', diff saved to https://phabricator.wikimedia.org/P52795 and previous config saved to /var/cache/conftool/dbconfig/20230930-194448-arnaudb.json
19:44 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2172.codfw.wmnet with reason: Maintenance
19:44 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2172.codfw.wmnet with reason: Maintenance
19:44 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T343198)', diff saved to https://phabricator.wikimedia.org/P52794 and previous config saved to /var/cache/conftool/dbconfig/20230930-194427-arnaudb.json
19:29 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P52793 and previous config saved to /var/cache/conftool/dbconfig/20230930-192920-arnaudb.json
19:14 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P52792 and previous config saved to /var/cache/conftool/dbconfig/20230930-191414-arnaudb.json
18:59 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T343198)', diff saved to https://phabricator.wikimedia.org/P52791 and previous config saved to /var/cache/conftool/dbconfig/20230930-185908-arnaudb.json
14:20 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2155 (T343198)', diff saved to https://phabricator.wikimedia.org/P52790 and previous config saved to /var/cache/conftool/dbconfig/20230930-142054-arnaudb.json
14:20 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
14:20 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
14:20 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2155.codfw.wmnet with reason: Maintenance
14:20 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2155.codfw.wmnet with reason: Maintenance
14:20 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2147 (T343198)', diff saved to https://phabricator.wikimedia.org/P52789 and previous config saved to /var/cache/conftool/dbconfig/20230930-142017-arnaudb.json
14:05 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2147', diff saved to https://phabricator.wikimedia.org/P52788 and previous config saved to /var/cache/conftool/dbconfig/20230930-140510-arnaudb.json
13:50 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2147', diff saved to https://phabricator.wikimedia.org/P52787 and previous config saved to /var/cache/conftool/dbconfig/20230930-135004-arnaudb.json
13:35 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2147 (T343198)', diff saved to https://phabricator.wikimedia.org/P52786 and previous config saved to /var/cache/conftool/dbconfig/20230930-133458-arnaudb.json
09:46 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on ldap-rw[1001,2001].wikimedia.org with reason: setup in progress
09:46 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on ldap-rw[1001,2001].wikimedia.org with reason: setup in progress
08:47 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2147 (T343198)', diff saved to https://phabricator.wikimedia.org/P52785 and previous config saved to /var/cache/conftool/dbconfig/20230930-084720-arnaudb.json
08:47 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2147.codfw.wmnet with reason: Maintenance
08:47 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2147.codfw.wmnet with reason: Maintenance
08:47 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2140 (T343198)', diff saved to https://phabricator.wikimedia.org/P52784 and previous config saved to /var/cache/conftool/dbconfig/20230930-084658-arnaudb.json
08:31 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2140', diff saved to https://phabricator.wikimedia.org/P52783 and previous config saved to /var/cache/conftool/dbconfig/20230930-083152-arnaudb.json
08:16 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2140', diff saved to https://phabricator.wikimedia.org/P52782 and previous config saved to /var/cache/conftool/dbconfig/20230930-081645-arnaudb.json
08:01 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2140 (T343198)', diff saved to https://phabricator.wikimedia.org/P52781 and previous config saved to /var/cache/conftool/dbconfig/20230930-080139-arnaudb.json
02:56 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2140 (T343198)', diff saved to https://phabricator.wikimedia.org/P52780 and previous config saved to /var/cache/conftool/dbconfig/20230930-025624-arnaudb.json
02:56 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2140.codfw.wmnet with reason: Maintenance
02:56 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2140.codfw.wmnet with reason: Maintenance

2023-09-29

23:41 fabfur@cumin1001: END (PASS) - Cookbook sre.cdn.roll-restart-varnish (exit_code=0) rolling restart of Varnish on 8 hosts matching query A:cp-text_esams
23:40 fabfur@cumin1001: END (PASS) - Cookbook sre.cdn.roll-restart-varnish (exit_code=0) rolling restart of Varnish on 8 hosts matching query A:cp-upload_esams
22:44 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance
22:44 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance
22:44 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3314 (T343198)', diff saved to https://phabricator.wikimedia.org/P52779 and previous config saved to /var/cache/conftool/dbconfig/20230929-224409-arnaudb.json
22:29 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3314', diff saved to https://phabricator.wikimedia.org/P52778 and previous config saved to /var/cache/conftool/dbconfig/20230929-222902-arnaudb.json
22:13 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3314', diff saved to https://phabricator.wikimedia.org/P52777 and previous config saved to /var/cache/conftool/dbconfig/20230929-221356-arnaudb.json
21:58 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3314 (T343198)', diff saved to https://phabricator.wikimedia.org/P52776 and previous config saved to /var/cache/conftool/dbconfig/20230929-215849-arnaudb.json
21:00 fabfur@cumin1001: START - Cookbook sre.cdn.roll-restart-varnish rolling restart of Varnish on 8 hosts matching query A:cp-text_esams
20:59 fabfur@cumin1001: START - Cookbook sre.cdn.roll-restart-varnish rolling restart of Varnish on 8 hosts matching query A:cp-upload_esams
20:35 fabfur@cumin1001: END (PASS) - Cookbook sre.cdn.roll-restart-varnish (exit_code=0) rolling restart of Varnish on 8 hosts matching query A:cp-text_drmrs
20:34 fabfur@cumin1001: END (PASS) - Cookbook sre.cdn.roll-restart-varnish (exit_code=0) rolling restart of Varnish on 8 hosts matching query A:cp-upload_drmrs
19:46 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on wdqs1016.eqiad.wmnet with reason: jnl compression
19:46 bking@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on wdqs1016.eqiad.wmnet with reason: jnl compression
19:36 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-master1003.eqiad.wmne']
19:36 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-master1004.eqiad.wmne']
19:36 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-master1003.eqiad.wmne']
19:36 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-master1004.eqiad.wmne']
18:55 eevans@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts restbase1023.eqiad.wmnet
18:54 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1023.eqiad.wmnet
18:43 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1023.eqiad.wmnet
18:42 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts restbase1023.eqiad.wmnet
18:19 eevans@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts restbase1022.eqiad.wmnet
18:19 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1022.eqiad.wmnet
17:54 fabfur@cumin1001: START - Cookbook sre.cdn.roll-restart-varnish rolling restart of Varnish on 8 hosts matching query A:cp-upload_drmrs
17:54 fabfur@cumin1001: START - Cookbook sre.cdn.roll-restart-varnish rolling restart of Varnish on 8 hosts matching query A:cp-text_drmrs
17:53 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1022.eqiad.wmnet
17:53 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts restbase1022.eqiad.wmnet
17:08 fabfur@cumin1001: END (PASS) - Cookbook sre.cdn.roll-restart-varnish (exit_code=0) rolling restart of Varnish on 8 hosts matching query A:cp-upload_eqiad
17:06 fabfur@cumin1001: END (PASS) - Cookbook sre.cdn.roll-restart-varnish (exit_code=0) rolling restart of Varnish on 8 hosts matching query A:cp-text_eqiad
16:53 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2138:3314 (T343198)', diff saved to https://phabricator.wikimedia.org/P52774 and previous config saved to /var/cache/conftool/dbconfig/20230929-165347-arnaudb.json
16:53 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2138.codfw.wmnet with reason: Maintenance
16:53 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2138.codfw.wmnet with reason: Maintenance
16:53 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3314 (T343198)', diff saved to https://phabricator.wikimedia.org/P52773 and previous config saved to /var/cache/conftool/dbconfig/20230929-165326-arnaudb.json
16:38 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3314', diff saved to https://phabricator.wikimedia.org/P52772 and previous config saved to /var/cache/conftool/dbconfig/20230929-163819-arnaudb.json
16:28 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on wdqs1016.eqiad.wmnet with reason: jnl compression
16:27 bking@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on wdqs1016.eqiad.wmnet with reason: jnl compression
16:23 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3314', diff saved to https://phabricator.wikimedia.org/P52771 and previous config saved to /var/cache/conftool/dbconfig/20230929-162313-arnaudb.json
16:22 inflatador: bking@wdqs1016 depooling to compress JNL file T347605
16:16 jiji@deploy2002: helmfile [codfw] DONE helmfile.d/services/tegola-vector-tiles: apply
16:15 jiji@deploy2002: helmfile [codfw] START helmfile.d/services/tegola-vector-tiles: apply
16:15 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/services/tegola-vector-tiles: apply
16:14 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/tegola-vector-tiles: apply
16:13 jiji@deploy2002: helmfile [staging] DONE helmfile.d/services/tegola-vector-tiles: apply
16:13 jiji@deploy2002: helmfile [staging] START helmfile.d/services/tegola-vector-tiles: apply
16:11 aborrero@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:11 aborrero@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudgw codfw - aborrero@cumin1001"
16:08 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for restbase1031.eqiad.wmnet
16:08 eevans@cumin1001: START - Cookbook sre.hosts.remove-downtime for restbase1031.eqiad.wmnet
16:08 aborrero@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudgw codfw - aborrero@cumin1001"
16:08 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3314 (T343198)', diff saved to https://phabricator.wikimedia.org/P52770 and previous config saved to /var/cache/conftool/dbconfig/20230929-160807-arnaudb.json
16:06 aborrero@cumin1001: START - Cookbook sre.dns.netbox
15:54 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on restbase1031.eqiad.wmnet with reason: Upgrading BIOS
15:54 eevans@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on restbase1031.eqiad.wmnet with reason: Upgrading BIOS
15:49 eevans@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts restbase1031.eqiad.wmnet
15:49 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts restbase1031.eqiad.wmnet
15:49 eevans@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts restbase1031.eqiad.wmnet
15:49 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts restbase1031.eqiad.wmnet
15:48 eevans@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts restbase1028.eqiad.wmnet
15:47 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1028.eqiad.wmnet
15:35 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1028.eqiad.wmnet
15:35 bking@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
15:35 bking@deploy2002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
15:34 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts restbase1028.eqiad.wmnet
15:34 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-master1003.eqiad.wmne']
15:34 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-master1004.eqiad.wmne']
15:33 eevans@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts restbase1021.eqiad.wmnet
15:33 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1021.eqiad.wmnet
15:25 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-master1003.eqiad.wmne']
15:25 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-master1004.eqiad.wmne']
15:25 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-master1004.mgmt.eqiad.wmnet with reboot policy FORCED
15:24 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-master1003.mgmt.eqiad.wmnet with reboot policy FORCED
15:23 dcausse@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
15:23 dcausse@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
15:22 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-master1004.mgmt.eqiad.wmnet with reboot policy FORCED
15:21 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1021.eqiad.wmnet
15:20 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts restbase1021.eqiad.wmnet
15:19 eevans@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts restbase1020.eqiad.wmnet
15:19 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase1020.eqiad.wmnet
15:16 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-master1004.mgmt.eqiad.wmnet with reboot policy FORCED
15:14 vriley@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp1100.mgmt.eqiad.wmnet with reboot policy FORCED
15:08 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase1020.eqiad.wmnet
15:07 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts restbase1020.eqiad.wmnet
14:55 jiji@deploy2002: helmfile [codfw] DONE helmfile.d/services/push-notifications: apply
14:55 vriley@cumin1001: START - Cookbook sre.hosts.provision for host cp1100.mgmt.eqiad.wmnet with reboot policy FORCED
14:54 jiji@deploy2002: helmfile [codfw] START helmfile.d/services/push-notifications: apply
14:54 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/services/push-notifications: apply
14:54 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/push-notifications: apply
14:54 jiji@deploy2002: helmfile [staging] DONE helmfile.d/services/push-notifications: apply
14:54 jiji@deploy2002: helmfile [staging] START helmfile.d/services/push-notifications: apply
14:53 vriley@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp1100.mgmt.eqiad.wmnet with reboot policy FORCED
14:53 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for restbase2027.codfw.wmnet
14:53 eevans@cumin1001: START - Cookbook sre.hosts.remove-downtime for restbase2027.codfw.wmnet
14:50 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-master1004.mgmt.eqiad.wmnet with reboot policy FORCED
14:49 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-master1003.mgmt.eqiad.wmnet with reboot policy FORCED
14:40 vriley@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp1101.mgmt.eqiad.wmnet with reboot policy FORCED
14:40 vriley@cumin1001: START - Cookbook sre.hosts.provision for host cp1101.mgmt.eqiad.wmnet with reboot policy FORCED
14:38 vriley@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp1101.mgmt.eqiad.wmnet with reboot policy FORCED
14:38 vriley@cumin1001: START - Cookbook sre.hosts.provision for host cp1101.mgmt.eqiad.wmnet with reboot policy FORCED
14:27 fabfur@cumin1001: START - Cookbook sre.cdn.roll-restart-varnish rolling restart of Varnish on 8 hosts matching query A:cp-upload_eqiad
14:27 fabfur@cumin1001: START - Cookbook sre.cdn.roll-restart-varnish rolling restart of Varnish on 8 hosts matching query A:cp-text_eqiad
14:23 fabfur@cumin1001: END (PASS) - Cookbook sre.cdn.roll-restart-varnish (exit_code=0) rolling restart of Varnish on 8 hosts matching query A:cp-upload_eqsin
14:21 fabfur@cumin1001: END (PASS) - Cookbook sre.cdn.roll-restart-varnish (exit_code=0) rolling restart of Varnish on 8 hosts matching query A:cp-text_eqsin
14:20 vriley@cumin1001: START - Cookbook sre.hosts.provision for host cp1101.mgmt.eqiad.wmnet with reboot policy FORCED
14:12 vriley@cumin1001: START - Cookbook sre.hosts.provision for host cp1100.mgmt.eqiad.wmnet with reboot policy FORCED
14:07 dcausse@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
14:07 dcausse@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
12:54 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host backup1010.eqiad.wmnet with OS bookworm
12:39 jelto@cumin1001: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1004.wikimedia.org with reason: GitLab version upgrade
12:37 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on backup1010.eqiad.wmnet with reason: host reimage
12:34 jynus@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on backup1010.eqiad.wmnet with reason: host reimage
12:18 jynus@cumin1001: START - Cookbook sre.hosts.reimage for host backup1010.eqiad.wmnet with OS bookworm
11:59 topranks: adjusting evpn_db BGP export filter lsw1-f3-eqiad
11:46 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1085.eqiad.wmnet
11:40 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1085.eqiad.wmnet
11:40 fabfur@cumin1001: START - Cookbook sre.cdn.roll-restart-varnish rolling restart of Varnish on 8 hosts matching query A:cp-upload_eqsin
11:40 fabfur@cumin1001: START - Cookbook sre.cdn.roll-restart-varnish rolling restart of Varnish on 8 hosts matching query A:cp-text_eqsin
11:35 btullis@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts an-worker1086.eqiad.wmnet
11:34 btullis@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts an-worker1086.eqiad.wmnet
11:13 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2137:3314 (T343198)', diff saved to https://phabricator.wikimedia.org/P52767 and previous config saved to /var/cache/conftool/dbconfig/20230929-111353-arnaudb.json
11:13 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2137.codfw.wmnet with reason: Maintenance
11:13 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2137.codfw.wmnet with reason: Maintenance
11:13 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2136 (T343198)', diff saved to https://phabricator.wikimedia.org/P52766 and previous config saved to /var/cache/conftool/dbconfig/20230929-111331-arnaudb.json
11:09 jelto@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: GitLab version upgrade
10:58 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:20:00 on an-worker1085.eqiad.wmnet with reason: Cold booting to see if it helps with RAID BBU
10:58 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 0:20:00 on an-worker1085.eqiad.wmnet with reason: Cold booting to see if it helps with RAID BBU
10:58 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2136', diff saved to https://phabricator.wikimedia.org/P52765 and previous config saved to /var/cache/conftool/dbconfig/20230929-105825-arnaudb.json
10:58 jelto@cumin1001: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1003.wikimedia.org with reason: GitLab version upgrade
10:52 jelto@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: GitLab version upgrade
10:49 jelto@cumin1001: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab2002.wikimedia.org with reason: GitLab version upgrade
10:43 jelto@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: GitLab version upgrade
10:43 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2136', diff saved to https://phabricator.wikimedia.org/P52764 and previous config saved to /var/cache/conftool/dbconfig/20230929-104318-arnaudb.json
10:35 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/services/push-notifications: apply
10:35 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/push-notifications: apply
10:28 arnaudb@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host backup1010.eqiad.wmnet with OS bookworm
10:28 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2136 (T343198)', diff saved to https://phabricator.wikimedia.org/P52763 and previous config saved to /var/cache/conftool/dbconfig/20230929-102812-arnaudb.json
10:19 jiji@deploy2002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply
10:19 jiji@deploy2002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply
10:18 jiji@deploy2002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
10:18 jiji@deploy2002: helmfile [staging] START helmfile.d/services/machinetranslation: apply
10:09 jiji@deploy2002: helmfile [codfw] DONE helmfile.d/services/push-notifications: apply
10:09 elukey@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: sync
10:09 elukey@deploy2002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: sync
10:09 jiji@deploy2002: helmfile [codfw] START helmfile.d/services/push-notifications: apply
09:08 elukey@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: sync
09:08 elukey@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: sync
05:32 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2136 (T343198)', diff saved to https://phabricator.wikimedia.org/P52760 and previous config saved to /var/cache/conftool/dbconfig/20230929-053158-arnaudb.json
05:31 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2136.codfw.wmnet with reason: Maintenance
05:31 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2136.codfw.wmnet with reason: Maintenance
05:31 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2119 (T343198)', diff saved to https://phabricator.wikimedia.org/P52759 and previous config saved to /var/cache/conftool/dbconfig/20230929-053136-arnaudb.json
05:16 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2119', diff saved to https://phabricator.wikimedia.org/P52758 and previous config saved to /var/cache/conftool/dbconfig/20230929-051630-arnaudb.json
05:01 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2119', diff saved to https://phabricator.wikimedia.org/P52757 and previous config saved to /var/cache/conftool/dbconfig/20230929-050123-arnaudb.json
04:46 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2119 (T343198)', diff saved to https://phabricator.wikimedia.org/P52756 and previous config saved to /var/cache/conftool/dbconfig/20230929-044617-arnaudb.json
02:57 andrew@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudservices2005-dev.codfw.wmnet with OS bookworm
01:48 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180 (T343198)', diff saved to https://phabricator.wikimedia.org/P52755 and previous config saved to /var/cache/conftool/dbconfig/20230929-014825-arnaudb.json
01:40 ejegg: payments-wiki upgraded from c4c9b938 to d6ad0376
01:33 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P52754 and previous config saved to /var/cache/conftool/dbconfig/20230929-013319-arnaudb.json
01:18 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P52753 and previous config saved to /var/cache/conftool/dbconfig/20230929-011813-arnaudb.json
01:03 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180 (T343198)', diff saved to https://phabricator.wikimedia.org/P52752 and previous config saved to /var/cache/conftool/dbconfig/20230929-010306-arnaudb.json
00:26 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1227.mgmt.eqiad.wmnet with reboot policy FORCED
00:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host db1227.mgmt.eqiad.wmnet with reboot policy FORCED

2023-09-28

23:50 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2180 (T343198)', diff saved to https://phabricator.wikimedia.org/P52751 and previous config saved to /var/cache/conftool/dbconfig/20230928-235053-arnaudb.json
23:50 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2180.codfw.wmnet with reason: Maintenance
23:50 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2180.codfw.wmnet with reason: Maintenance
23:50 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316 (T343198)', diff saved to https://phabricator.wikimedia.org/P52750 and previous config saved to /var/cache/conftool/dbconfig/20230928-235032-arnaudb.json
23:42 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2119 (T343198)', diff saved to https://phabricator.wikimedia.org/P52749 and previous config saved to /var/cache/conftool/dbconfig/20230928-234246-arnaudb.json
23:42 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2119.codfw.wmnet with reason: Maintenance
23:42 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2119.codfw.wmnet with reason: Maintenance
23:42 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2110 (T343198)', diff saved to https://phabricator.wikimedia.org/P52748 and previous config saved to /var/cache/conftool/dbconfig/20230928-234224-arnaudb.json
23:35 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316', diff saved to https://phabricator.wikimedia.org/P52747 and previous config saved to /var/cache/conftool/dbconfig/20230928-233525-arnaudb.json
23:27 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2110', diff saved to https://phabricator.wikimedia.org/P52746 and previous config saved to /var/cache/conftool/dbconfig/20230928-232718-arnaudb.json
23:20 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316', diff saved to https://phabricator.wikimedia.org/P52745 and previous config saved to /var/cache/conftool/dbconfig/20230928-232019-arnaudb.json
23:12 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2110', diff saved to https://phabricator.wikimedia.org/P52744 and previous config saved to /var/cache/conftool/dbconfig/20230928-231211-arnaudb.json
23:05 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316 (T343198)', diff saved to https://phabricator.wikimedia.org/P52743 and previous config saved to /var/cache/conftool/dbconfig/20230928-230512-arnaudb.json
22:57 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2110 (T343198)', diff saved to https://phabricator.wikimedia.org/P52742 and previous config saved to /var/cache/conftool/dbconfig/20230928-225705-arnaudb.json
22:40 wfan: payments-wiki change from c4c9b938 to 20828b07
22:03 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudservices2005-dev.codfw.wmnet with reason: host reimage
22:02 bking@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudelastic1007.eqiad.wmnet with OS bullseye
22:00 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudservices2005-dev.codfw.wmnet with reason: host reimage
21:58 eevans@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts restbase1033.eqiad.wmnet
21:58 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts restbase1033.eqiad.wmnet
21:58 eevans@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts restbase1030.eqiad.wmnet
21:57 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts restbase1030.eqiad.wmnet
21:57 eevans@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts restbase1027.eqiad.wmnet
21:57 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts restbase1027.eqiad.wmnet
21:57 eevans@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts restbase1026.eqiad.wmnet
21:57 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts restbase1026.eqiad.wmnet
21:56 eevans@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts restbase1025.eqiad.wmnet
21:56 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts restbase1025.eqiad.wmnet
21:56 eevans@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts restbase1032.eqiad.wmnet
21:56 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts restbase1032.eqiad.wmnet
21:55 eevans@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts restbase1029.eqiad.wmnet
21:55 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts restbase1029.eqiad.wmnet
21:55 eevans@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts restbase1029.eqiad.wmnet
21:55 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts restbase1029.eqiad.wmnet
21:54 eevans@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts restbase1024.eqiad.wmnet
21:54 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts restbase1024.eqiad.wmnet
21:54 eevans@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts restbase1023.eqiad.wmnet
21:54 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts restbase1023.eqiad.wmnet
21:54 eevans@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts restbase1022.eqiad.wmnet
21:54 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts restbase1022.eqiad.wmnet
21:53 wfan: payments-wiki change from 505a616d to 20828b07
21:52 bking@cumin1001: START - Cookbook sre.hosts.reimage for host cloudelastic1007.eqiad.wmnet with OS bullseye
21:42 eevans@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts restbase1031.eqiad.wmnet
21:42 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts restbase1031.eqiad.wmnet
21:40 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudservices2005-dev.codfw.wmnet with OS bookworm
21:30 eevans@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts restbase1028.eqiad.wmnet
21:30 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts restbase1028.eqiad.wmnet
21:28 bking@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudelastic1007.wikimedia.org']
21:25 bking@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudelastic1007.wikimedia.org']
21:25 bking@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudelastic1007.wikimedia.org']
21:14 bking@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudelastic1007.wikimedia.org']
21:13 bking@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudelastic1007.eqiad.wmnet with OS bullseye
21:07 thcipriani@deploy2002: Finished scap: Backport for Drop the desktop improvements dblist group (T347444) (duration: 11m 22s)
21:00 thcipriani@deploy2002: jdlrobson and thcipriani: Continuing with sync
20:59 bking@cumin1001: START - Cookbook sre.hosts.reimage for host cloudelastic1007.eqiad.wmnet with OS bullseye
20:57 thcipriani@deploy2002: jdlrobson and thcipriani: Backport for Drop the desktop improvements dblist group (T347444) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:55 thcipriani@deploy2002: Started scap: Backport for Drop the desktop improvements dblist group (T347444)
20:55 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudelastic1007.eqiad.wmnet with OS bullseye
20:55 bking@cumin1001: START - Cookbook sre.hosts.reimage for host cloudelastic1007.eqiad.wmnet with OS bullseye
20:52 thcipriani@deploy2002: Finished scap: Backport for update sawikiquote logos (T341260), Wikimedia special project logo updates (duration: 16m 32s)
20:45 thcipriani@deploy2002: anzx and jdlrobson and thcipriani: Continuing with sync
20:36 thcipriani@deploy2002: anzx and jdlrobson and thcipriani: Backport for update sawikiquote logos (T341260), Wikimedia special project logo updates synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:35 thcipriani@deploy2002: Started scap: Backport for update sawikiquote logos (T341260), Wikimedia special project logo updates
20:28 thcipriani@deploy2002: Finished scap: Backport for Add 'confirmed' to Wikifunctions sysop add and remove (T344261), add 'autopatrol' to Wikifunctions' functioneer group (T344085), add autopatrolled group with autopatrol right for Wikifunctions (T343946) (duration: 10m 06s)
20:21 thcipriani@deploy2002: mdaniels5757 and thcipriani and terasail: Continuing with sync
20:19 thcipriani@deploy2002: mdaniels5757 and thcipriani and terasail: Backport for Add 'confirmed' to Wikifunctions sysop add and remove (T344261), add 'autopatrol' to Wikifunctions' functioneer group (T344085), add autopatrolled group with autopatrol right for Wikifunctions (T343946) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:18 thcipriani@deploy2002: Started scap: Backport for Add 'confirmed' to Wikifunctions sysop add and remove (T344261), add 'autopatrol' to Wikifunctions' functioneer group (T344085), add autopatrolled group with autopatrol right for Wikifunctions (T343946)
20:11 taavi: create new oathauth tables on labtestwikitech and run `taavi@cloudweb2002-dev ~ $ mwscript extensions/OATHAuth/maintenance/UpdateForMultipleDevicesSupport.php labtestwiki`, fixes T347627
20:04 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T347624, testing new cookbook changes) xfer categories => wdqs2024.codfw.wmnet -> wdqs2025.codfw.wmnet, repooling both afterwards
20:03 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudelastic1007.eqiad.wmnet with OS bullseye
20:03 bking@cumin1001: START - Cookbook sre.hosts.reimage for host cloudelastic1007.eqiad.wmnet with OS bullseye
20:02 bking@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudelastic1007.eqiad.wmnet with OS bullseye
19:56 brennen@deploy2002: Finished scap: Backport for Handle SpecialPage::getDescription() returning a Message (T347620) (duration: 09m 53s)
19:55 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer (T347624, testing new cookbook changes) xfer categories => wdqs2024.codfw.wmnet -> wdqs2025.codfw.wmnet, repooling both afterwards
19:50 brennen@deploy2002: matmarex and brennen: Continuing with sync
19:48 brennen@deploy2002: matmarex and brennen: Backport for Handle SpecialPage::getDescription() returning a Message (T347620) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
19:46 brennen@deploy2002: Started scap: Backport for Handle SpecialPage::getDescription() returning a Message (T347620)
19:27 bking@cumin1001: START - Cookbook sre.hosts.reimage for host cloudelastic1007.eqiad.wmnet with OS bullseye
19:24 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudelastic1007.eqiad.wmnet with OS bullseye
19:24 bking@cumin1001: START - Cookbook sre.hosts.reimage for host cloudelastic1007.eqiad.wmnet with OS bullseye
19:23 bking@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudelastic1007.wikimedia.org']
19:14 bking@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudelastic1007.wikimedia.org']
19:14 bking@cumin1001: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts ['cloudelastic1007.wikimedia.org']
19:13 bking@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudelastic1007.wikimedia.org']
19:13 robh@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudelastic1007']
19:12 robh@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudelastic1007']
19:06 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2171:3316 (T343198)', diff saved to https://phabricator.wikimedia.org/P52737 and previous config saved to /var/cache/conftool/dbconfig/20230928-190216-arnaudb.json
19:06 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2171.codfw.wmnet with reason: Maintenance
19:06 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2171.codfw.wmnet with reason: Maintenance
19:01 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316 (T343198)', diff saved to https://phabricator.wikimedia.org/P52736 and previous config saved to /var/cache/conftool/dbconfig/20230928-190154-arnaudb.json
19:00 fabfur@cumin1001: END (PASS) - Cookbook sre.cdn.roll-restart-varnish (exit_code=0) rolling restart of Varnish on 7 hosts matching query A:cp-upload_codfw and not P{cp2028*}
19:00 fabfur@cumin1001: END (PASS) - Cookbook sre.cdn.roll-restart-varnish (exit_code=0) rolling restart of Varnish on 7 hosts matching query A:cp-text_codfw and not P{cp2027*}
18:46 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316', diff saved to https://phabricator.wikimedia.org/P52735 and previous config saved to /var/cache/conftool/dbconfig/20230928-184648-arnaudb.json
18:33 dduvall@deploy2002: rebuilt and synchronized wikiversions files: group2 wikis to 1.41.0-wmf.28 refs T345889
18:31 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316', diff saved to https://phabricator.wikimedia.org/P52734 and previous config saved to /var/cache/conftool/dbconfig/20230928-183141-arnaudb.json
18:24 topranks: renaming cloud-hosts1-codfw vlan to cloud-hosts1-b1-codfw on cloudsw1-b1-codfw
18:21 tchin@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
18:21 tchin@deploy2002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
18:16 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316 (T343198)', diff saved to https://phabricator.wikimedia.org/P52733 and previous config saved to /var/cache/conftool/dbconfig/20230928-181635-arnaudb.json
18:09 bking@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudelastic1007.eqiad.wmnet with OS bullseye
17:51 brett: Imported acme-chief from Gerrit into Gitlab
17:42 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2110 (T343198)', diff saved to https://phabricator.wikimedia.org/P52732 and previous config saved to /var/cache/conftool/dbconfig/20230928-174251-arnaudb.json
17:42 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2110.codfw.wmnet with reason: Maintenance
17:42 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2110.codfw.wmnet with reason: Maintenance
17:42 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2106 (T343198)', diff saved to https://phabricator.wikimedia.org/P52731 and previous config saved to /var/cache/conftool/dbconfig/20230928-174230-arnaudb.json
17:39 joal@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
17:39 joal@deploy2002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
17:27 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2106', diff saved to https://phabricator.wikimedia.org/P52730 and previous config saved to /var/cache/conftool/dbconfig/20230928-172719-arnaudb.json
17:14 bking@cumin1001: START - Cookbook sre.hosts.reimage for host cloudelastic1007.eqiad.wmnet with OS bullseye
17:12 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2106', diff saved to https://phabricator.wikimedia.org/P52729 and previous config saved to /var/cache/conftool/dbconfig/20230928-171212-arnaudb.json
16:57 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2106 (T343198)', diff saved to https://phabricator.wikimedia.org/P52728 and previous config saved to /var/cache/conftool/dbconfig/20230928-165706-arnaudb.json
16:42 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifeeds: apply
16:42 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/wikifeeds: apply
16:42 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host eventlog1003.eqiad.wmnet
16:42 fabfur@cumin1001: START - Cookbook sre.cdn.roll-restart-varnish rolling restart of Varnish on 7 hosts matching query A:cp-text_codfw and not P{cp2027*}
16:42 fabfur@cumin1001: START - Cookbook sre.cdn.roll-restart-varnish rolling restart of Varnish on 7 hosts matching query A:cp-upload_codfw and not P{cp2028*}
16:41 fabfur@cumin1001: END (ERROR) - Cookbook sre.cdn.roll-restart-varnish (exit_code=97) rolling restart of Varnish on 7 hosts matching query A:cp-upload_codfw and not P{cp2028*}
16:41 fabfur@cumin1001: START - Cookbook sre.cdn.roll-restart-varnish rolling restart of Varnish on 7 hosts matching query A:cp-upload_codfw and not P{cp2028*}
16:41 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifeeds: apply
16:41 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/wikifeeds: apply
16:39 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifeeds: apply
16:39 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/wikifeeds: apply
16:37 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host eventlog1003.eqiad.wmnet
16:35 fabfur@cumin1001: END (ERROR) - Cookbook sre.cdn.roll-restart-varnish (exit_code=97) rolling restart of Varnish on 8 hosts matching query A:cp-upload_codfw
16:35 fabfur@cumin1001: END (ERROR) - Cookbook sre.cdn.roll-restart-varnish (exit_code=97) rolling restart of Varnish on 8 hosts matching query A:cp-text_codfw
16:26 joal@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
16:26 joal@deploy2002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
16:23 fabfur@cumin1001: START - Cookbook sre.cdn.roll-restart-varnish rolling restart of Varnish on 8 hosts matching query A:cp-upload_codfw
16:23 fabfur@cumin1001: START - Cookbook sre.cdn.roll-restart-varnish rolling restart of Varnish on 8 hosts matching query A:cp-text_codfw
16:14 hnowlan: enabling puppet on A:cp, routing mediarequests API via rest-gateway
16:03 hnowlan: disabled puppet on A:cp
15:54 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/media-analytics: apply
15:53 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/media-analytics: apply
15:49 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/media-analytics: apply
15:49 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/media-analytics: apply
15:49 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/media-analytics: apply
15:48 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/media-analytics: apply
15:48 brennen@deploy2002: Sync cancelled.
15:48 fabfur@cumin1001: END (PASS) - Cookbook sre.cdn.roll-restart-varnish (exit_code=0) rolling restart of Varnish on 8 hosts matching query A:cp-text_ulsfo
15:47 brennen@deploy2002: brennen: Backport for Revert "NostalgiaTemplate.php: Fix array illegal offset error" synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
15:47 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
15:47 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
15:46 brennen@deploy2002: Started scap: Backport for Revert "NostalgiaTemplate.php: Fix array illegal offset error"
15:39 brennen@deploy2002: Sync cancelled.
15:38 brennen@deploy2002: krinkle and brennen: Backport for NostalgiaTemplate.php: Fix array illegal offset error synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
15:36 brennen@deploy2002: Started scap: Backport for NostalgiaTemplate.php: Fix array illegal offset error
15:27 fabfur@cumin1001: END (PASS) - Cookbook sre.cdn.roll-restart-varnish (exit_code=0) rolling restart of Varnish on 7 hosts matching query A:cp-upload_ulsfo and not P{cp4052*}
15:06 tchin@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
15:05 tchin@deploy2002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
15:03 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1229.eqiad.wmnet with OS bullseye
14:49 inflatador: bking@wdqs1016 shutting down services to compress a 1.2 TB jnl file
14:43 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1119', diff saved to https://phabricator.wikimedia.org/P52725 and previous config saved to /var/cache/conftool/dbconfig/20230928-144338-root.json
14:35 moritzm: installing ghostscript security updates
14:33 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on wdqs1016.eqiad.wmnet with reason: jnl compression
14:32 bking@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on wdqs1016.eqiad.wmnet with reason: jnl compression
14:13 klausman: restarting pybal on lvs1019 and lvs2013 (LVS low-traffic actives) for T347278 (ORES turndown)
14:11 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2169:3316 (T343198)', diff saved to https://phabricator.wikimedia.org/P52723 and previous config saved to /var/cache/conftool/dbconfig/20230928-141140-arnaudb.json
14:11 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2169.codfw.wmnet with reason: Maintenance
14:11 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2169.codfw.wmnet with reason: Maintenance
14:11 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158 (T343198)', diff saved to https://phabricator.wikimedia.org/P52722 and previous config saved to /var/cache/conftool/dbconfig/20230928-141118-arnaudb.json
14:08 cdanis: repooling cp5030 after haproxy upgrade & config deploy T317799
14:02 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1228.eqiad.wmnet with OS bullseye
14:02 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host db1228.eqiad.wmnet with OS bullseye
14:02 cdanis: depooling cp5030 for haproxy upgrade & testing T317799
14:01 moritzm: installing gsl security updates
14:00 klausman: restarted pybal on lvs1020 and lvs2014 (LVS low-traffic backups) for T347278 (ORES turndown)
13:57 taavi@deploy2002: Finished scap: Backport for Set WRITE_BOTH for CA wikis on OATHAuth multiple devices (T242031) (duration: 11m 02s)
13:57 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P52720 and previous config saved to /var/cache/conftool/dbconfig/20230928-135612-arnaudb.json
13:52 bking@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
13:52 bking@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
13:52 moritzm: installing flac security updates
13:50 taavi@deploy2002: taavi: Continuing with sync
13:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host db1229.eqiad.wmnet with OS bullseye
13:47 taavi@deploy2002: taavi: Backport for Set WRITE_BOTH for CA wikis on OATHAuth multiple devices (T242031) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:47 bking@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
13:47 bking@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
13:45 taavi@deploy2002: Started scap: Backport for Set WRITE_BOTH for CA wikis on OATHAuth multiple devices (T242031)
13:43 urbanecm@deploy2002: Finished scap: Backport for Enable WikiLove on arwikisource (T346391) (duration: 11m 10s)
13:41 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P52719 and previous config saved to /var/cache/conftool/dbconfig/20230928-134105-arnaudb.json
13:37 urbanecm@deploy2002: zoranzoki21 and urbanecm: Continuing with sync
13:33 urbanecm@deploy2002: zoranzoki21 and urbanecm: Backport for Enable WikiLove on arwikisource (T346391) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:31 urbanecm@deploy2002: Started scap: Backport for Enable WikiLove on arwikisource (T346391)
13:31 jmm@cumin2002: END (PASS) - Cookbook sre.maps.roll-restart-reboot-master (exit_code=0) rolling reboot on A:maps-master-eqiad
13:31 urbanecm@deploy2002: Finished scap: Backport for wikifunctionswiki: Disable NearbyPages (T345459) (duration: 11m 07s)
13:28 urbanecm: mwscript extensions/WikimediaMaintenance/createExtensionTables.php --wiki=arwikisource wikilove # T346391
13:26 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158 (T343198)', diff saved to https://phabricator.wikimedia.org/P52718 and previous config saved to /var/cache/conftool/dbconfig/20230928-132559-arnaudb.json
13:25 bking@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
13:25 urbanecm@deploy2002: ammarpad and urbanecm: Continuing with sync
13:25 bking@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
13:24 jmm@cumin2002: START - Cookbook sre.maps.roll-restart-reboot-master rolling reboot on A:maps-master-eqiad
13:21 urbanecm@deploy2002: ammarpad and urbanecm: Backport for wikifunctionswiki: Disable NearbyPages (T345459) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:20 urbanecm@deploy2002: Started scap: Backport for wikifunctionswiki: Disable NearbyPages (T345459)
13:19 urbanecm@deploy2002: Finished scap: Backport for Enable Campaigns email on test wiki (T347065) (duration: 12m 31s)
13:13 urbanecm@deploy2002: urbanecm and mhorsey: Continuing with sync
13:08 urbanecm@deploy2002: urbanecm and mhorsey: Backport for Enable Campaigns email on test wiki (T347065) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:07 fabfur@cumin1001: START - Cookbook sre.cdn.roll-restart-varnish rolling restart of Varnish on 7 hosts matching query A:cp-upload_ulsfo and not P{cp4052*}
13:07 fabfur@cumin1001: START - Cookbook sre.cdn.roll-restart-varnish rolling restart of Varnish on 8 hosts matching query A:cp-text_ulsfo
13:07 urbanecm@deploy2002: Started scap: Backport for Enable Campaigns email on test wiki (T347065)
13:04 elukey@deploy2002: helmfile [ml-serve-eqiad] 'sync' command on namespace 'ores-legacy' for release 'main' .
13:03 elukey@deploy2002: helmfile [ml-serve-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
13:03 elukey@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
12:47 elukey: restart thanos-query on titan1002
12:44 elukey: restart thanos-query on titan1001
12:41 jmm@cumin2002: END (PASS) - Cookbook sre.maps.roll-restart-reboot-master (exit_code=0) rolling reboot on A:maps-master-codfw
12:31 jmm@cumin2002: START - Cookbook sre.maps.roll-restart-reboot-master rolling reboot on A:maps-master-codfw
11:56 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2106 (T343198)', diff saved to https://phabricator.wikimedia.org/P52717 and previous config saved to /var/cache/conftool/dbconfig/20230928-115619-arnaudb.json
11:56 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2106.codfw.wmnet with reason: Maintenance
11:56 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2106.codfw.wmnet with reason: Maintenance
11:30 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply
11:26 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply
11:09 fabfur: cp4037 back in pool (T347192)
11:09 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply
11:04 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply
10:56 jmm@cumin2002: END (FAIL) - Cookbook sre.maps.roll-restart-reboot-master (exit_code=1) rolling reboot on A:maps-master-codfw
10:54 jmm@cumin2002: START - Cookbook sre.maps.roll-restart-reboot-master rolling reboot on A:maps-master-codfw
10:51 btullis@cumin1001: END (FAIL) - Cookbook sre.hadoop.roll-restart-masters (exit_code=99) restart masters for Hadoop analytics cluster: Restart of jvm daemons.
10:40 jiji@deploy2002: helmfile [staging] DONE helmfile.d/services/push-notifications: apply
10:40 jiji@deploy2002: helmfile [staging] START helmfile.d/services/push-notifications: apply
10:40 jiji@deploy2002: helmfile [staging] DONE helmfile.d/services/push-notifications: apply
10:40 jiji@deploy2002: helmfile [staging] START helmfile.d/services/push-notifications: apply
10:40 jiji@deploy2002: helmfile [staging] DONE helmfile.d/services/push-notifications: apply
10:40 jiji@deploy2002: helmfile [staging] START helmfile.d/services/push-notifications: apply
10:27 btullis@cumin1001: START - Cookbook sre.hadoop.roll-restart-masters restart masters for Hadoop analytics cluster: Restart of jvm daemons.
10:08 aborrero@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudgw1001.eqiad.wmnet
09:58 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
09:54 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudgw1002.eqiad.wmnet
09:54 jayme@deploy1002: helmfile [staging] START helmfile.d/services/machinetranslation: apply
09:52 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
09:52 jayme@deploy1002: helmfile [staging] START helmfile.d/services/machinetranslation: apply
09:51 _joe_: running puppet on cp-text to move mw on k8s to 10%
09:48 aborrero@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudgw1002.eqiad.wmnet
09:45 fabfur: depool cp4037 to restart varnish and apply https://gerrit.wikimedia.org/r/c/operations/puppet/+/960112 (T347192)
09:21 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2158 (T343198)', diff saved to https://phabricator.wikimedia.org/P52715 and previous config saved to /var/cache/conftool/dbconfig/20230928-092109-arnaudb.json
09:21 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
09:20 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
09:20 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2158.codfw.wmnet with reason: Maintenance
09:20 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2158.codfw.wmnet with reason: Maintenance
09:20 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2151 (T343198)', diff saved to https://phabricator.wikimedia.org/P52714 and previous config saved to /var/cache/conftool/dbconfig/20230928-092032-arnaudb.json
09:16 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host backup1010.eqiad.wmnet
09:13 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudgw2003-dev.codfw.wmnet
09:11 arnaudb@cumin1001: START - Cookbook sre.hosts.reboot-single for host backup1010.eqiad.wmnet
09:10 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/proton: apply
09:09 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/proton: apply
09:06 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/proton: apply
09:05 isaranto@deploy2002: helmfile [ml-serve-eqiad] 'sync' command on namespace 'ores-legacy' for release 'main' .
09:05 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host cloudgw2003-dev.codfw.wmnet
09:05 isaranto@deploy2002: helmfile [ml-serve-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
09:05 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2151', diff saved to https://phabricator.wikimedia.org/P52713 and previous config saved to /var/cache/conftool/dbconfig/20230928-090526-arnaudb.json
09:04 isaranto@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
09:04 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/proton: apply
09:04 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on backup1010.eqiad.wmnet with reason: rebooting backup1010
09:04 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on backup1010.eqiad.wmnet with reason: rebooting backup1010
09:02 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudgw2002-dev.codfw.wmnet
08:59 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/proton: apply
08:59 jayme@deploy1002: helmfile [staging] START helmfile.d/services/proton: apply
08:57 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host cloudgw2002-dev.codfw.wmnet
08:50 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2151', diff saved to https://phabricator.wikimedia.org/P52712 and previous config saved to /var/cache/conftool/dbconfig/20230928-085019-arnaudb.json
08:35 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2151 (T343198)', diff saved to https://phabricator.wikimedia.org/P52711 and previous config saved to /var/cache/conftool/dbconfig/20230928-083513-arnaudb.json
08:14 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifeeds: apply
08:14 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/wikifeeds: apply
08:08 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on debmonitor2003.codfw.wmnet with reason: WIP
08:08 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on debmonitor2003.codfw.wmnet with reason: WIP
07:55 taavi@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudcontrol1006.wikimedia.org
07:55 taavi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
07:55 taavi@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudcontrol1006.wikimedia.org decommissioned, removing all IPs except the asset tag one - taavi@cumin1001"
07:53 taavi@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudcontrol1006.wikimedia.org decommissioned, removing all IPs except the asset tag one - taavi@cumin1001"
07:51 taavi@cumin1001: START - Cookbook sre.dns.netbox
07:48 isaranto@deploy2002: helmfile [ml-serve-eqiad] 'sync' command on namespace 'ores-legacy' for release 'main' .
07:47 isaranto@deploy2002: helmfile [ml-serve-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
07:46 isaranto@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
07:44 taavi@cumin1001: START - Cookbook sre.hosts.decommission for hosts cloudcontrol1006.wikimedia.org
07:28 taavi: test
07:27 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2099.codfw.wmnet with reason: Maintenance
07:27 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2099.codfw.wmnet with reason: Maintenance
07:25 _joe_: restarting trafficserver on cp1081 T347493
04:42 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2151 (T343198)', diff saved to https://phabricator.wikimedia.org/P52710 and previous config saved to /var/cache/conftool/dbconfig/20230928-044238-arnaudb.json
04:42 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2151.codfw.wmnet with reason: Maintenance
04:42 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2151.codfw.wmnet with reason: Maintenance
04:42 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2129 (T343198)', diff saved to https://phabricator.wikimedia.org/P52709 and previous config saved to /var/cache/conftool/dbconfig/20230928-044216-arnaudb.json
04:27 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2129', diff saved to https://phabricator.wikimedia.org/P52708 and previous config saved to /var/cache/conftool/dbconfig/20230928-042710-arnaudb.json
04:12 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2129', diff saved to https://phabricator.wikimedia.org/P52707 and previous config saved to /var/cache/conftool/dbconfig/20230928-041204-arnaudb.json
03:56 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2129 (T343198)', diff saved to https://phabricator.wikimedia.org/P52706 and previous config saved to /var/cache/conftool/dbconfig/20230928-035657-arnaudb.json
02:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1245.eqiad.wmnet with OS bullseye
02:42 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
02:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1249.eqiad.wmnet with OS bullseye
02:42 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
02:41 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
02:41 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
02:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1247.eqiad.wmnet with OS bullseye
02:40 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
02:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1248.eqiad.wmnet with OS bullseye
02:39 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
02:39 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
02:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1246.eqiad.wmnet with OS bullseye
02:39 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
02:37 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
02:35 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
02:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1244.eqiad.wmnet with OS bullseye
02:35 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
02:33 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
02:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1242.eqiad.wmnet with OS bullseye
02:33 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
02:32 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1243.eqiad.wmnet with OS bullseye
02:31 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
02:31 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
02:27 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
02:25 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db1245.eqiad.wmnet with reason: host reimage
02:25 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1249.eqiad.wmnet with reason: host reimage
02:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1247.eqiad.wmnet with reason: host reimage
02:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1246.eqiad.wmnet with reason: host reimage
02:20 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1249.eqiad.wmnet with reason: host reimage
02:19 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db1248.eqiad.wmnet with reason: host reimage
02:19 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1248.eqiad.wmnet with reason: host reimage
02:18 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1247.eqiad.wmnet with reason: host reimage
02:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1244.eqiad.wmnet with reason: host reimage
02:17 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1246.eqiad.wmnet with reason: host reimage
02:15 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1245.eqiad.wmnet with reason: host reimage
02:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1242.eqiad.wmnet with reason: host reimage
02:14 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db1243.eqiad.wmnet with reason: host reimage
02:14 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1244.eqiad.wmnet with reason: host reimage
02:13 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1243.eqiad.wmnet with reason: host reimage
02:12 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1242.eqiad.wmnet with reason: host reimage
02:07 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host db1249.eqiad.wmnet with OS bullseye
02:06 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host db1248.eqiad.wmnet with OS bullseye
02:05 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host db1247.eqiad.wmnet with OS bullseye
02:04 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host db1246.eqiad.wmnet with OS bullseye
02:03 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host db1245.eqiad.wmnet with OS bullseye
02:01 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host db1244.eqiad.wmnet with OS bullseye
02:00 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host db1243.eqiad.wmnet with OS bullseye
01:59 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host db1242.eqiad.wmnet with OS bullseye
00:50 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudcontrol1008-dev']
00:50 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcontrol1008-dev']
00:50 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudcontrol1008-dev']
00:50 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcontrol1008-dev']
00:49 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudcontrol1008-dev']
00:38 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcontrol1008-dev']
00:38 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudcontrol1008-dev']
00:38 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcontrol1008-dev']
00:38 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudcontrol1008-dev']
00:38 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcontrol1008-dev']
00:37 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcontrol1008-dev.mgmt.eqiad.wmnet with reboot policy FORCED
00:16 jclark@cumin1001: START - Cookbook sre.hosts.provision for host cloudcontrol1008-dev.mgmt.eqiad.wmnet with reboot policy FORCED
00:16 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcontrol1008-dev
00:05 eileen: civicrm upgraded from 41a4c2cf to 7406cdf3

2023-09-27

23:56 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host moss-be1003.eqiad.wmnet with OS bullseye
23:55 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
23:54 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
23:39 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on moss-be1003.eqiad.wmnet with reason: host reimage
23:36 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on moss-be1003.eqiad.wmnet with reason: host reimage
23:23 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host moss-be1003.eqiad.wmnet with OS bullseye
23:01 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2129 (T343198)', diff saved to https://phabricator.wikimedia.org/P52705 and previous config saved to /var/cache/conftool/dbconfig/20230927-230117-arnaudb.json
23:01 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2129.codfw.wmnet with reason: Maintenance
23:01 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2129.codfw.wmnet with reason: Maintenance
23:00 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124 (T343198)', diff saved to https://phabricator.wikimedia.org/P52704 and previous config saved to /var/cache/conftool/dbconfig/20230927-230055-arnaudb.json
22:45 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124', diff saved to https://phabricator.wikimedia.org/P52703 and previous config saved to /var/cache/conftool/dbconfig/20230927-224548-arnaudb.json
22:30 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124', diff saved to https://phabricator.wikimedia.org/P52702 and previous config saved to /var/cache/conftool/dbconfig/20230927-223042-arnaudb.json
22:29 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host moss-be1003.eqiad.wmnet with OS bullseye
22:25 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182 (T343198)', diff saved to https://phabricator.wikimedia.org/P52701 and previous config saved to /var/cache/conftool/dbconfig/20230927-222505-arnaudb.json
22:22 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ncredir5002.eqsin.wmnet with OS bookworm
22:15 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124 (T343198)', diff saved to https://phabricator.wikimedia.org/P52700 and previous config saved to /var/cache/conftool/dbconfig/20230927-221536-arnaudb.json
22:09 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182', diff saved to https://phabricator.wikimedia.org/P52699 and previous config saved to /var/cache/conftool/dbconfig/20230927-220959-arnaudb.json
22:03 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host moss-be1003.eqiad.wmnet with OS bullseye
22:03 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host moss-be1003.eqiad.wmnet with OS bullseye
21:54 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182', diff saved to https://phabricator.wikimedia.org/P52698 and previous config saved to /var/cache/conftool/dbconfig/20230927-215452-arnaudb.json
21:48 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ncredir5002.eqsin.wmnet with reason: host reimage
21:45 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ncredir5002.eqsin.wmnet with reason: host reimage
21:39 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182 (T343198)', diff saved to https://phabricator.wikimedia.org/P52697 and previous config saved to /var/cache/conftool/dbconfig/20230927-213946-arnaudb.json
21:27 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host moss-be1003.eqiad.wmnet with OS bullseye
21:01 brett@cumin2002: START - Cookbook sre.hosts.reimage for host ncredir5002.eqsin.wmnet with OS bookworm
20:59 cjming: end of UTC late backport window
20:57 cjming@deploy2002: Finished scap: Backport for New projects default to Vector 2022 (T347444), Populate the legacy-vector dblist (T347444) (duration: 11m 05s)
20:54 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ncredir4001.ulsfo.wmnet with OS bookworm
20:50 cjming@deploy2002: jdlrobson and cjming: Continuing with sync
20:47 cjming@deploy2002: jdlrobson and cjming: Backport for New projects default to Vector 2022 (T347444), Populate the legacy-vector dblist (T347444) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:46 cjming@deploy2002: Started scap: Backport for New projects default to Vector 2022 (T347444), Populate the legacy-vector dblist (T347444)
20:45 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host moss-be1003.eqiad.wmnet with OS bullseye
20:44 cjming@deploy2002: Sync cancelled.
20:35 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ncredir4001.ulsfo.wmnet with reason: host reimage
20:35 cjming@deploy2002: cjming and jdlrobson: Backport for New projects default to Vector 2022 (T347444) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:34 bking@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
20:34 bking@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
20:33 cjming@deploy2002: Started scap: Backport for New projects default to Vector 2022 (T347444)
20:32 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ncredir4001.ulsfo.wmnet with reason: host reimage
20:31 cjming@deploy2002: Finished scap: Backport for Special wiki wordmarks and taglines (T341250), Add wordmark for li wikinews (T341258) (duration: 09m 52s)
20:27 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on restbase2027.codfw.wmnet with reason: Repairing/rebuilding Cassandra instances
20:27 eevans@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on restbase2027.codfw.wmnet with reason: Repairing/rebuilding Cassandra instances
20:25 cjming@deploy2002: jdlrobson and cjming: Continuing with sync
20:23 brett: update haproxy 2.6 and 2.8 into bookworm archives with reprepro - T342154
20:22 cjming@deploy2002: jdlrobson and cjming: Backport for Special wiki wordmarks and taglines (T341250), Add wordmark for li wikinews (T341258) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:21 cjming@deploy2002: Started scap: Backport for Special wiki wordmarks and taglines (T341250), Add wordmark for li wikinews (T341258)
20:16 brett@cumin2002: START - Cookbook sre.hosts.reimage for host ncredir4001.ulsfo.wmnet with OS bookworm
20:14 cjming@deploy2002: Finished scap: Backport for commonswiki: Add $wgExternalLinksDomainGaps for another domain (T341000) (duration: 10m 23s)
20:10 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ncredir4002.ulsfo.wmnet with OS bookworm
20:08 cjming@deploy2002: lucaswerkmeister and cjming: Continuing with sync
20:05 cjming@deploy2002: lucaswerkmeister and cjming: Backport for commonswiki: Add $wgExternalLinksDomainGaps for another domain (T341000) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:03 cjming@deploy2002: Started scap: Backport for commonswiki: Add $wgExternalLinksDomainGaps for another domain (T341000)
19:59 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host moss-be1003.eqiad.wmnet with OS bullseye
19:54 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase2027.codfw.wmnet with OS bullseye
19:52 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@c6454a9]: update rdf tools jar to .131 (duration: 00m 28s)
19:52 ebernhardson@deploy2002: Started deploy [airflow-dags/search@c6454a9]: update rdf tools jar to .131
19:50 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ncredir4002.ulsfo.wmnet with reason: host reimage
19:47 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ncredir4002.ulsfo.wmnet with reason: host reimage
19:40 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['moss-be1003']
19:39 bking@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
19:39 bking@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
19:38 bking@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
19:38 bking@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
19:35 inflatador: bking@deploy2002 deleting flink-operator leader pod to force failover T347521
19:34 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['moss-be1003']
19:26 brett@cumin2002: START - Cookbook sre.hosts.reimage for host ncredir4002.ulsfo.wmnet with OS bookworm
19:24 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on restbase2027.codfw.wmnet with reason: host reimage
19:22 eevans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on restbase2027.codfw.wmnet with reason: host reimage
19:18 robh@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lists1004.eqiad.wmnet with OS bullseye
19:18 robh@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - robh@cumin1001"
19:14 robh@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - robh@cumin1001"
19:09 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ncredir6001.drmrs.wmnet with OS bookworm
19:08 bking@deploy2002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
19:07 bking@deploy2002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
19:07 bking@deploy2002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
19:06 bking@deploy2002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
19:06 eevans@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts restbase1020.eqiad.wmnet
19:05 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts restbase1020.eqiad.wmnet
19:03 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2027.codfw.wmnet with OS bullseye
19:01 eevans@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts restbase2027.codfw.wmnet
19:01 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2027.codfw.wmnet
18:59 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lists1004.eqiad.wmnet with reason: host reimage
18:56 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host moss-be1003.eqiad.wmnet with OS bullseye
18:55 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on lists1004.eqiad.wmnet with reason: host reimage
18:50 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2027.codfw.wmnet
18:50 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts restbase2027.codfw.wmnet
18:48 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase2026.codfw.wmnet with OS bullseye
18:46 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ncredir6001.drmrs.wmnet with reason: host reimage
18:45 sukhe: re-enable puppet on O:apt_repo
18:43 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ncredir6001.drmrs.wmnet with reason: host reimage
18:42 robh@cumin1001: START - Cookbook sre.hosts.reimage for host lists1004.eqiad.wmnet with OS bullseye
18:41 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host lists1004.eqiad.wmnet with OS bullseye
18:41 robh@cumin1001: START - Cookbook sre.hosts.reimage for host lists1004.eqiad.wmnet with OS bullseye
18:39 sukhe: disable puppet on O:apt_repo
18:27 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on restbase2026.codfw.wmnet with reason: host reimage
18:24 eevans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on restbase2026.codfw.wmnet with reason: host reimage
18:24 dduvall@deploy2002: Synchronized php: group1 wikis to 1.41.0-wmf.28 refs T345889 (duration: 06m 46s)
18:20 brett@cumin2002: START - Cookbook sre.hosts.reimage for host ncredir6001.drmrs.wmnet with OS bookworm
18:19 brett: re-enabling puppet on apt1001 from a quick test of CR 957766's effectiveness
18:17 dduvall@deploy2002: rebuilt and synchronized wikiversions files: group1 wikis to 1.41.0-wmf.28 refs T345889
18:15 brett: disabling puppet on apt1001 for a quick test of CR 957766's effectiveness
18:14 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host lists1004.eqiad.wmnet with OS bullseye
18:11 eevans@cumin1001: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts restbase2027.codfw.wmnet
18:08 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2026.codfw.wmnet with OS bullseye
18:07 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts restbase2027.codfw.wmnet
18:07 eevans@cumin1001: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts restbase2027.codfw.wmnet
18:05 eevans@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts restbase2026.codfw.wmnet
18:05 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host restbase2026.codfw.wmnet
18:01 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host stat1011.eqiad.wmnet with OS bullseye
18:01 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts restbase2027.codfw.wmnet
17:58 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['stat1011']
17:53 bking@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
17:53 bking@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
17:53 bking@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
17:53 bking@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
17:53 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host restbase2026.codfw.wmnet
17:52 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['stat1011']
17:52 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['stat1011']
17:52 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['stat1011']
17:52 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts restbase2026.codfw.wmnet
17:51 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host stat1011.mgmt.eqiad.wmnet with reboot policy FORCED
17:51 jclark@cumin1001: START - Cookbook sre.hosts.provision for host stat1011.mgmt.eqiad.wmnet with reboot policy FORCED
17:49 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ncredir6002.drmrs.wmnet with OS bookworm
17:39 jgreen@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
17:39 jgreen@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove host frauth2001.frack.codfw.wmnet from DNS for decommissioning - jgreen@cumin1001"
17:39 jgreen@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove host frauth2001.frack.codfw.wmnet from DNS for decommissioning - jgreen@cumin1001"
17:38 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for restbase1019.eqiad.wmnet
17:38 eevans@cumin1001: START - Cookbook sre.hosts.remove-downtime for restbase1019.eqiad.wmnet
17:36 jgreen@cumin1001: START - Cookbook sre.dns.netbox
17:36 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host moss-be1003.eqiad.wmnet with OS bullseye
17:25 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ncredir6002.drmrs.wmnet with reason: host reimage
17:23 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase1019.eqiad.wmnet with OS bullseye
17:23 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ncredir6002.drmrs.wmnet with reason: host reimage
17:10 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2124 (T343198)', diff saved to https://phabricator.wikimedia.org/P52696 and previous config saved to /var/cache/conftool/dbconfig/20230927-171014-arnaudb.json
17:10 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2124.codfw.wmnet with reason: Maintenance
17:10 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2124.codfw.wmnet with reason: Maintenance
17:09 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117 (T343198)', diff saved to https://phabricator.wikimedia.org/P52695 and previous config saved to /var/cache/conftool/dbconfig/20230927-170953-arnaudb.json
17:02 brett@cumin2002: START - Cookbook sre.hosts.reimage for host ncredir6002.drmrs.wmnet with OS bookworm
16:58 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on restbase1019.eqiad.wmnet with reason: host reimage
16:55 eevans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on restbase1019.eqiad.wmnet with reason: host reimage
16:54 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117', diff saved to https://phabricator.wikimedia.org/P52694 and previous config saved to /var/cache/conftool/dbconfig/20230927-165446-arnaudb.json
16:52 dduvall@deploy2002: Finished scap: (no justification provided) (duration: 28m 15s)
16:42 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host restbase1019.eqiad.wmnet with OS bullseye
16:39 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117', diff saved to https://phabricator.wikimedia.org/P52693 and previous config saved to /var/cache/conftool/dbconfig/20230927-163940-arnaudb.json
16:39 eevans@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['restbase1019.eqiad.wmnet']
16:32 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['restbase1019.eqiad.wmnet']
16:31 eevans@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host restbase1019.eqiad.wmnet with OS bullseye
16:29 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for restbase2023.codfw.wmnet
16:29 eevans@cumin1001: START - Cookbook sre.hosts.remove-downtime for restbase2023.codfw.wmnet
16:28 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase2023.codfw.wmnet with OS bullseye
16:24 dduvall@deploy2002: Started scap: (no justification provided)
16:24 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117 (T343198)', diff saved to https://phabricator.wikimedia.org/P52692 and previous config saved to /var/cache/conftool/dbconfig/20230927-162433-arnaudb.json
16:16 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host restbase1019.eqiad.wmnet with OS bullseye
16:09 kamila_: Pooled back eqiad for traffic after the DC switchover (T345263)
16:03 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on restbase2023.codfw.wmnet with reason: host reimage
16:02 reedy@deploy2002: Finished scap: (no justification provided) (duration: 07m 22s)
16:00 eevans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on restbase2023.codfw.wmnet with reason: host reimage
15:55 reedy@deploy2002: Started scap: (no justification provided)
15:54 eevans@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts restbase1031.eqiad.wmnet
15:54 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts restbase1031.eqiad.wmnet
15:53 eevans@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts restbase1031.eqiad.wmnet
15:53 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts restbase1031.eqiad.wmnet
15:53 eevans@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts restbase1031.eqiad.wmnet
15:53 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts restbase1031.eqiad.wmnet
15:51 eevans@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts restbase1025.eqiad.wmnet
15:51 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts restbase1025.eqiad.wmnet
15:51 eevans@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts restbase1022.eqiad.wmnet
15:51 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts restbase1022.eqiad.wmnet
15:50 eevans@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts restbase1019.eqiad.wmnet
15:50 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts restbase1019.eqiad.wmnet
15:49 eevans@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts restbase1018.eqiad.wmnet
15:49 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts restbase1018.eqiad.wmnet
15:49 eevans@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts restbase1017.eqiad.wmnet
15:49 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts restbase1017.eqiad.wmnet
15:49 eevans@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts restbase1016.eqiad.wmnet
15:48 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts restbase1016.eqiad.wmnet
15:43 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2023.codfw.wmnet with OS bullseye
15:41 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for restbase2018.codfw.wmnet
15:41 eevans@cumin1001: START - Cookbook sre.hosts.remove-downtime for restbase2018.codfw.wmnet
15:30 dancy@deploy2002: Installation of scap version "4.63.0" completed for 598 hosts
15:29 dancy@deploy2002: Installing scap version "4.63.0" for 598 hosts
15:28 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase2018.codfw.wmnet with OS bullseye
15:24 kevinbazira@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
15:23 fnegri@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcumin1001.eqiad.wmnet with OS bullseye
15:09 fnegri@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcumin1001.eqiad.wmnet with reason: host reimage
15:09 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ntp.anycast.wmnet on all recursors
15:09 sukhe@cumin2002: START - Cookbook sre.dns.wipe-cache ntp.anycast.wmnet on all recursors
15:09 gmodena@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
15:09 gmodena@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
15:07 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:07 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add ntp.anycast.wmnet - sukhe@cumin2002"
15:07 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add ntp.anycast.wmnet - sukhe@cumin2002"
15:06 fnegri@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcumin1001.eqiad.wmnet with reason: host reimage
15:04 sukhe@cumin2002: START - Cookbook sre.dns.netbox
15:02 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on restbase2018.codfw.wmnet with reason: host reimage
15:02 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
15:01 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
15:01 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
15:00 jayme@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
14:59 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
14:59 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
14:59 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
14:58 eevans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on restbase2018.codfw.wmnet with reason: host reimage
14:58 fnegri@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcumin1001.eqiad.wmnet with OS bullseye
14:57 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
14:56 xcollazo@deploy2002: Finished deploy [airflow-dags/analytics@49e3804]: Deploy latest Airflow DAGs to analytics instance (duration: 00m 42s)
14:55 xcollazo@deploy2002: Started deploy [airflow-dags/analytics@49e3804]: Deploy latest Airflow DAGs to analytics instance
14:43 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host lists1004.eqiad.wmnet with OS bullseye
14:40 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2018.codfw.wmnet with OS bullseye
14:38 eevans@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['restbase2018.codfw.wmnet']
14:31 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['restbase2018.codfw.wmnet']
14:30 moritzm: Added Arnaud to pwstore and removed Jeff (frtech SREs no longer need/use it)
14:29 kamila@cumin1001: END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0) pool all services in eqiad: Datacenter Switchover: eqiad repool - T345263
14:22 claime: Repooling eqiad services in progress - T345263
14:13 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 15 hosts
14:13 btullis@cumin1001: START - Cookbook sre.hosts.remove-downtime for 15 hosts
14:10 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for restbase2017.codfw.wmnet
14:10 eevans@cumin1001: START - Cookbook sre.hosts.remove-downtime for restbase2017.codfw.wmnet
14:08 fnegri@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcumin2001.codfw.wmnet with OS bullseye
14:08 kamila@cumin1001: START - Cookbook sre.discovery.datacenter pool all services in eqiad: Datacenter Switchover: eqiad repool - T345263
14:06 _joe_: updating conftool everywhere
14:00 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase2017.codfw.wmnet with OS bullseye
13:54 fnegri@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcumin2001.codfw.wmnet with reason: host reimage
13:51 fnegri@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcumin2001.codfw.wmnet with reason: host reimage
13:50 Lucas_WMDE: UTC afternoon backport+config window done
13:49 lucaswerkmeister-wmde@deploy2002: Finished scap: Backport for Add label for Wikifunctions in “other projects” sidebar section (T342857) (duration: 29m 56s)
13:46 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host lists1004.eqiad.wmnet with OS bullseye
13:44 aqu: Deployed refinery using scap, then deployed onto hdfs
13:43 aqu@deploy2002: Finished deploy [analytics/refinery@223be0f] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@223be0fb] (duration: 08m 33s)
13:43 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host lists1004.mgmt.eqiad.wmnet with reboot policy FORCED
13:40 jclark@cumin1001: START - Cookbook sre.hosts.provision for host lists1004.mgmt.eqiad.wmnet with reboot policy FORCED
13:38 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Continuing with sync
13:37 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Backport for Add label for Wikifunctions in “other projects” sidebar section (T342857) synced to the testservers mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
13:36 fnegri@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcumin2001.codfw.wmnet with OS bullseye
13:36 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on restbase2017.codfw.wmnet with reason: host reimage
13:35 aqu@deploy2002: Started deploy [analytics/refinery@223be0f] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@223be0fb]
13:33 eevans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on restbase2017.codfw.wmnet with reason: host reimage
13:26 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host lists1004.mgmt.eqiad.wmnet with reboot policy FORCED
13:26 aqu@deploy2002: deploy aborted: Regular analytics weekly train TEST [analytics/refinery@223be0fb] (duration: 00m 16s)
13:26 aqu@deploy2002: Started deploy [analytics/refinery@223be0f] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@223be0fb]
13:26 aqu@deploy2002: Finished deploy [analytics/refinery@223be0f] (thin): Regular analytics weekly train THIN [analytics/refinery@223be0fb] (duration: 00m 10s)
13:26 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: sync
13:26 aqu@deploy2002: Started deploy [analytics/refinery@223be0f] (thin): Regular analytics weekly train THIN [analytics/refinery@223be0fb]
13:25 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/wikifunctions: sync
13:24 aqu@deploy2002: Finished deploy [analytics/refinery@223be0f]: Regular analytics weekly train [analytics/refinery@223be0fb] (duration: 06m 58s)
13:21 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: sync
13:21 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifunctions: sync
13:21 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: sync
13:21 jayme@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: sync
13:20 jclark@cumin1001: START - Cookbook sre.hosts.provision for host lists1004.mgmt.eqiad.wmnet with reboot policy FORCED
13:20 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host lists1004.mgmt.eqiad.wmnet with reboot policy FORCED
13:19 lucaswerkmeister-wmde@deploy2002: Started scap: Backport for Add label for Wikifunctions in “other projects” sidebar section (T342857)
13:17 jclark@cumin1001: START - Cookbook sre.hosts.provision for host lists1004.mgmt.eqiad.wmnet with reboot policy FORCED
13:17 aqu@deploy2002: Started deploy [analytics/refinery@223be0f]: Regular analytics weekly train [analytics/refinery@223be0fb]
13:17 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2017.codfw.wmnet with OS bullseye
13:12 aqu: Deployment weekly train of analytics-refinery (+new source version)
12:18 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on 6 hosts with reason: Still running on 9 mirrormaker processes from main-eqiad to jumbo
12:18 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on 6 hosts with reason: Still running on 9 mirrormaker processes from main-eqiad to jumbo
11:26 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2117 (T343198)', diff saved to https://phabricator.wikimedia.org/P52688 and previous config saved to /var/cache/conftool/dbconfig/20230927-112640-arnaudb.json
11:26 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2117.codfw.wmnet with reason: Maintenance
11:26 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2117.codfw.wmnet with reason: Maintenance
11:23 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2182 (T343198)', diff saved to https://phabricator.wikimedia.org/P52687 and previous config saved to /var/cache/conftool/dbconfig/20230927-112342-arnaudb.json
11:23 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2182.codfw.wmnet with reason: Maintenance
11:23 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2182.codfw.wmnet with reason: Maintenance
11:23 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317 (T343198)', diff saved to https://phabricator.wikimedia.org/P52686 and previous config saved to /var/cache/conftool/dbconfig/20230927-112320-arnaudb.json
11:08 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317', diff saved to https://phabricator.wikimedia.org/P52685 and previous config saved to /var/cache/conftool/dbconfig/20230927-110813-arnaudb.json
10:53 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317', diff saved to https://phabricator.wikimedia.org/P52684 and previous config saved to /var/cache/conftool/dbconfig/20230927-105306-arnaudb.json
10:46 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
10:46 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/mw-web: apply
10:45 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
10:45 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
10:40 isaranto@deploy2002: helmfile [ml-serve-eqiad] 'sync' command on namespace 'ores-legacy' for release 'main' .
10:39 isaranto@deploy2002: helmfile [ml-serve-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
10:39 isaranto@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
10:38 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317 (T343198)', diff saved to https://phabricator.wikimedia.org/P52683 and previous config saved to /var/cache/conftool/dbconfig/20230927-103800-arnaudb.json
10:27 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
10:27 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
10:27 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
10:27 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/mw-web: apply
09:48 jayme@cumin1001: conftool action : set/pooled=no; selector: name=kubernetes1013.*
09:43 claime: Bumping mw-on-k8s traffic to 8% - T346422
09:36 jayme: cordoning kubernetes1013 for debug purposes
09:33 taavi: update CR firewall policy, gerrit 961336
09:15 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db2109.codfw.wmnet with reason: Host crashed
09:14 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on db2109.codfw.wmnet with reason: Host crashed
09:10 gmodena@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
09:10 gmodena@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
09:08 gmodena@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
09:08 gmodena@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
09:05 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on 15 hosts with reason: Kafka mirror issues on jumbo
09:05 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on 15 hosts with reason: Kafka mirror issues on jumbo
08:55 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db2109.codfw.wmnet with reason: Host crashed
08:54 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on db2109.codfw.wmnet with reason: Host crashed
08:44 jiji@deploy2002: helmfile [codfw] DONE helmfile.d/services/mathoid: apply
08:44 jiji@deploy2002: helmfile [codfw] START helmfile.d/services/mathoid: apply
08:44 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mathoid: apply
08:44 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/mathoid: apply
08:28 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 15 hosts with reason: Kafka mirror issues on jumbo
08:28 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 15 hosts with reason: Kafka mirror issues on jumbo
08:21 vgutierrez: update HAProxy to version 2.7.10 in cp4051 - T317799
08:10 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on 15 hosts with reason: Kafka mirror issues on jumbo
08:10 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on 15 hosts with reason: Kafka mirror issues on jumbo
07:39 Emperor: repool ms-fe2009
06:57 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on ldap-rw[1001,2001].wikimedia.org with reason: WIP
06:57 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on ldap-rw[1001,2001].wikimedia.org with reason: WIP
06:50 gmodena@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
06:50 gmodena@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
06:50 gmodena@deploy2002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
06:41 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2097.codfw.wmnet with reason: Maintenance
06:40 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2097.codfw.wmnet with reason: Maintenance
05:54 isaranto@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
05:53 isaranto@deploy2002: helmfile [ml-serve-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
05:53 isaranto@deploy2002: helmfile [ml-serve-eqiad] 'sync' command on namespace 'ores-legacy' for release 'main' .
04:13 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
04:13 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
02:51 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1241.eqiad.wmnet with OS bullseye
02:51 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
02:50 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1240.eqiad.wmnet with OS bullseye
02:50 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
02:50 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
02:49 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
02:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1239.eqiad.wmnet with OS bullseye
02:47 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
02:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1236.eqiad.wmnet with OS bullseye
02:46 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
02:46 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
02:45 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
02:44 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1238.eqiad.wmnet with OS bullseye
02:44 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
02:43 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
02:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1237.eqiad.wmnet with OS bullseye
02:42 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
02:41 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
02:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1235.eqiad.wmnet with OS bullseye
02:40 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
02:39 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
02:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1234.eqiad.wmnet with OS bullseye
02:38 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
02:37 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
02:33 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db1240.eqiad.wmnet with reason: host reimage
02:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1241.eqiad.wmnet with reason: host reimage
02:32 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db1239.eqiad.wmnet with reason: host reimage
02:30 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on db1236.eqiad.wmnet with reason: host reimage
02:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1238.eqiad.wmnet with reason: host reimage
02:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1237.eqiad.wmnet with reason: host reimage
02:24 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1241.eqiad.wmnet with reason: host reimage
02:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1235.eqiad.wmnet with reason: host reimage
02:23 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1240.eqiad.wmnet with reason: host reimage
02:22 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1239.eqiad.wmnet with reason: host reimage
02:21 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1238.eqiad.wmnet with reason: host reimage
02:21 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1234.eqiad.wmnet with reason: host reimage
02:21 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1237.eqiad.wmnet with reason: host reimage
02:20 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1236.eqiad.wmnet with reason: host reimage
02:19 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1235.eqiad.wmnet with reason: host reimage
02:18 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1234.eqiad.wmnet with reason: host reimage
02:11 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host db1241.eqiad.wmnet with OS bullseye
02:11 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for restbase2025.codfw.wmnet
02:11 eevans@cumin1001: START - Cookbook sre.hosts.remove-downtime for restbase2025.codfw.wmnet
02:10 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host db1240.eqiad.wmnet with OS bullseye
02:10 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase2025.codfw.wmnet with OS bullseye
02:09 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host db1239.eqiad.wmnet with OS bullseye
02:09 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host db1238.eqiad.wmnet with OS bullseye
02:08 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host db1237.eqiad.wmnet with OS bullseye
02:07 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host db1236.eqiad.wmnet with OS bullseye
02:06 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host db1235.eqiad.wmnet with OS bullseye
02:05 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host db1234.eqiad.wmnet with OS bullseye
02:00 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1225.eqiad.wmnet with reason: Maintenance
02:00 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1225.eqiad.wmnet with reason: Maintenance
02:00 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1224 (T343198)', diff saved to https://phabricator.wikimedia.org/P52682 and previous config saved to /var/cache/conftool/dbconfig/20230927-020034-arnaudb.json
01:45 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on restbase2025.codfw.wmnet with reason: host reimage
01:45 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1224', diff saved to https://phabricator.wikimedia.org/P52681 and previous config saved to /var/cache/conftool/dbconfig/20230927-014527-arnaudb.json
01:43 eevans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on restbase2025.codfw.wmnet with reason: host reimage
01:30 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1224', diff saved to https://phabricator.wikimedia.org/P52680 and previous config saved to /var/cache/conftool/dbconfig/20230927-013020-arnaudb.json
01:26 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2025.codfw.wmnet with OS bullseye
01:25 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for restbase2022.codfw.wmnet
01:25 eevans@cumin1001: START - Cookbook sre.hosts.remove-downtime for restbase2022.codfw.wmnet
01:25 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase2022.codfw.wmnet with OS bullseye
01:15 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1224 (T343198)', diff saved to https://phabricator.wikimedia.org/P52679 and previous config saved to /var/cache/conftool/dbconfig/20230927-011514-arnaudb.json
01:02 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on restbase2022.codfw.wmnet with reason: host reimage
00:59 eevans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on restbase2022.codfw.wmnet with reason: host reimage
00:42 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2022.codfw.wmnet with OS bullseye
00:41 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1224 (T343198)', diff saved to https://phabricator.wikimedia.org/P52678 and previous config saved to /var/cache/conftool/dbconfig/20230927-004144-arnaudb.json
00:41 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1224.eqiad.wmnet with reason: Maintenance
00:41 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1224.eqiad.wmnet with reason: Maintenance
00:41 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1213:3316 (T343198)', diff saved to https://phabricator.wikimedia.org/P52677 and previous config saved to /var/cache/conftool/dbconfig/20230927-004122-arnaudb.json
00:30 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase2020.codfw.wmnet with OS bullseye
00:26 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1213:3316', diff saved to https://phabricator.wikimedia.org/P52676 and previous config saved to /var/cache/conftool/dbconfig/20230927-002616-arnaudb.json
00:11 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1213:3316', diff saved to https://phabricator.wikimedia.org/P52675 and previous config saved to /var/cache/conftool/dbconfig/20230927-001109-arnaudb.json

2023-09-26

23:56 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1213:3316 (T343198)', diff saved to https://phabricator.wikimedia.org/P52674 and previous config saved to /var/cache/conftool/dbconfig/20230926-235602-arnaudb.json
23:55 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on restbase2020.codfw.wmnet with reason: host reimage
23:52 eevans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on restbase2020.codfw.wmnet with reason: host reimage
23:50 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2169:3317 (T343198)', diff saved to https://phabricator.wikimedia.org/P52673 and previous config saved to /var/cache/conftool/dbconfig/20230926-235026-arnaudb.json
23:50 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2169.codfw.wmnet with reason: Maintenance
23:50 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2169.codfw.wmnet with reason: Maintenance
23:50 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317 (T343198)', diff saved to https://phabricator.wikimedia.org/P52672 and previous config saved to /var/cache/conftool/dbconfig/20230926-235005-arnaudb.json
23:46 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host lists1004.eqiad.wmnet with OS bullseye
23:41 eevans@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts restbase2022.codfw.wmnet
23:41 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts restbase2022.codfw.wmnet
23:41 eevans@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts restbase2022.codfw.wmnet
23:41 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts restbase2022.codfw.wmnet
23:36 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2020.codfw.wmnet with OS bullseye
23:34 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317', diff saved to https://phabricator.wikimedia.org/P52671 and previous config saved to /var/cache/conftool/dbconfig/20230926-233458-arnaudb.json
23:19 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317', diff saved to https://phabricator.wikimedia.org/P52670 and previous config saved to /var/cache/conftool/dbconfig/20230926-231951-arnaudb.json
23:04 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317 (T343198)', diff saved to https://phabricator.wikimedia.org/P52669 and previous config saved to /var/cache/conftool/dbconfig/20230926-230445-arnaudb.json
22:49 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host lists1004.eqiad.wmnet with OS bullseye
22:47 eevans@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['restbase2020.codfw.wmnet']
22:47 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['restbase2020.codfw.wmnet']
22:45 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host lists1004.mgmt.eqiad.wmnet with reboot policy FORCED
22:44 jclark@cumin1001: START - Cookbook sre.hosts.provision for host lists1004.mgmt.eqiad.wmnet with reboot policy FORCED
22:41 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host lists1004.mgmt.eqiad.wmnet with reboot policy FORCED
22:36 jclark@cumin1001: START - Cookbook sre.hosts.provision for host lists1004.mgmt.eqiad.wmnet with reboot policy FORCED
22:35 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase2016.codfw.wmnet with OS bullseye
22:33 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host lists1004.mgmt.eqiad.wmnet with reboot policy FORCED
22:30 jclark@cumin1001: START - Cookbook sre.hosts.provision for host lists1004.mgmt.eqiad.wmnet with reboot policy FORCED
22:27 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['lists1004']
22:26 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['lists1004']
22:26 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['lists1004']
22:25 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['lists1004']
22:25 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['lists1004']
22:24 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['lists1004']
22:22 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host lists1004.eqiad.wmnet with OS bullseye
22:08 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1213:3316 (T343198)', diff saved to https://phabricator.wikimedia.org/P52668 and previous config saved to /var/cache/conftool/dbconfig/20230926-220812-arnaudb.json
22:08 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1213.eqiad.wmnet with reason: Maintenance
22:08 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1213.eqiad.wmnet with reason: Maintenance
22:08 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1201 (T343198)', diff saved to https://phabricator.wikimedia.org/P52667 and previous config saved to /var/cache/conftool/dbconfig/20230926-220801-arnaudb.json
21:59 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on restbase2016.codfw.wmnet with reason: host reimage
21:56 eevans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on restbase2016.codfw.wmnet with reason: host reimage
21:52 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1201', diff saved to https://phabricator.wikimedia.org/P52666 and previous config saved to /var/cache/conftool/dbconfig/20230926-215254-arnaudb.json
21:37 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2016.codfw.wmnet with OS bullseye
21:37 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1201', diff saved to https://phabricator.wikimedia.org/P52665 and previous config saved to /var/cache/conftool/dbconfig/20230926-213747-arnaudb.json
21:37 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host lists1004.eqiad.wmnet with OS bullseye
21:23 eevans@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['restbase2016.codfw.wmnet']
21:23 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['restbase2016.codfw.wmnet']
21:23 eevans@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['restbase2016.codfw.wmnet']
21:23 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['restbase2016.codfw.wmnet']
21:23 eevans@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['restbase2016.codfw.wmnet']
21:22 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1201 (T343198)', diff saved to https://phabricator.wikimedia.org/P52664 and previous config saved to /var/cache/conftool/dbconfig/20230926-212240-arnaudb.json
21:22 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['restbase2016.codfw.wmnet']
21:13 eevans@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['restbase2016.codfw.wmnet']
21:13 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['restbase2016.codfw.wmnet']
21:13 eevans@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['restbase2016.codfw.wmnet']
21:09 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['restbase2016.codfw.wmnet']
21:08 eevans@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['restbase2016.codfw.wmnet']
21:00 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['restbase2016.codfw.wmnet']
20:59 eevans@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['restbase2016.codfw.wmnet']
20:59 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['restbase2016.codfw.wmnet']
20:50 eevans@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['restbase2016.codfw.wmnet']
20:50 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['restbase2016.codfw.wmnet']
20:50 eevans@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['restbase2016.codfw.wmnet']
20:49 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['restbase2016.codfw.wmnet']
20:48 bking@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudelastic1007.eqiad.wmnet with OS bullseye
20:48 taavi@deploy2002: Finished scap: Backport for Set WRITE_NEW for Wikitech on OATHAuth multiple devices migration (T242031) (duration: 07m 38s)
20:43 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1201 (T343198)', diff saved to https://phabricator.wikimedia.org/P52663 and previous config saved to /var/cache/conftool/dbconfig/20230926-204331-arnaudb.json
20:43 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1201.eqiad.wmnet with reason: Maintenance
20:43 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1201.eqiad.wmnet with reason: Maintenance
20:43 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187 (T343198)', diff saved to https://phabricator.wikimedia.org/P52662 and previous config saved to /var/cache/conftool/dbconfig/20230926-204309-arnaudb.json
20:42 taavi@deploy2002: taavi: Continuing with sync
20:42 taavi@deploy2002: taavi: Backport for Set WRITE_NEW for Wikitech on OATHAuth multiple devices migration (T242031) synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
20:42 eevans@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host restbase2016.codfw.wmnet with OS bullseye
20:40 taavi@deploy2002: Started scap: Backport for Set WRITE_NEW for Wikitech on OATHAuth multiple devices migration (T242031)
20:39 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host lists1004.eqiad.wmnet with OS bullseye
20:38 taavi@deploy2002: Finished scap: Backport for Do not set $wgPasswordResetRoutes['domain'] (T345226), Do not set $wgPasswordResetRoutes['domain'] (T345226) (duration: 08m 35s)
20:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1016.eqiad.wmnet with OS bullseye
20:31 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
20:31 taavi@deploy2002: taavi: Continuing with sync
20:31 taavi@deploy2002: taavi: Backport for Do not set $wgPasswordResetRoutes['domain'] (T345226), Do not set $wgPasswordResetRoutes['domain'] (T345226) synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
20:29 taavi@deploy2002: Started scap: Backport for Do not set $wgPasswordResetRoutes['domain'] (T345226), Do not set $wgPasswordResetRoutes['domain'] (T345226)
20:28 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P52661 and previous config saved to /var/cache/conftool/dbconfig/20230926-202803-arnaudb.json
20:26 taavi@deploy2002: Finished scap: Backport for Add $wgExternalLinksDomainGaps (T341000) (duration: 09m 44s)
20:25 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
20:19 taavi@deploy2002: taavi and lucaswerkmeister: Continuing with sync
20:18 taavi@deploy2002: taavi and lucaswerkmeister: Backport for Add $wgExternalLinksDomainGaps (T341000) synced to the testservers mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
20:17 bking@cumin1001: START - Cookbook sre.hosts.reimage for host cloudelastic1007.eqiad.wmnet with OS bullseye
20:16 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1015.eqiad.wmnet with OS bullseye
20:16 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
20:16 taavi@deploy2002: Started scap: Backport for Add $wgExternalLinksDomainGaps (T341000)
20:16 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudelastic1007.eqiad.wmnet with OS bullseye
20:16 bking@cumin1001: START - Cookbook sre.hosts.reimage for host cloudelastic1007.eqiad.wmnet with OS bullseye
20:15 taavi@deploy2002: Finished scap: Backport for Wordmarks for Wikinews projects (T341258), Update wikiquote wordmarks (T341260), Update README clarifying the use of local images. (duration: 10m 04s)
20:15 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
20:15 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudelastic1007.eqiad.wmnet with OS bullseye
20:15 bking@cumin1001: START - Cookbook sre.hosts.reimage for host cloudelastic1007.eqiad.wmnet with OS bullseye
20:14 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2016.codfw.wmnet with OS bullseye
20:14 eevans@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host restbase2016.codfw.wmnet with OS bullseye
20:12 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P52660 and previous config saved to /var/cache/conftool/dbconfig/20230926-201256-arnaudb.json
20:10 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1016.eqiad.wmnet with reason: host reimage
20:09 taavi@deploy2002: taavi and jdlrobson: Continuing with sync
20:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1016.eqiad.wmnet with reason: host reimage
20:06 taavi@deploy2002: taavi and jdlrobson: Backport for Wordmarks for Wikinews projects (T341258), Update wikiquote wordmarks (T341260), Update README clarifying the use of local images. synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-exp
20:06 joal@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventgate-logging-external: apply
20:06 joal@deploy2002: helmfile [codfw] START helmfile.d/services/eventgate-logging-external: apply
20:05 taavi@deploy2002: Started scap: Backport for Wordmarks for Wikinews projects (T341258), Update wikiquote wordmarks (T341260), Update README clarifying the use of local images.
20:04 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
20:04 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
20:04 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
20:04 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
20:04 joal@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-logging-external: apply
20:04 joal@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-logging-external: apply
20:02 joal@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply
20:02 joal@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply
20:02 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1015.eqiad.wmnet with reason: host reimage
20:02 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
20:01 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
20:01 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
20:00 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
20:00 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
20:00 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
19:59 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
19:59 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
19:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1015.eqiad.wmnet with reason: host reimage
19:57 joal@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: apply
19:57 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187 (T343198)', diff saved to https://phabricator.wikimedia.org/P52659 and previous config saved to /var/cache/conftool/dbconfig/20230926-195750-arnaudb.json
19:57 joal@deploy2002: helmfile [codfw] START helmfile.d/services/eventgate-analytics-external: apply
19:55 joal@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: apply
19:54 joal@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: apply
19:53 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host lists1004.eqiad.wmnet with OS bullseye
19:53 joal@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics-external: apply
19:52 joal@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-analytics-external: apply
19:48 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2016.codfw.wmnet with OS bullseye
19:47 joal@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics: apply
19:47 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for restbase2015.codfw.wmnet
19:47 eevans@cumin1001: START - Cookbook sre.hosts.remove-downtime for restbase2015.codfw.wmnet
19:46 joal@deploy2002: helmfile [codfw] START helmfile.d/services/eventgate-analytics: apply
19:46 joal@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics: apply
19:46 joal@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics: apply
19:45 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase2015.codfw.wmnet with OS bullseye
19:45 joal@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics: apply
19:42 joal@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-analytics: apply
19:40 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs1023.eqiad.wmnet with OS bullseye
19:40 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
19:38 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
19:37 joal@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventgate-main: apply
19:37 joal@deploy2002: helmfile [codfw] START helmfile.d/services/eventgate-main: apply
19:33 joal@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: apply
19:33 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host pc1015.eqiad.wmnet with OS bullseye
19:33 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host pc1016.eqiad.wmnet with OS bullseye
19:33 joal@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-main: apply
19:32 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudelastic1007.eqiad.wmnet with OS bullseye
19:32 bking@cumin1001: START - Cookbook sre.hosts.reimage for host cloudelastic1007.eqiad.wmnet with OS bullseye
19:31 joal@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-main: apply
19:30 joal@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-main: apply
19:27 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
19:27 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
19:21 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1023.eqiad.wmnet with reason: host reimage
19:19 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1187 (T343198)', diff saved to https://phabricator.wikimedia.org/P52657 and previous config saved to /var/cache/conftool/dbconfig/20230926-191904-arnaudb.json
19:19 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1187.eqiad.wmnet with reason: Maintenance
19:18 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1187.eqiad.wmnet with reason: Maintenance
19:18 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T343198)', diff saved to https://phabricator.wikimedia.org/P52656 and previous config saved to /var/cache/conftool/dbconfig/20230926-191843-arnaudb.json
19:18 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1023.eqiad.wmnet with reason: host reimage
19:17 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
19:16 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
19:16 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
19:16 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
19:05 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on restbase2015.codfw.wmnet with reason: host reimage
19:03 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P52655 and previous config saved to /var/cache/conftool/dbconfig/20230926-190336-arnaudb.json
19:02 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudelastic1007.eqiad.wmnet with OS bullseye
19:02 bking@cumin1001: START - Cookbook sre.hosts.reimage for host cloudelastic1007.eqiad.wmnet with OS bullseye
19:02 eevans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on restbase2015.codfw.wmnet with reason: host reimage
18:59 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs1023.eqiad.wmnet with OS bullseye
18:58 bking@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudelastic1007.eqiad.wmnet with OS bullseye
18:54 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs1023']
18:50 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs1020.eqiad.wmnet with OS bullseye
18:50 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
18:48 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs1023']
18:48 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P52654 and previous config saved to /var/cache/conftool/dbconfig/20230926-184830-arnaudb.json
18:48 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
18:47 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2015.codfw.wmnet with OS bullseye
18:47 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for restbase2024.codfw.wmnet
18:46 eevans@cumin1001: START - Cookbook sre.hosts.remove-downtime for restbase2024.codfw.wmnet
18:46 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs1017.eqiad.wmnet with OS bullseye
18:46 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
18:45 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
18:41 gmodena@deploy2002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
18:40 gmodena@deploy2002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
18:33 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T343198)', diff saved to https://phabricator.wikimedia.org/P52653 and previous config saved to /var/cache/conftool/dbconfig/20230926-183323-arnaudb.json
18:32 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1020.eqiad.wmnet with reason: host reimage
18:30 brennen@deploy2002: rebuilt and synchronized wikiversions files: group0 wikis to 1.41.0-wmf.28 refs T345889
18:29 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1017.eqiad.wmnet with reason: host reimage
18:28 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1020.eqiad.wmnet with reason: host reimage
18:26 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1017.eqiad.wmnet with reason: host reimage
18:24 bking@cumin1001: START - Cookbook sre.hosts.reimage for host cloudelastic1007.eqiad.wmnet with OS bullseye
18:18 brennen: train 1.41.0-wmf.28 (T345889): no current blockers, rolling to group0
18:09 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase2024.codfw.wmnet with OS bullseye
18:08 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs1020.eqiad.wmnet with OS bullseye
18:08 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs1020']
18:04 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs1017.eqiad.wmnet with OS bullseye
18:03 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs1017']
18:01 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs1020']
18:01 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs1017']
18:01 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs1021.eqiad.wmnet with OS bullseye
18:01 jclark@cumin1001: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
17:58 gmodena@deploy2002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
17:58 gmodena@deploy2002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
17:52 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@94ac23e]: tune parallelism of process_sparql_query_hourly (duration: 00m 27s)
17:52 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1180 (T343198)', diff saved to https://phabricator.wikimedia.org/P52652 and previous config saved to /var/cache/conftool/dbconfig/20230926-175222-arnaudb.json
17:52 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1180.eqiad.wmnet with reason: Maintenance
17:52 ebernhardson@deploy2002: Started deploy [airflow-dags/search@94ac23e]: tune parallelism of process_sparql_query_hourly
17:52 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1180.eqiad.wmnet with reason: Maintenance
17:52 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T343198)', diff saved to https://phabricator.wikimedia.org/P52651 and previous config saved to /var/cache/conftool/dbconfig/20230926-175201-arnaudb.json
17:45 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on restbase2024.codfw.wmnet with reason: host reimage
17:43 eevans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on restbase2024.codfw.wmnet with reason: host reimage
17:36 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P52650 and previous config saved to /var/cache/conftool/dbconfig/20230926-173653-arnaudb.json
17:27 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2024.codfw.wmnet with OS bullseye
17:21 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P52649 and previous config saved to /var/cache/conftool/dbconfig/20230926-172146-arnaudb.json
17:15 herron@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
17:15 herron@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding pyrra.svc records - herron@cumin1001"
17:14 herron@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding pyrra.svc records - herron@cumin1001"
17:12 herron@cumin1001: START - Cookbook sre.dns.netbox
17:06 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T343198)', diff saved to https://phabricator.wikimedia.org/P52648 and previous config saved to /var/cache/conftool/dbconfig/20230926-170639-arnaudb.json
17:01 bblack: A:swift-fe-codfw: manually rolling systemctl restart of swift-proxy and nginx
16:59 bblack@cumin1001: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling restart_daemons on A:swift-fe-codfw
16:53 bblack@cumin1001: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on A:swift-fe-codfw
16:52 bblack: ms-fe2009 - restart swift_dispersion_stats + swift_dispersion_stats_lowlatency services (failing in systemctl)
16:51 bblack@cumin1001: END (FAIL) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=1) rolling restart_daemons on A:swift-fe-codfw
16:45 bblack@cumin1001: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on A:swift-fe-codfw
16:44 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs1020.eqiad.wmnet with OS bullseye
16:28 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
16:27 aokoth@cumin1001: END (PASS) - Cookbook sre.gitlab.failover (exit_code=0) Failover of gitlab from gitlab2002.wikimedia.org to gitlab1003.wikimedia.org
16:26 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1168 (T343198)', diff saved to https://phabricator.wikimedia.org/P52647 and previous config saved to /var/cache/conftool/dbconfig/20230926-162609-arnaudb.json
16:26 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1168.eqiad.wmnet with reason: Maintenance
16:25 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1168.eqiad.wmnet with reason: Maintenance
16:25 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T343198)', diff saved to https://phabricator.wikimedia.org/P52646 and previous config saved to /var/cache/conftool/dbconfig/20230926-162547-arnaudb.json
16:23 aokoth@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) https://gitlab-replica.wikimedia.org/ https://gitlab-replica-old.wikimedia.org/ on all recursors
16:23 aokoth@cumin1001: START - Cookbook sre.dns.wipe-cache https://gitlab-replica.wikimedia.org/ https://gitlab-replica-old.wikimedia.org/ on all recursors
16:17 aokoth@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) https://gitlab-replica.wikimedia.org/ https://gitlab-replica-old.wikimedia.org/ on all recursors
16:17 aokoth@cumin1001: START - Cookbook sre.dns.wipe-cache https://gitlab-replica.wikimedia.org/ https://gitlab-replica-old.wikimedia.org/ on all recursors
16:15 aokoth@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) https://gitlab-replica.wikimedia.org/ https://gitlab-replica-old.wikimedia.org/ on all recursors
16:15 aokoth@cumin1001: START - Cookbook sre.dns.wipe-cache https://gitlab-replica.wikimedia.org/ https://gitlab-replica-old.wikimedia.org/ on all recursors
16:12 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1021.eqiad.wmnet with reason: host reimage
16:10 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P52645 and previous config saved to /var/cache/conftool/dbconfig/20230926-161041-arnaudb.json
16:09 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1021.eqiad.wmnet with reason: host reimage
16:08 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host wdqs1021.eqiad.wmnet with OS bullseye
15:59 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs1017.eqiad.wmnet with OS bullseye
15:55 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P52644 and previous config saved to /var/cache/conftool/dbconfig/20230926-155534-arnaudb.json
15:47 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs1021.eqiad.wmnet with OS bullseye
15:47 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs1021']
15:47 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs1021']
15:46 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['wdqs1021']
15:46 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs1021']
15:40 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wdqs1017.mgmt.eqiad.wmnet with reboot policy FORCED
15:40 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T343198)', diff saved to https://phabricator.wikimedia.org/P52643 and previous config saved to /var/cache/conftool/dbconfig/20230926-154027-arnaudb.json
15:39 jclark@cumin1001: START - Cookbook sre.hosts.provision for host wdqs1017.mgmt.eqiad.wmnet with reboot policy FORCED
15:37 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wdqs1021.mgmt.eqiad.wmnet with reboot policy FORCED
15:30 jclark@cumin1001: START - Cookbook sre.hosts.provision for host wdqs1021.mgmt.eqiad.wmnet with reboot policy FORCED
15:28 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs1017.eqiad.wmnet with OS bullseye
15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Adding ganeti-test server to codfw - jhancock@cumin2002"
15:27 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Adding ganeti-test server to codfw - jhancock@cumin2002"
15:25 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wdqs1021.mgmt.eqiad.wmnet with reboot policy FORCED
15:25 jhancock@cumin2002: START - Cookbook sre.dns.netbox
15:25 jclark@cumin1001: START - Cookbook sre.hosts.provision for host wdqs1021.mgmt.eqiad.wmnet with reboot policy FORCED
15:24 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs1020.eqiad.wmnet with OS bullseye
15:24 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wdqs1020.mgmt.eqiad.wmnet with reboot policy FORCED
15:24 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for restbase2021.codfw.wmnet
15:24 eevans@cumin1001: START - Cookbook sre.hosts.remove-downtime for restbase2021.codfw.wmnet
15:11 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wdqs1020.mgmt.eqiad.wmnet with reboot policy FORCED
15:11 jclark@cumin1001: START - Cookbook sre.hosts.provision for host wdqs1020.mgmt.eqiad.wmnet with reboot policy FORCED
15:11 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:09 jclark@cumin1001: START - Cookbook sre.dns.netbox
15:09 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wdqs1021.mgmt.eqiad.wmnet with reboot policy FORCED
15:07 brennen@deploy2002: Finished deploy [phabricator/deployment@d895dde]: deploy to phab1004 for weekly updates (duration: 00m 44s)
15:06 brennen@deploy2002: Started deploy [phabricator/deployment@d895dde]: deploy to phab1004 for weekly updates
15:06 brennen@deploy2002: Finished deploy [phabricator/deployment@d895dde]: test deploy to phab2002 (duration: 00m 35s)
15:05 brennen@deploy2002: Started deploy [phabricator/deployment@d895dde]: test deploy to phab2002
15:04 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti-test2004.mgmt.codfw.wmnet with reboot policy FORCED
15:04 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ganeti-test2004.mgmt.codfw.wmnet with reboot policy FORCED
15:04 ejegg: re-enabled recurring donations charge job
15:03 brennen: beginning routine phabricator update shortly
15:03 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on phab1004.eqiad.wmnet with reason: Phabricator maintenance
15:03 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on phab1004.eqiad.wmnet with reason: Phabricator maintenance
15:02 jclark@cumin1001: START - Cookbook sre.hosts.provision for host wdqs1020.mgmt.eqiad.wmnet with reboot policy FORCED
15:01 jclark@cumin1001: START - Cookbook sre.hosts.provision for host wdqs1021.mgmt.eqiad.wmnet with reboot policy FORCED
15:01 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti-test2004.mgmt.codfw.wmnet with reboot policy FORCED
15:01 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ganeti-test2004.mgmt.codfw.wmnet with reboot policy FORCED
15:01 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:01 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt 20 - jclark@cumin1001"
15:01 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1165 (T343198)', diff saved to https://phabricator.wikimedia.org/P52642 and previous config saved to /var/cache/conftool/dbconfig/20230926-150056-arnaudb.json
15:01 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
15:01 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
15:00 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1165.eqiad.wmnet with reason: Maintenance
15:00 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1165.eqiad.wmnet with reason: Maintenance
15:00 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131 (T343198)', diff saved to https://phabricator.wikimedia.org/P52641 and previous config saved to /var/cache/conftool/dbconfig/20230926-150028-arnaudb.json
15:00 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt 20 - jclark@cumin1001"
14:58 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wdqs1021.mgmt.eqiad.wmnet with reboot policy FORCED
14:58 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:58 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Adding ganeti-test server to codfw - jhancock@cumin2002"
14:57 jclark@cumin1001: START - Cookbook sre.hosts.provision for host wdqs1021.mgmt.eqiad.wmnet with reboot policy FORCED
14:57 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Adding ganeti-test server to codfw - jhancock@cumin2002"
14:57 jclark@cumin1001: START - Cookbook sre.dns.netbox
14:54 jhancock@cumin2002: START - Cookbook sre.dns.netbox
14:53 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:53 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt wdqs1017-20 - jclark@cumin1001"
14:52 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt wdqs1017-20 - jclark@cumin1001"
14:50 jclark@cumin1001: START - Cookbook sre.dns.netbox
14:49 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wdqs1021.mgmt.eqiad.wmnet with reboot policy FORCED
14:47 moritzm: installing lldpd security updates
14:46 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase2021.codfw.wmnet with OS bullseye
14:45 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131', diff saved to https://phabricator.wikimedia.org/P52640 and previous config saved to /var/cache/conftool/dbconfig/20230926-144521-arnaudb.json
14:38 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: sync
14:38 effie: Ramp up traffic to mw-on-k8s to 6.5% - T346422
14:37 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/wikifunctions: sync
14:36 ejegg: fundraising civicrm upgraded from 9efea665 to 41a4c2cf
14:35 jclark@cumin1001: START - Cookbook sre.hosts.provision for host wdqs1021.mgmt.eqiad.wmnet with reboot policy FORCED
14:34 jbond@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "puppetserver2002.codfw.wmnet - jbond@cumin2002"
14:33 ejegg: disabled recurring donations charge job for civi deploy
14:30 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131', diff saved to https://phabricator.wikimedia.org/P52639 and previous config saved to /var/cache/conftool/dbconfig/20230926-143015-arnaudb.json
14:27 jbond@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "puppetserver2002.codfw.wmnet - jbond@cumin2002"
14:25 jbond@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host puppetserver2002.codfw.wmnet with OS bookworm
14:25 jbond@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jbond@cumin2002"
14:24 jbond@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jbond@cumin2002"
14:23 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host puppetserver1003.eqiad.wmnet with OS bookworm
14:23 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jbond@cumin1001"
14:21 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs1021.eqiad.wmnet with OS bullseye
14:21 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs1021.eqiad.wmnet with OS bullseye
14:17 moritzm: prune obsolete nginx packages from durum hosts after migration to new library scheme T329529
14:16 jbond@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jbond@cumin1001"
14:15 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131 (T343198)', diff saved to https://phabricator.wikimedia.org/P52638 and previous config saved to /var/cache/conftool/dbconfig/20230926-141508-arnaudb.json
14:13 jbond@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on puppetserver2002.codfw.wmnet with reason: host reimage
14:10 jbond@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on puppetserver2002.codfw.wmnet with reason: host reimage
14:08 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs1017.eqiad.wmnet with OS bullseye
14:08 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs1021.eqiad.wmnet with OS bullseye
14:05 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on puppetserver1003.eqiad.wmnet with reason: host reimage
14:02 Lucas_WMDE: UTC afternoon backport+config window done
14:02 lucaswerkmeister-wmde@deploy2002: Finished scap: Backport for Enable Minerva site notice for wikifunctions wiki (T345463) (duration: 09m 51s)
14:02 jbond@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on puppetserver1003.eqiad.wmnet with reason: host reimage
14:01 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on restbase2021.codfw.wmnet with reason: host reimage
13:57 eevans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on restbase2021.codfw.wmnet with reason: host reimage
13:55 lucaswerkmeister-wmde@deploy2002: ammarpad and lucaswerkmeister-wmde: Continuing with sync
13:54 lucaswerkmeister-wmde@deploy2002: ammarpad and lucaswerkmeister-wmde: Backport for Enable Minerva site notice for wikifunctions wiki (T345463) synced to the testservers mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
13:52 lucaswerkmeister-wmde@deploy2002: Started scap: Backport for Enable Minerva site notice for wikifunctions wiki (T345463)
13:51 lucaswerkmeister-wmde@deploy2002: Finished scap: Backport for arwikisource: Increase autoconfirm edit count to 10 (T347264) (duration: 11m 27s)
13:47 Lucas_WMDE: lucaswerkmeister-wmde@deploy2002 ammarpad and lucaswerkmeister-wmde: Continuing with sync [originally 13:44 UTC]
13:43 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2021.codfw.wmnet with OS bullseye
13:43 lucaswerkmeister-wmde@deploy2002: ammarpad and lucaswerkmeister-wmde: Backport for arwikisource: Increase autoconfirm edit count to 10 (T347264) synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
13:43 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for restbase2019.codfw.wmnet
13:43 eevans@cumin1001: START - Cookbook sre.hosts.remove-downtime for restbase2019.codfw.wmnet
13:39 lucaswerkmeister-wmde@deploy2002: Started scap: Backport for arwikisource: Increase autoconfirm edit count to 10 (T347264)
13:37 lucaswerkmeister-wmde@deploy2002: Finished scap: Backport for add search update pipeline streams (update + fetch_error) (T317609) (duration: 11m 54s)
13:36 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2023.codfw.wmnet
13:36 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2023.codfw.wmnet
13:36 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2023.codfw.wmnet
13:35 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: sync
13:35 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2023.codfw.wmnet
13:34 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/wikifunctions: sync
13:31 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase2019.codfw.wmnet with OS bullseye
13:31 lucaswerkmeister-wmde@deploy2002: pfischer and lucaswerkmeister-wmde: Continuing with sync
13:29 jbond@cumin1001: START - Cookbook sre.hosts.reimage for host puppetserver1003.eqiad.wmnet with OS bookworm
13:27 lucaswerkmeister-wmde@deploy2002: pfischer and lucaswerkmeister-wmde: Backport for add search update pipeline streams (update + fetch_error) (T317609) synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
13:25 lucaswerkmeister-wmde@deploy2002: Started scap: Backport for add search update pipeline streams (update + fetch_error) (T317609)
13:25 jbond@cumin2002: START - Cookbook sre.hosts.reimage for host puppetserver2002.codfw.wmnet with OS bookworm
13:25 jbond@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) puppetserver2002.codfw.wmnet on all recursors
13:25 jbond@cumin1001: START - Cookbook sre.dns.wipe-cache puppetserver2002.codfw.wmnet on all recursors
13:25 jbond@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
13:25 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: rename puppetmaster[12]004 - jbond@cumin1001"
13:24 jbond@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: rename puppetmaster[12]004 - jbond@cumin1001"
13:24 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1131 (T343198)', diff saved to https://phabricator.wikimedia.org/P52637 and previous config saved to /var/cache/conftool/dbconfig/20230926-132357-arnaudb.json
13:23 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1131.eqiad.wmnet with reason: Maintenance
13:23 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1131.eqiad.wmnet with reason: Maintenance
13:22 jbond@cumin1001: START - Cookbook sre.dns.netbox
13:21 lucaswerkmeister-wmde@deploy2002: Finished scap: Backport for Make wikifunctionswiki a multilingual Wikidata client (T342857) (duration: 09m 44s)
13:18 aokoth@cumin1001: START - Cookbook sre.gitlab.failover Failover of gitlab from gitlab2002.wikimedia.org to gitlab1003.wikimedia.org
13:15 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: sync
13:15 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Continuing with sync
13:14 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifunctions: sync
13:13 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Backport for Make wikifunctionswiki a multilingual Wikidata client (T342857) synced to the testservers mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
13:11 lucaswerkmeister-wmde@deploy2002: Started scap: Backport for Make wikifunctionswiki a multilingual Wikidata client (T342857)
13:07 jbond@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host puppetserver2002
13:07 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on restbase2019.codfw.wmnet with reason: host reimage
13:06 jbond@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host puppetserver2002
13:04 oblivian@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifeeds: apply
13:04 eevans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on restbase2019.codfw.wmnet with reason: host reimage
13:04 oblivian@deploy2002: helmfile [codfw] START helmfile.d/services/wikifeeds: apply
13:02 oblivian@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifeeds: apply
13:02 oblivian@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifeeds: apply
13:01 oblivian@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifeeds: apply
13:01 oblivian@deploy2002: helmfile [staging] START helmfile.d/services/wikifeeds: apply
13:01 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: sync
13:00 jayme@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: sync
13:00 aokoth@cumin1001: END (FAIL) - Cookbook sre.gitlab.failover (exit_code=93) Failover of gitlab from gitlab2002.wikimedia.org to gitlab1003.wikimedia.org
13:00 jbond@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host puppetserver2002
12:57 jbond@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host puppetserver2002
12:55 jbond@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host puppetserver1003
12:54 aokoth@cumin1001: START - Cookbook sre.gitlab.failover Failover of gitlab from gitlab2002.wikimedia.org to gitlab1003.wikimedia.org
12:53 jbond@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host puppetserver1003
12:53 jbond@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host puppetserver2002
12:53 jbond@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host puppetserver2002
12:52 jbond@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
12:52 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: rename puppetmaster[12]004 - jbond@cumin1001"
12:52 jbond@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host puppetserver2002
12:52 jbond@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host puppetserver2002
12:52 jbond@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: rename puppetmaster[12]004 - jbond@cumin1001"
12:49 jbond@cumin1001: START - Cookbook sre.dns.netbox
12:48 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2019.codfw.wmnet with OS bullseye
12:36 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2023.codfw.wmnet
12:35 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2023.codfw.wmnet
12:35 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2023.codfw.wmnet
12:34 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2023.codfw.wmnet
12:34 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2023.codfw.wmnet
12:33 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2023.codfw.wmnet
12:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2023.codfw.wmnet
12:28 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2023.codfw.wmnet
12:16 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM puppetdb2002.codfw.wmnet
12:16 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts puppetmaster1004.eqiad.wmnet
12:16 jbond@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
12:16 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: puppetmaster1004.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jbond@cumin1001"
12:15 jbond@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: puppetmaster1004.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jbond@cumin1001"
12:12 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM puppetdb2002.codfw.wmnet
12:12 jbond@cumin1001: START - Cookbook sre.dns.netbox
12:10 taavi: deploy https://gerrit.wikimedia.org/r/961054 via homer
12:10 jbond@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts puppetmaster2004.codfw.wmnet
12:10 jbond@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
12:08 jbond@cumin2002: START - Cookbook sre.dns.netbox
12:05 jbond@cumin1001: START - Cookbook sre.hosts.decommission for hosts puppetmaster1004.eqiad.wmnet
12:04 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2168:3317 (T343198)', diff saved to https://phabricator.wikimedia.org/P52635 and previous config saved to /var/cache/conftool/dbconfig/20230926-120417-arnaudb.json
12:04 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2168.codfw.wmnet with reason: Maintenance
12:04 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2168.codfw.wmnet with reason: Maintenance
12:03 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159 (T343198)', diff saved to https://phabricator.wikimedia.org/P52634 and previous config saved to /var/cache/conftool/dbconfig/20230926-120355-arnaudb.json
12:00 jbond@cumin2002: START - Cookbook sre.hosts.decommission for hosts puppetmaster2004.codfw.wmnet
11:48 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159', diff saved to https://phabricator.wikimedia.org/P52633 and previous config saved to /var/cache/conftool/dbconfig/20230926-114848-arnaudb.json
11:33 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159', diff saved to https://phabricator.wikimedia.org/P52632 and previous config saved to /var/cache/conftool/dbconfig/20230926-113340-arnaudb.json
11:29 taavi@deploy2002: Finished scap: Backport for wikitech: $wgPasswordResetRoutes takes an empty array, not false (duration: 07m 28s)
11:23 taavi@deploy2002: taavi: Continuing with sync
11:23 taavi@deploy2002: taavi: Backport for wikitech: $wgPasswordResetRoutes takes an empty array, not false synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
11:21 taavi@deploy2002: Started scap: Backport for wikitech: $wgPasswordResetRoutes takes an empty array, not false
11:18 taavi@deploy2002: Finished scap: Backport for wikitech: Properly disable password resets (T345226) (duration: 08m 00s)
11:18 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159 (T343198)', diff saved to https://phabricator.wikimedia.org/P52631 and previous config saved to /var/cache/conftool/dbconfig/20230926-111834-arnaudb.json
11:12 taavi@deploy2002: taavi: Continuing with sync
11:12 taavi@deploy2002: taavi: Backport for wikitech: Properly disable password resets (T345226) synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
11:10 taavi@deploy2002: Started scap: Backport for wikitech: Properly disable password resets (T345226)
11:07 joal@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-main: sync
11:07 joal@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-main: sync
10:55 oblivian@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifeeds: apply
10:55 oblivian@deploy2002: helmfile [codfw] START helmfile.d/services/wikifeeds: apply
10:54 oblivian@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifeeds: apply
10:53 oblivian@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifeeds: apply
10:51 oblivian@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifeeds: apply
10:51 oblivian@deploy2002: helmfile [staging] START helmfile.d/services/wikifeeds: apply
10:46 oblivian@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifeeds: apply
10:46 oblivian@deploy2002: helmfile [staging] START helmfile.d/services/wikifeeds: apply
10:46 oblivian@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifeeds: apply
10:46 oblivian@deploy2002: helmfile [staging] START helmfile.d/services/wikifeeds: apply
10:41 jiji@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
10:41 jiji@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
10:40 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
10:40 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
10:39 jiji@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
10:38 jiji@deploy2002: helmfile [codfw] START helmfile.d/services/mw-web: apply
10:38 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
10:38 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
10:37 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
10:37 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
10:37 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on an-worker1086.eqiad.wmnet with reason: Downtiming host for RAID controller battery replacement
10:37 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on an-worker1086.eqiad.wmnet with reason: Downtiming host for RAID controller battery replacement
10:36 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
10:35 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
10:05 oblivian@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventstreams: apply
10:05 oblivian@deploy2002: helmfile [codfw] START helmfile.d/services/eventstreams: apply
10:04 oblivian@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventstreams: apply
10:04 oblivian@deploy2002: helmfile [eqiad] START helmfile.d/services/eventstreams: apply
10:04 oblivian@deploy2002: helmfile [staging] DONE helmfile.d/services/eventstreams: apply
10:03 oblivian@deploy2002: helmfile [staging] START helmfile.d/services/eventstreams: apply
10:03 taavi: update CR firewall policy to permit wiki replica account creation in the new cloud-private network setup, https://gerrit.wikimedia.org/r/961055 T347381
10:03 oblivian@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventstreams-internal: apply
10:02 oblivian@deploy2002: helmfile [codfw] START helmfile.d/services/eventstreams-internal: apply
10:01 oblivian@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventstreams-internal: apply
10:00 oblivian@deploy2002: helmfile [eqiad] START helmfile.d/services/eventstreams-internal: apply
10:00 oblivian@deploy2002: helmfile [staging] DONE helmfile.d/services/eventstreams-internal: apply
10:00 oblivian@deploy2002: helmfile [staging] START helmfile.d/services/eventstreams-internal: apply
09:54 klausman@deploy2002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: apply
09:53 klausman@deploy2002: helmfile [eqiad] START helmfile.d/services/api-gateway: apply
09:52 klausman@deploy2002: helmfile [codfw] DONE helmfile.d/services/api-gateway: apply
09:52 klausman@deploy2002: helmfile [codfw] START helmfile.d/services/api-gateway: apply
09:48 godog: remove per-host restbase healthchecks, replaced by service-level swagger-exporter checks - T314118
09:47 klausman@deploy2002: helmfile [staging] DONE helmfile.d/services/api-gateway: sync
09:47 klausman@deploy2002: helmfile [staging] START helmfile.d/services/api-gateway: sync
09:38 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
09:38 oblivian@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventgate-main: apply
09:37 oblivian@deploy2002: helmfile [codfw] START helmfile.d/services/eventgate-main: apply
09:36 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/mw-web: apply
09:36 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
09:35 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
09:35 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
09:35 claime: Raised replicas to 20 for mw-api-ext and mw-web - T346422
09:35 oblivian@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: apply
09:34 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
09:34 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
09:34 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
09:34 oblivian@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-main: apply
09:33 oblivian@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-main: apply
09:33 oblivian@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-main: apply
09:30 oblivian@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics: apply
09:29 oblivian@deploy2002: helmfile [codfw] START helmfile.d/services/eventgate-analytics: apply
09:29 oblivian@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics: apply
09:28 oblivian@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics: apply
09:27 oblivian@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics: apply
09:26 oblivian@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-analytics: apply
09:25 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mathoid: apply
09:23 isaranto@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
09:23 oblivian@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventgate-logging-external: apply
09:22 isaranto@deploy2002: helmfile [ml-serve-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
09:22 oblivian@deploy2002: helmfile [codfw] START helmfile.d/services/eventgate-logging-external: apply
09:22 isaranto@deploy2002: helmfile [ml-serve-eqiad] 'sync' command on namespace 'ores-legacy' for release 'main' .
09:21 oblivian@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-logging-external: apply
09:20 oblivian@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-logging-external: apply
09:20 oblivian@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply
09:19 oblivian@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply
09:19 oblivian@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: apply
09:18 oblivian@deploy2002: helmfile [codfw] START helmfile.d/services/eventgate-analytics-external: apply
09:17 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/mathoid: apply
09:17 oblivian@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: apply
09:16 jiji@deploy2002: helmfile [codfw] DONE helmfile.d/services/mathoid: apply
09:16 jiji@deploy2002: helmfile [codfw] START helmfile.d/services/mathoid: apply
09:15 oblivian@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: apply
09:15 jiji@deploy2002: helmfile [staging] DONE helmfile.d/services/mathoid: apply
09:15 jiji@deploy2002: helmfile [staging] START helmfile.d/services/mathoid: apply
09:15 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mathoid: apply
09:15 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/mathoid: apply
09:14 jiji@deploy2002: helmfile [staging] DONE helmfile.d/services/mathoid: apply
09:14 jiji@deploy2002: helmfile [staging] START helmfile.d/services/mathoid: apply
09:13 gmodena@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-main: sync
09:13 oblivian@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics-external: apply
09:13 gmodena@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-main: sync
09:12 oblivian@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-analytics-external: apply
09:09 gmodena@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-main: apply
09:08 gmodena@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-main: apply
08:52 taavi@deploy2002: taavi: Continuing with sync
08:52 taavi@deploy2002: taavi: Backport for wikitech: Disable password resets (T345226), wikitech: Block account creation by sysops too (T345226) synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
08:51 taavi@deploy2002: Started scap: Backport for wikitech: Disable password resets (T345226), wikitech: Block account creation by sysops too (T345226)
08:03 taavi@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcontrol1007.eqiad.wmnet
07:56 taavi@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudcontrol1007.eqiad.wmnet
07:55 taavi@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcontrol1007.eqiad.wmnet with OS bullseye
07:55 taavi@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - taavi@cumin1001"
07:54 taavi@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - taavi@cumin1001"
07:45 taavi@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "cloudcontrol1007 - taavi@cumin1001"
07:44 taavi@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "cloudcontrol1007 - taavi@cumin1001"
07:25 taavi@deploy2002: Finished scap: Backport for Enable wgMinervaEnableSiteNotice for knwiki (T346582), guwikisource: add audiobook namespace (T347189), add throttle rule for UIUC Wikipedia edit-a-thon October 13, 2023 (T346043) (duration: 11m 41s)
07:18 taavi@deploy2002: anzx and taavi: Continuing with sync
07:15 taavi@deploy2002: anzx and taavi: Backport for Enable wgMinervaEnableSiteNotice for knwiki (T346582), guwikisource: add audiobook namespace (T347189), add throttle rule for UIUC Wikipedia edit-a-thon October 13, 2023 (T346043) synced to the testservers mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug k
07:13 taavi@deploy2002: Started scap: Backport for Enable wgMinervaEnableSiteNotice for knwiki (T346582), guwikisource: add audiobook namespace (T347189), add throttle rule for UIUC Wikipedia edit-a-thon October 13, 2023 (T346043)
07:08 taavi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcontrol1007.eqiad.wmnet with reason: host reimage
07:05 taavi@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcontrol1007.eqiad.wmnet with reason: host reimage
06:58 isaranto@deploy2002: helmfile [ml-serve-eqiad] 'sync' command on namespace 'ores-legacy' for release 'main' .
06:57 isaranto@deploy2002: helmfile [ml-serve-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
06:56 isaranto@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
06:42 taavi@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcontrol1007.eqiad.wmnet with OS bullseye
04:09 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1229.eqiad.wmnet with OS bullseye
03:54 mwpresync@deploy2002: Pruned MediaWiki: 1.41.0-wmf.26 (duration: 02m 13s)
03:52 mwpresync@deploy2002: Finished scap: testwikis wikis to 1.41.0-wmf.28 refs T345889 (duration: 49m 31s)
03:07 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host db1229.eqiad.wmnet with OS bullseye
03:02 mwpresync@deploy2002: Started scap: testwikis wikis to 1.41.0-wmf.28 refs T345889
02:44 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1232.eqiad.wmnet with OS bullseye
02:43 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
02:42 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
02:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1231.eqiad.wmnet with OS bullseye
02:42 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
02:41 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
02:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1233.eqiad.wmnet with OS bullseye
02:39 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
02:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1230.eqiad.wmnet with OS bullseye
02:39 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
02:37 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
02:37 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
02:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1226.eqiad.wmnet with OS bullseye
02:30 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
02:29 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
02:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1232.eqiad.wmnet with reason: host reimage
02:25 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1231.eqiad.wmnet with reason: host reimage
02:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1233.eqiad.wmnet with reason: host reimage
02:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1230.eqiad.wmnet with reason: host reimage
02:20 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1233.eqiad.wmnet with reason: host reimage
02:19 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1232.eqiad.wmnet with reason: host reimage
02:18 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1231.eqiad.wmnet with reason: host reimage
02:17 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1230.eqiad.wmnet with reason: host reimage
02:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1226.eqiad.wmnet with reason: host reimage
02:11 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1226.eqiad.wmnet with reason: host reimage
02:07 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host db1233.eqiad.wmnet with OS bullseye
02:06 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host db1232.eqiad.wmnet with OS bullseye
02:05 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host db1231.eqiad.wmnet with OS bullseye
02:04 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host db1230.eqiad.wmnet with OS bullseye
02:03 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host db1229.eqiad.wmnet with OS bullseye
02:02 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host db1228.eqiad.wmnet with OS bullseye
02:01 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host db1227.eqiad.wmnet with OS bullseye
01:59 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host db1226.eqiad.wmnet with OS bullseye
food: payments-wiki upgraded from 5596c7fd to 358e616e
01:21 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs1021.eqiad.wmnet with OS bullseye
01:20 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs1021.eqiad.wmnet with OS bullseye
01:20 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs1017.eqiad.wmnet with OS bullseye
01:19 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs1017.eqiad.wmnet with OS bullseye
01:19 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs1021.eqiad.wmnet with OS bullseye
01:19 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs1021.eqiad.wmnet with OS bullseye
01:18 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs1017.eqiad.wmnet with OS bullseye
01:18 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs1017.eqiad.wmnet with OS bullseye
01:17 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2159 (T343198)', diff saved to https://phabricator.wikimedia.org/P52628 and previous config saved to /var/cache/conftool/dbconfig/20230926-011707-arnaudb.json
01:17 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
01:16 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
01:16 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2159.codfw.wmnet with reason: Maintenance
01:16 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2159.codfw.wmnet with reason: Maintenance
01:16 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150 (T343198)', diff saved to https://phabricator.wikimedia.org/P52627 and previous config saved to /var/cache/conftool/dbconfig/20230926-011629-arnaudb.json
01:01 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150', diff saved to https://phabricator.wikimedia.org/P52626 and previous config saved to /var/cache/conftool/dbconfig/20230926-010123-arnaudb.json
00:46 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150', diff saved to https://phabricator.wikimedia.org/P52625 and previous config saved to /var/cache/conftool/dbconfig/20230926-004616-arnaudb.json
00:40 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs1017.eqiad.wmnet with OS bullseye
00:40 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs1017.eqiad.wmnet with OS bullseye
00:35 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs1024.eqiad.wmnet with OS bullseye
00:31 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150 (T343198)', diff saved to https://phabricator.wikimedia.org/P52624 and previous config saved to /var/cache/conftool/dbconfig/20230926-003109-arnaudb.json
00:30 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
00:29 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
00:28 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs1022.eqiad.wmnet with OS bullseye
00:28 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
00:27 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
00:26 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs1019.eqiad.wmnet with OS bullseye
00:26 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
00:25 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
00:24 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs1018.eqiad.wmnet with OS bullseye
00:24 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
00:23 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
00:09 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1019.eqiad.wmnet with reason: host reimage
00:09 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1022.eqiad.wmnet with reason: host reimage
00:07 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1018.eqiad.wmnet with reason: host reimage
00:04 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1024.eqiad.wmnet with reason: host reimage
00:04 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1019.eqiad.wmnet with reason: host reimage
00:04 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1018.eqiad.wmnet with reason: host reimage

2023-09-25

23:55 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wdqs1023.mgmt.eqiad.wmnet with reboot policy FORCED
23:51 jclark@cumin1001: START - Cookbook sre.hosts.provision for host wdqs1023.mgmt.eqiad.wmnet with reboot policy FORCED
23:50 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs1023']
23:50 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs1023']
23:49 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs1022.eqiad.wmnet with OS bullseye
23:48 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs1022']
23:45 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs1023']
23:45 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs1023']
23:45 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs1021.eqiad.wmnet with OS bullseye
23:45 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs1024.eqiad.wmnet with OS bullseye
23:44 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs1019.eqiad.wmnet with OS bullseye
23:44 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs1018.eqiad.wmnet with OS bullseye
23:44 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs1019']
23:44 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs1018']
23:43 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs1017.eqiad.wmnet with OS bullseye
23:43 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs1023']
23:43 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs1023']
23:43 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs1022']
23:43 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs1023']
23:43 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs1024']
23:42 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs1023']
23:42 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs1023']
23:42 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs1023']
23:42 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs1021']
23:40 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs1017']
23:40 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs1023']
23:40 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs1023']
23:40 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs1023']
23:40 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs1023']
23:37 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs1023']
23:37 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs1023']
23:37 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs1023']
23:37 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs1019']
23:36 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs1018']
23:36 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs1023']
23:36 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs1023']
23:36 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs1023']
23:36 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs1023']
23:35 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs1023']
23:35 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['wdqs1019']
23:35 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['wdqs1018']
23:35 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs1024']
23:35 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs1023']
23:35 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs1023']
23:35 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs1021']
23:34 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs1019']
23:34 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs1018']
23:34 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs1024.eqiad.wmnet with OS bullseye
23:34 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs1023.eqiad.wmnet with OS bullseye
23:34 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs1022.eqiad.wmnet with OS bullseye
23:34 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs1021.eqiad.wmnet with OS bullseye
23:33 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs1017']
23:33 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['wdqs1017']
23:33 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs1017']
23:32 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs1019.eqiad.wmnet with OS bullseye
23:31 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs1018.eqiad.wmnet with OS bullseye
23:29 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs1017.eqiad.wmnet with OS bullseye
22:14 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs1024.eqiad.wmnet with OS bullseye
22:14 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs1023.eqiad.wmnet with OS bullseye
22:14 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs1022.eqiad.wmnet with OS bullseye
22:13 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs1021.eqiad.wmnet with OS bullseye
22:11 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs1019.eqiad.wmnet with OS bullseye
22:11 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs1018.eqiad.wmnet with OS bullseye
22:11 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs1019.eqiad.wmnet']
22:09 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs1023.eqiad.wmnet']
22:08 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs1017.eqiad.wmnet with OS bullseye
22:07 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs1017.eqiad.wmnet']
22:07 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs1018.eqiad.wmnet']
22:07 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs1021.eqiad.wmnet']
22:07 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs1024.eqiad.wmnet']
22:07 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['wdqs1022']
22:07 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs1022']
22:05 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['wdqs1022']
22:05 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs1022']
22:04 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['wdqs1022']
22:04 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs1022']
22:03 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs1019.eqiad.wmnet']
22:01 dancy@deploy2002: Finished scap: final test sync (duration: 15m 00s)
21:59 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['wdqs1022.eqiad.wmnet']
21:59 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['wdqs1022.eqiad.wmnet']
21:59 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs1017.eqiad.wmnet']
21:59 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs1018.eqiad.wmnet']
21:59 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs1023.eqiad.wmnet']
21:58 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs1022.eqiad.wmnet']
21:58 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs1024.eqiad.wmnet']
21:58 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs1022.eqiad.wmnet']
21:58 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs1021.eqiad.wmnet']
21:52 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wdqs1024.mgmt.eqiad.wmnet with reboot policy FORCED
21:52 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wdqs1022.mgmt.eqiad.wmnet with reboot policy FORCED
21:51 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wdqs1017.mgmt.eqiad.wmnet with reboot policy FORCED
21:51 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wdqs1019.mgmt.eqiad.wmnet with reboot policy FORCED
21:47 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wdqs1021.mgmt.eqiad.wmnet with reboot policy FORCED
21:47 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wdqs1023.mgmt.eqiad.wmnet with reboot policy FORCED
21:46 dancy@deploy2002: Started scap: final test sync
21:46 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wdqs1018.mgmt.eqiad.wmnet with reboot policy FORCED
21:45 dancy@deploy2002: Started scap: testing scap mods
21:38 dancy@deploy2002: Started scap: testing scap mods
21:37 dancy@deploy2002: Installation of scap version "4.62.0" completed for 598 hosts
21:36 dancy@deploy2002: Installing scap version "4.62.0" for 598 hosts
21:32 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wdqs1024.mgmt.eqiad.wmnet with reboot policy FORCED
21:32 jclark@cumin1001: START - Cookbook sre.hosts.provision for host wdqs1024.mgmt.eqiad.wmnet with reboot policy FORCED
21:31 jclark@cumin1001: START - Cookbook sre.hosts.provision for host wdqs1021.mgmt.eqiad.wmnet with reboot policy FORCED
21:31 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wdqs1024.mgmt.eqiad.wmnet with reboot policy FORCED
21:31 jclark@cumin1001: START - Cookbook sre.hosts.provision for host wdqs1024.mgmt.eqiad.wmnet with reboot policy FORCED
21:30 jclark@cumin1001: START - Cookbook sre.hosts.provision for host wdqs1019.mgmt.eqiad.wmnet with reboot policy FORCED
21:30 jclark@cumin1001: START - Cookbook sre.hosts.provision for host wdqs1018.mgmt.eqiad.wmnet with reboot policy FORCED
21:30 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
21:30 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt wdqs1017-20 - jclark@cumin1001"
21:30 jclark@cumin1001: START - Cookbook sre.hosts.provision for host wdqs1024.mgmt.eqiad.wmnet with reboot policy FORCED
21:30 jclark@cumin1001: START - Cookbook sre.hosts.provision for host wdqs1023.mgmt.eqiad.wmnet with reboot policy FORCED
21:29 jclark@cumin1001: START - Cookbook sre.hosts.provision for host wdqs1017.mgmt.eqiad.wmnet with reboot policy FORCED
21:29 jclark@cumin1001: START - Cookbook sre.hosts.provision for host wdqs1022.mgmt.eqiad.wmnet with reboot policy FORCED
21:29 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt wdqs1017-20 - jclark@cumin1001"
21:27 jclark@cumin1001: START - Cookbook sre.dns.netbox
21:22 dancy@deploy2002: Started scap: testing scap mods
21:20 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
21:20 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt wdqs1017-20 - jclark@cumin1001"
21:19 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt wdqs1017-20 - jclark@cumin1001"
21:17 jclark@cumin1001: START - Cookbook sre.dns.netbox
21:12 cjming: end of UTC late backport window
21:02 cjming@deploy2002: Finished scap: Backport for Provide wordmarks/taglines for Wikibooks projects (T341251), Fix white background for Wikibooks wordmarks (T341251), Icons for special projects (T341242) (duration: 23m 50s)
20:53 cjming@deploy2002: pikne and cjming and jdlrobson: Continuing with sync
20:51 cjming@deploy2002: pikne and cjming and jdlrobson: Backport for Provide wordmarks/taglines for Wikibooks projects (T341251), Fix white background for Wikibooks wordmarks (T341251), Icons for special projects (T341242) synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes
20:39 cjming@deploy2002: Started scap: Backport for Provide wordmarks/taglines for Wikibooks projects (T341251), Fix white background for Wikibooks wordmarks (T341251), Icons for special projects (T341242)
20:25 cjming@deploy2002: Finished scap: Backport for Deploy Reader Demographics 2 pilot survey (T345951) (duration: 21m 18s)
20:16 cjming@deploy2002: cjming and dani: Continuing with sync
20:15 cjming@deploy2002: cjming and dani: Backport for Deploy Reader Demographics 2 pilot survey (T345951) synced to the testservers mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
20:03 cjming@deploy2002: Started scap: Backport for Deploy Reader Demographics 2 pilot survey (T345951)
18:45 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db2109.codfw.wmnet with reason: Host crashed
18:45 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on db2109.codfw.wmnet with reason: Host crashed
18:36 ejegg: Standalone (payments listener) SmashPig upgraded from 0703ce60 to a78a91d9
16:54 jayme@cumin1001: conftool action : set/pooled=yes; selector: name=kubernetes2010.codfw.wmnet
16:51 jayme: uncordon kubernetes2010.codfw.wmnet - T347267
16:11 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
16:11 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
16:09 sukhe@cumin2002: dbctl commit (dc=all): 'Depool db2109', diff saved to https://phabricator.wikimedia.org/P52622 and previous config saved to /var/cache/conftool/dbconfig/20230925-160904-sukhe.json
16:01 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
15:57 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
15:55 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
15:54 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
15:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1034.eqiad.wmnet with reason: Maintenance
15:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1034.eqiad.wmnet with reason: Maintenance
15:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1033.eqiad.wmnet with reason: Maintenance
15:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1033.eqiad.wmnet with reason: Maintenance
15:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1032.eqiad.wmnet with reason: Maintenance
15:37 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1032.eqiad.wmnet with reason: Maintenance
15:30 ejegg: Standalone (payments listener) SmashPig upgraded from 2412df22 to 0703ce60
15:23 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:23 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add new records for cloudcontrol1007 - cmooney@cumin1001"
15:23 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add new records for cloudcontrol1007 - cmooney@cumin1001"
15:22 taavi@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcontrol1007
15:21 taavi@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cloudcontrol1007
15:21 herron: alert[12]001 -- rm /etc/apache2/sites-available/50-dispatch-wikimedia-org.conf && apachectl graceful T344937
15:20 cmooney@cumin1001: START - Cookbook sre.dns.netbox
15:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1025 (T344589)', diff saved to https://phabricator.wikimedia.org/P52621 and previous config saved to /var/cache/conftool/dbconfig/20230925-152043-ladsgroup.json
15:19 herron: alert[12]001 -- apt remove docker.io T344937
15:17 taavi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:17 taavi@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: assign new IPs to cloudcontrol1007 - taavi@cumin1001"
15:16 taavi@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: assign new IPs to cloudcontrol1007 - taavi@cumin1001"
15:14 taavi@cumin1001: START - Cookbook sre.dns.netbox
15:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1025', diff saved to https://phabricator.wikimedia.org/P52620 and previous config saved to /var/cache/conftool/dbconfig/20230925-150536-ladsgroup.json
15:00 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
14:57 moritzm: installing python3.7 security updates
14:53 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb2007.codfw.wmnet
14:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1025', diff saved to https://phabricator.wikimedia.org/P52619 and previous config saved to /var/cache/conftool/dbconfig/20230925-145029-ladsgroup.json
14:46 filippo@deploy2002: helmfile [codfw] DONE helmfile.d/services/opentelemetry-collector: apply
14:46 filippo@deploy2002: helmfile [codfw] START helmfile.d/services/opentelemetry-collector: apply
14:45 filippo@deploy2002: helmfile [eqiad] DONE helmfile.d/services/opentelemetry-collector: apply
14:45 filippo@deploy2002: helmfile [eqiad] START helmfile.d/services/opentelemetry-collector: apply
14:44 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-single for host rdb2007.codfw.wmnet
14:43 herron@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts dispatch-be2001.codfw.wmnet,dispatch-be1001.eqiad.wmnet
14:43 herron@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:43 herron@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: dispatch-be2001.codfw.wmnet,dispatch-be1001.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - herron@cumin1001"
14:39 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
14:39 jforrester@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
14:38 jforrester@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
14:38 jforrester@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
14:37 jforrester@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
14:36 jayme@deploy2002: Finished scap: (no justification provided) (duration: 03m 09s)
14:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1025 (T344589)', diff saved to https://phabricator.wikimedia.org/P52618 and previous config saved to /var/cache/conftool/dbconfig/20230925-143523-ladsgroup.json
14:35 jforrester@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
14:34 jforrester@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
14:34 jforrester@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
14:33 jayme@deploy2002: Started scap: (no justification provided)
14:33 jforrester@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
14:32 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
14:32 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
14:31 herron@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: dispatch-be2001.codfw.wmnet,dispatch-be1001.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - herron@cumin1001"
14:31 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
14:31 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
14:31 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
14:30 jayme@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
14:29 herron@cumin1001: START - Cookbook sre.dns.netbox
14:29 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
14:28 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
14:26 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
14:25 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
14:25 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
14:24 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
14:24 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
14:24 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
14:24 herron@cumin1001: START - Cookbook sre.hosts.decommission for hosts dispatch-be2001.codfw.wmnet,dispatch-be1001.eqiad.wmnet
14:22 jforrester@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
14:22 jforrester@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
14:22 jforrester@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
14:21 jforrester@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
14:20 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
14:20 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
14:19 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
14:19 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
14:18 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
14:18 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
14:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling es1025 (T344589)', diff saved to https://phabricator.wikimedia.org/P52615 and previous config saved to /var/cache/conftool/dbconfig/20230925-141313-ladsgroup.json
14:13 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1025.eqiad.wmnet with reason: Maintenance
14:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1025.eqiad.wmnet with reason: Maintenance
14:12 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2150 (T343198)', diff saved to https://phabricator.wikimedia.org/P52614 and previous config saved to /var/cache/conftool/dbconfig/20230925-141252-arnaudb.json
14:12 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2150.codfw.wmnet with reason: Maintenance
14:12 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2150.codfw.wmnet with reason: Maintenance
14:12 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122 (T343198)', diff saved to https://phabricator.wikimedia.org/P52613 and previous config saved to /var/cache/conftool/dbconfig/20230925-141230-arnaudb.json
14:07 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb2008.codfw.wmnet
14:04 urbanecm@deploy2002: Finished scap: Backport for listTaskCounts: Do not expect tasks key to be present (T347120), AddImageFeedbackHandler: Add missing parameters (T346277), Enable Parsoid support for Kartographer on enwikivoyage (T342871) (duration: 38m 35s)
14:00 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-single for host rdb2008.codfw.wmnet
13:59 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 59278
13:58 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb2009.codfw.wmnet
13:57 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122', diff saved to https://phabricator.wikimedia.org/P52612 and previous config saved to /var/cache/conftool/dbconfig/20230925-135724-arnaudb.json
13:51 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-single for host rdb2009.codfw.wmnet
13:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1024 (T344589)', diff saved to https://phabricator.wikimedia.org/P52611 and previous config saved to /var/cache/conftool/dbconfig/20230925-135004-ladsgroup.json
13:43 urbanecm@deploy2002: urbanecm and ihurbain: Continuing with sync
13:42 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122', diff saved to https://phabricator.wikimedia.org/P52610 and previous config saved to /var/cache/conftool/dbconfig/20230925-134217-arnaudb.json
13:42 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb2010.codfw.wmnet
13:38 urbanecm@deploy2002: urbanecm and ihurbain: Backport for listTaskCounts: Do not expect tasks key to be present (T347120), AddImageFeedbackHandler: Add missing parameters (T346277), Enable Parsoid support for Kartographer on enwikivoyage (T342871) synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.w
13:36 jayme@cumin1001: conftool action : set/pooled=no; selector: name=kubernetes2010.codfw.wmnet
13:36 jayme@cumin1001: conftool action : set/pooled=yes; selector: dc=codfw,cluster=kubernetes,name=kubernetes.*
13:35 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-single for host rdb2010.codfw.wmnet
13:35 jayme@cumin1001: conftool action : set/pooled=yes; selector: dc=eqiad,cluster=kubernetes,name=kubernetes.*
13:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1024', diff saved to https://phabricator.wikimedia.org/P52607 and previous config saved to /var/cache/conftool/dbconfig/20230925-133457-ladsgroup.json
13:28 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb1009.eqiad.wmnet
13:27 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122 (T343198)', diff saved to https://phabricator.wikimedia.org/P52606 and previous config saved to /var/cache/conftool/dbconfig/20230925-132711-arnaudb.json
13:26 urbanecm@deploy2002: Started scap: Backport for listTaskCounts: Do not expect tasks key to be present (T347120), AddImageFeedbackHandler: Add missing parameters (T346277), Enable Parsoid support for Kartographer on enwikivoyage (T342871)
13:25 urbanecm@deploy2002: Finished scap: Backport for GrowthExperiments: enable AddLink backend 14th round of wikis (T308139) (duration: 23m 28s)
13:22 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-single for host rdb1009.eqiad.wmnet
13:22 jayme@cumin1001: conftool action : set/weight=10; selector: service=kubesvc,cluster=kubernetes,dc=codfw
13:21 jayme@cumin1001: conftool action : set/weight=10; selector: service=kubesvc,cluster=kubernetes,dc=eqiad
13:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1024', diff saved to https://phabricator.wikimedia.org/P52605 and previous config saved to /var/cache/conftool/dbconfig/20230925-131951-ladsgroup.json
13:17 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb1010.eqiad.wmnet
13:15 urbanecm@deploy2002: urbanecm and sgimeno: Continuing with sync
13:14 jayme: ran homer "lsw1-*eqiad*" commit - T346714
13:14 urbanecm@deploy2002: urbanecm and sgimeno: Backport for GrowthExperiments: enable AddLink backend 14th round of wikis (T308139) synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
13:13 jayme: uncordoned kubernetes10[27-56]
13:11 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-single for host rdb1010.eqiad.wmnet
13:09 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mwmaint1002.eqiad.wmnet
13:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1024 (T344589)', diff saved to https://phabricator.wikimedia.org/P52604 and previous config saved to /var/cache/conftool/dbconfig/20230925-130444-ladsgroup.json
13:04 moritzm: installing openjdk-11 security updates on buster
13:03 jayme: cordoned kubernetes10[27-56]
13:03 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 59278
13:01 urbanecm@deploy2002: Started scap: Backport for GrowthExperiments: enable AddLink backend 14th round of wikis (T308139)
13:01 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-single for host mwmaint1002.eqiad.wmnet
12:56 kamila_: put codfw before eqiad in geoDNS defaults
12:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling es1024 (T344589)', diff saved to https://phabricator.wikimedia.org/P52603 and previous config saved to /var/cache/conftool/dbconfig/20230925-125212-ladsgroup.json
12:52 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1024.eqiad.wmnet with reason: Maintenance
12:52 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1024.eqiad.wmnet with reason: Maintenance
12:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on es[1024-1025].eqiad.wmnet with reason: Maintenance
12:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on es[1024-1025].eqiad.wmnet with reason: Maintenance
12:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1023.eqiad.wmnet with reason: Maintenance
12:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1023.eqiad.wmnet with reason: Maintenance
12:26 jayme@deploy2002: Finished scap: (no justification provided) (duration: 10m 08s)
12:17 jayme: bumping k8s deployment mw-web and mw-api-ext to 16 replicas each in both DCs
12:16 jayme@deploy2002: Started scap: (no justification provided)
11:43 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb1011.eqiad.wmnet
11:43 jayme: running puppet on lvs in eqiad - T346714 (TYPO from above, did not run in codfw)
11:42 jayme: running puppet on lvs in codfw - T346714
11:37 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-single for host rdb1011.eqiad.wmnet
11:20 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1054.eqiad.wmnet with OS bullseye
11:20 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-single for host rdb1013.eqiad.wmnet
11:19 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb1014.eqiad.wmnet
11:16 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1036.eqiad.wmnet with OS bullseye
11:16 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1053.eqiad.wmnet with OS bullseye
11:16 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1047.eqiad.wmnet with OS bullseye
11:13 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-single for host rdb1014.eqiad.wmnet
11:13 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1038.eqiad.wmnet with OS bullseye
11:12 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1050.eqiad.wmnet with OS bullseye
11:09 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1046.eqiad.wmnet with OS bullseye
11:08 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1052.eqiad.wmnet with OS bullseye
11:08 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1049.eqiad.wmnet with OS bullseye
11:07 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1044.eqiad.wmnet with OS bullseye
11:06 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1048.eqiad.wmnet with OS bullseye
11:05 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host poolcounter1005.eqiad.wmnet
11:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1022 (T344589)', diff saved to https://phabricator.wikimedia.org/P52602 and previous config saved to /var/cache/conftool/dbconfig/20230925-110343-ladsgroup.json
11:03 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes1056.eqiad.wmnet with OS bullseye
11:00 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-single for host poolcounter1005.eqiad.wmnet
10:58 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1043.eqiad.wmnet with OS bullseye
10:58 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host poolcounter1004.eqiad.wmnet
10:58 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1045.eqiad.wmnet with OS bullseye
10:57 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes1055.eqiad.wmnet with OS bullseye
10:57 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1036.eqiad.wmnet with reason: host reimage
10:57 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes1051.eqiad.wmnet with OS bullseye
10:54 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-single for host poolcounter1004.eqiad.wmnet
10:53 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on kubernetes1056.eqiad.wmnet with reason: host reimage
10:53 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on kubernetes1054.eqiad.wmnet with reason: host reimage
10:52 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1038.eqiad.wmnet with reason: host reimage
10:52 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on kubernetes1047.eqiad.wmnet with reason: host reimage
10:50 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on kubernetes1053.eqiad.wmnet with reason: host reimage
10:50 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on kubernetes1048.eqiad.wmnet with reason: host reimage
10:49 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host deploy1002.eqiad.wmnet
10:49 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on kubernetes1046.eqiad.wmnet with reason: host reimage
10:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1022', diff saved to https://phabricator.wikimedia.org/P52601 and previous config saved to /var/cache/conftool/dbconfig/20230925-104837-ladsgroup.json
10:48 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1034.eqiad.wmnet with OS bullseye
10:48 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1038.eqiad.wmnet with reason: host reimage
10:48 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on kubernetes1044.eqiad.wmnet with reason: host reimage
10:48 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1055.eqiad.wmnet with reason: host reimage
10:47 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1036.eqiad.wmnet with reason: host reimage
10:47 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on kubernetes1051.eqiad.wmnet with reason: host reimage
10:47 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on kubernetes1050.eqiad.wmnet with reason: host reimage
10:46 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1037.eqiad.wmnet with OS bullseye
10:45 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1047.eqiad.wmnet with reason: host reimage
10:45 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1052.eqiad.wmnet with reason: host reimage
10:45 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on kubernetes1049.eqiad.wmnet with reason: host reimage
10:43 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1056.eqiad.wmnet with reason: host reimage
10:43 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1054.eqiad.wmnet with reason: host reimage
10:43 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1055.eqiad.wmnet with reason: host reimage
10:43 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1039.eqiad.wmnet with OS bullseye
10:43 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1045.eqiad.wmnet with reason: host reimage
10:42 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1053.eqiad.wmnet with reason: host reimage
10:42 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1051.eqiad.wmnet with reason: host reimage
10:41 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1033.eqiad.wmnet with OS bullseye
10:40 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-single for host deploy1002.eqiad.wmnet
10:40 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1035.eqiad.wmnet with OS bullseye
10:40 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1043.eqiad.wmnet with reason: host reimage
10:40 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1050.eqiad.wmnet with reason: host reimage
10:40 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1049.eqiad.wmnet with reason: host reimage
10:40 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1048.eqiad.wmnet with reason: host reimage
10:40 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1052.eqiad.wmnet with reason: host reimage
10:39 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1041.eqiad.wmnet with OS bullseye
10:39 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1046.eqiad.wmnet with reason: host reimage
10:38 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1045.eqiad.wmnet with reason: host reimage
10:38 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1044.eqiad.wmnet with reason: host reimage
10:37 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1043.eqiad.wmnet with reason: host reimage
10:37 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1040.eqiad.wmnet with OS bullseye
10:36 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes1042.eqiad.wmnet with OS bullseye
10:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1128', diff saved to https://phabricator.wikimedia.org/P52600 and previous config saved to /var/cache/conftool/dbconfig/20230925-103454-root.json
10:34 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1038.eqiad.wmnet with OS bullseye
10:34 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1036.eqiad.wmnet with OS bullseye
10:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1022', diff saved to https://phabricator.wikimedia.org/P52599 and previous config saved to /var/cache/conftool/dbconfig/20230925-103330-ladsgroup.json
10:31 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1047.eqiad.wmnet with OS bullseye
10:27 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on kubernetes1034.eqiad.wmnet with reason: host reimage
10:27 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1042.eqiad.wmnet with reason: host reimage
10:27 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1055.eqiad.wmnet with OS bullseye
10:27 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on kubernetes1037.eqiad.wmnet with reason: host reimage
10:27 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1056.eqiad.wmnet with OS bullseye
10:27 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1054.eqiad.wmnet with OS bullseye
10:27 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1053.eqiad.wmnet with OS bullseye
10:26 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1051.eqiad.wmnet with OS bullseye
10:26 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1050.eqiad.wmnet with OS bullseye
10:25 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1049.eqiad.wmnet with OS bullseye
10:25 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes1047.eqiad.wmnet with OS bullseye
10:25 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1048.eqiad.wmnet with OS bullseye
10:25 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1047.eqiad.wmnet with OS bullseye
10:25 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1046.eqiad.wmnet with OS bullseye
10:25 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1045.eqiad.wmnet with OS bullseye
10:25 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1039.eqiad.wmnet with reason: host reimage
10:24 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1044.eqiad.wmnet with OS bullseye
10:23 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1052.eqiad.wmnet with OS bullseye
10:23 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1043.eqiad.wmnet with OS bullseye
10:23 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1041.eqiad.wmnet with reason: host reimage
10:22 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on kubernetes1035.eqiad.wmnet with reason: host reimage
10:20 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1033.eqiad.wmnet with reason: host reimage
10:20 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on kubernetes1040.eqiad.wmnet with reason: host reimage
10:19 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1042.eqiad.wmnet with reason: host reimage
10:19 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1041.eqiad.wmnet with reason: host reimage
10:18 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1039.eqiad.wmnet with reason: host reimage
10:18 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1040.eqiad.wmnet with reason: host reimage
10:18 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1037.eqiad.wmnet with reason: host reimage
10:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1022 (T344589)', diff saved to https://phabricator.wikimedia.org/P52597 and previous config saved to /var/cache/conftool/dbconfig/20230925-101824-ladsgroup.json
10:17 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1035.eqiad.wmnet with reason: host reimage
10:17 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1034.eqiad.wmnet with reason: host reimage
10:17 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1033.eqiad.wmnet with reason: host reimage
10:09 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes1038.eqiad.wmnet with OS bullseye
10:09 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1038.eqiad.wmnet with OS bullseye
10:08 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes1036.eqiad.wmnet with OS bullseye
10:08 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1036.eqiad.wmnet with OS bullseye
10:05 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1042.eqiad.wmnet with OS bullseye
10:05 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1041.eqiad.wmnet with OS bullseye
10:04 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes1038.eqiad.wmnet with OS bullseye
10:04 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1040.eqiad.wmnet with OS bullseye
10:04 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1039.eqiad.wmnet with OS bullseye
10:04 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes1036.eqiad.wmnet with OS bullseye
10:04 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1038.eqiad.wmnet with OS bullseye
10:04 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1037.eqiad.wmnet with OS bullseye
10:04 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1036.eqiad.wmnet with OS bullseye
10:03 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1035.eqiad.wmnet with OS bullseye
10:03 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1034.eqiad.wmnet with OS bullseye
10:03 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1033.eqiad.wmnet with OS bullseye
09:59 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1031.eqiad.wmnet with OS bullseye
09:56 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1032.eqiad.wmnet with OS bullseye
09:53 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1030.eqiad.wmnet with OS bullseye
09:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling es1022 (T344589)', diff saved to https://phabricator.wikimedia.org/P52596 and previous config saved to /var/cache/conftool/dbconfig/20230925-095235-ladsgroup.json
09:52 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1022.eqiad.wmnet with reason: Maintenance
09:52 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1022.eqiad.wmnet with reason: Maintenance
09:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on es[1021-1022].eqiad.wmnet with reason: Maintenance
09:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on es[1021-1022].eqiad.wmnet with reason: Maintenance
09:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1020.eqiad.wmnet with reason: Maintenance
09:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1020.eqiad.wmnet with reason: Maintenance
09:43 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on puppetdb2002.codfw.wmnet with reason: Disable puppetdb/postgres/nginx on old nodes to ensure nothing hits them anyway
09:43 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 5 days, 0:00:00 on puppetdb2002.codfw.wmnet with reason: Disable puppetdb/postgres/nginx on old nodes to ensure nothing hits them anyway
09:43 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on puppetdb1002.eqiad.wmnet with reason: Disable puppetdb/postgres/nginx on old nodes to ensure nothing hits them anyway
09:43 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 5 days, 0:00:00 on puppetdb1002.eqiad.wmnet with reason: Disable puppetdb/postgres/nginx on old nodes to ensure nothing hits them anyway
09:41 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1031.eqiad.wmnet with reason: host reimage
09:38 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1032.eqiad.wmnet with reason: host reimage
09:38 jelto: switch people.wikimedia.org to codfw - T345618
09:36 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1030.eqiad.wmnet with reason: host reimage
09:34 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1031.eqiad.wmnet with reason: host reimage
09:34 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1032.eqiad.wmnet with reason: host reimage
09:33 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1030.eqiad.wmnet with reason: host reimage
09:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db[1137,1216,1220,1225].eqiad.wmnet,dbstore1005.eqiad.wmnet with reason: Maintenance
09:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db[1137,1216,1220,1225].eqiad.wmnet,dbstore1005.eqiad.wmnet with reason: Maintenance
09:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1179.eqiad.wmnet with reason: Maintenance
09:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1179.eqiad.wmnet with reason: Maintenance
09:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on 17 hosts with reason: Maintenance
09:24 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on 17 hosts with reason: Maintenance
09:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1126.eqiad.wmnet with reason: Maintenance
09:24 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1126.eqiad.wmnet with reason: Maintenance
09:20 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1032.eqiad.wmnet with OS bullseye
09:19 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1031.eqiad.wmnet with OS bullseye
09:19 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1030.eqiad.wmnet with OS bullseye
09:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on 14 hosts with reason: Maintenance
09:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on 14 hosts with reason: Maintenance
09:18 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1181.eqiad.wmnet with reason: Maintenance
09:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on 14 hosts with reason: Maintenance
09:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on 14 hosts with reason: Maintenance
09:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1183.eqiad.wmnet with reason: Maintenance
09:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1183.eqiad.wmnet with reason: Maintenance
09:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on 13 hosts with reason: Maintenance
09:06 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on 13 hosts with reason: Maintenance
09:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1157.eqiad.wmnet with reason: Maintenance
09:06 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1157.eqiad.wmnet with reason: Maintenance
08:59 Amir1: by the power vested in me by Chris Albon and the ML team, I now pronounce ORES dead.
08:58 elukey: migrate ores.wikimedia.org's ATS backend to ores-legacy.discovery.wmnet (k8s app) - This will drain traffic to ORES bare metal nodes - T341696
08:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on 15 hosts with reason: Maintenance
08:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on 15 hosts with reason: Maintenance
08:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1162.eqiad.wmnet with reason: Maintenance
08:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1162.eqiad.wmnet with reason: Maintenance
08:56 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on 16 hosts with reason: Schema change
08:56 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on 16 hosts with reason: Schema change
08:43 jayme: jayme@cumin1001 conftool action : set/pooled=no; selector: name=kubernetes2010.* - T347267
08:43 jayme@cumin1001: conftool action : set/pooled=no; selector: name=kubernetes2010.*
08:39 jayme@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on kubernetes2010.codfw.wmnet with reason: host is down
08:39 jayme@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on kubernetes2010.codfw.wmnet with reason: host is down
08:27 jayme: draining kubernetes2010.codfw.wmnet - T347267
08:01 jayme: cordoning kubernetes2010
07:49 taavi: drop cloudmetrics exceptions from cr firewall ACLs https://gerrit.wikimedia.org/r/c/operations/homer/public/+/960027 T326266
07:47 taavi@deploy2002: Finished scap: Backport for Make sure different key values are handled while submitting (T345496) (duration: 30m 55s)
07:38 taavi@deploy2002: taavi and soda: Continuing with sync
07:37 XioNoX: update eqsin-ulsfo transport link OSPF metrics to match the new latency of 175ms
07:29 taavi@deploy2002: taavi and soda: Backport for Make sure different key values are handled while submitting (T345496) synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
07:22 kevinbazira@deploy2002: helmfile [ml-serve-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
07:20 kevinbazira@deploy2002: helmfile [ml-serve-eqiad] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
07:16 taavi@deploy2002: Started scap: Backport for Make sure different key values are handled while submitting (T345496)
07:06 XioNoX: roll out "Block inbound RAs on the routers" - T334916
06:14 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 35008
06:14 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 35008
05:27 kart_: Updated cxserver to 2023-09-13-074325-production (T346045)
05:22 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
05:22 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/cxserver: apply
05:13 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
05:12 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
05:08 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
05:08 kartik@deploy2002: helmfile [staging] START helmfile.d/services/cxserver: apply

2023-09-24

23:05 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2122 (T343198)', diff saved to https://phabricator.wikimedia.org/P52595 and previous config saved to /var/cache/conftool/dbconfig/20230924-230515-arnaudb.json
23:05 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2122.codfw.wmnet with reason: Maintenance
23:04 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2122.codfw.wmnet with reason: Maintenance
23:04 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2121 (T343198)', diff saved to https://phabricator.wikimedia.org/P52594 and previous config saved to /var/cache/conftool/dbconfig/20230924-230443-arnaudb.json
22:49 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2121', diff saved to https://phabricator.wikimedia.org/P52593 and previous config saved to /var/cache/conftool/dbconfig/20230924-224936-arnaudb.json
22:34 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2121', diff saved to https://phabricator.wikimedia.org/P52592 and previous config saved to /var/cache/conftool/dbconfig/20230924-223430-arnaudb.json
22:19 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2121 (T343198)', diff saved to https://phabricator.wikimedia.org/P52591 and previous config saved to /var/cache/conftool/dbconfig/20230924-221923-arnaudb.json
10:28 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2121 (T343198)', diff saved to https://phabricator.wikimedia.org/P52590 and previous config saved to /var/cache/conftool/dbconfig/20230924-102809-arnaudb.json
10:28 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2121.codfw.wmnet with reason: Maintenance
10:27 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2121.codfw.wmnet with reason: Maintenance
10:27 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120 (T343198)', diff saved to https://phabricator.wikimedia.org/P52589 and previous config saved to /var/cache/conftool/dbconfig/20230924-102747-arnaudb.json
10:12 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120', diff saved to https://phabricator.wikimedia.org/P52588 and previous config saved to /var/cache/conftool/dbconfig/20230924-101241-arnaudb.json
09:57 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120', diff saved to https://phabricator.wikimedia.org/P52587 and previous config saved to /var/cache/conftool/dbconfig/20230924-095734-arnaudb.json
09:42 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120 (T343198)', diff saved to https://phabricator.wikimedia.org/P52586 and previous config saved to /var/cache/conftool/dbconfig/20230924-094227-arnaudb.json

2023-09-23

22:27 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2120 (T343198)', diff saved to https://phabricator.wikimedia.org/P52585 and previous config saved to /var/cache/conftool/dbconfig/20230923-222721-arnaudb.json
22:27 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2120.codfw.wmnet with reason: Maintenance
22:27 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2120.codfw.wmnet with reason: Maintenance
22:27 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108 (T343198)', diff saved to https://phabricator.wikimedia.org/P52584 and previous config saved to /var/cache/conftool/dbconfig/20230923-222659-arnaudb.json
22:11 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108', diff saved to https://phabricator.wikimedia.org/P52583 and previous config saved to /var/cache/conftool/dbconfig/20230923-221152-arnaudb.json
21:56 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108', diff saved to https://phabricator.wikimedia.org/P52582 and previous config saved to /var/cache/conftool/dbconfig/20230923-215646-arnaudb.json
21:41 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108 (T343198)', diff saved to https://phabricator.wikimedia.org/P52581 and previous config saved to /var/cache/conftool/dbconfig/20230923-214139-arnaudb.json
10:14 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2108 (T343198)', diff saved to https://phabricator.wikimedia.org/P52580 and previous config saved to /var/cache/conftool/dbconfig/20230923-101423-arnaudb.json
10:14 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2108.codfw.wmnet with reason: Maintenance
10:14 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2108.codfw.wmnet with reason: Maintenance

2023-09-22

22:56 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2100.codfw.wmnet with reason: Maintenance
22:56 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2100.codfw.wmnet with reason: Maintenance
17:32 bearloga@deploy2002: Finished deploy [airflow-dags/analytics_product@a30e944]: (no justification provided) (duration: 00m 09s)
17:32 bearloga@deploy2002: Started deploy [airflow-dags/analytics_product@a30e944]: (no justification provided)
15:46 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ganeti-test2004.mgmt.codfw.wmnet with reboot policy FORCED
15:45 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
15:45 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
15:44 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:44 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Adding ganeti-test server to codfw - jhancock@cumin2002"
15:43 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Adding ganeti-test server to codfw - jhancock@cumin2002"
15:41 jhancock@cumin2002: START - Cookbook sre.dns.netbox
15:31 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
15:30 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
15:24 denisse: upgrading LibreNMS in eqiad
15:21 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db1247']
15:19 denisse: upgrading LibreNMS to 23.9.1
15:13 denisse@deploy2002: Finished deploy [librenms/librenms@f049593]: Upgrade LibreNMS to 23.9.1 - T346737 (duration: 00m 09s)
15:13 denisse@deploy2002: Started deploy [librenms/librenms@f049593]: Upgrade LibreNMS to 23.9.1 - T346737
15:12 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1247']
15:11 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1247.mgmt.eqiad.wmnet with reboot policy FORCED
14:58 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['pc1015']
14:58 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
14:58 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
14:57 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
14:57 jclark@cumin1001: START - Cookbook sre.hosts.provision for host db1247.mgmt.eqiad.wmnet with reboot policy FORCED
14:57 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
14:56 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
14:56 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
14:54 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1247.mgmt.eqiad.wmnet with reboot policy FORCED
14:49 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['pc1015']
14:49 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pc1015.mgmt.eqiad.wmnet with reboot policy FORCED
13:45 jclark@cumin1001: START - Cookbook sre.hosts.provision for host db1247.mgmt.eqiad.wmnet with reboot policy FORCED
13:39 jclark@cumin1001: START - Cookbook sre.hosts.provision for host pc1015.mgmt.eqiad.wmnet with reboot policy FORCED
12:23 brouberol@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-test-eqiad
12:17 fnegri@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcumin2001.codfw.wmnet
12:13 fnegri@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudcumin2001.codfw.wmnet
11:58 brouberol@cumin1001: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-test-eqiad
11:42 eoghan@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host apt-staging2001.codfw.wmnet with OS bookworm
11:41 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2098.codfw.wmnet with reason: Maintenance
11:41 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2098.codfw.wmnet with reason: Maintenance
11:28 eoghan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on apt-staging2001.codfw.wmnet with reason: host reimage
11:25 eoghan@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on apt-staging2001.codfw.wmnet with reason: host reimage
11:09 eoghan@cumin1001: START - Cookbook sre.hosts.reimage for host apt-staging2001.codfw.wmnet with OS bookworm
10:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host testreduce1001.eqiad.wmnet
10:22 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host testreduce1001.eqiad.wmnet
10:00 fabfur: repool cp1090 (T346874)
09:53 fnegri@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcumin1001.eqiad.wmnet
09:50 fnegri@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudcumin1001.eqiad.wmnet
09:45 fnegri@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host cloudcumin1001.eqiad.wmnet
09:45 fnegri@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudcumin1001.eqiad.wmnet
09:43 fnegri@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host cloudcumin1001.eqiad.wmnet
09:43 fnegri@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudcumin1001.eqiad.wmnet
09:23 Amir1: dbmaint on s2@eqiad (T343198)
09:22 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on 16 hosts with reason: Schema change
09:21 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on 16 hosts with reason: Schema change
09:13 moritzm: installing perf updates on bookworm hosts
09:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on 15 hosts with reason: Schema change
09:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on 15 hosts with reason: Schema change
09:06 moritzm: installing perf updates on buster hosts
08:51 Amir1: dbmaint on s4@eqiad (T343198)
08:51 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on 20 hosts with reason: Schema change
08:51 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on 20 hosts with reason: Schema change
07:45 hashar: Upgrading CI Jenkins from 2.401.3 to 2.414.2
07:36 hashar: Restarting Gerrit to apply https://gerrit.wikimedia.org/r/c/operations/puppet/+/953967 "Link account creation to IDM" # T345226
07:06 moritzm: installing mutt security updates
06:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1132', diff saved to https://phabricator.wikimedia.org/P52577 and previous config saved to /var/cache/conftool/dbconfig/20230922-063617-root.json
06:32 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1132', diff saved to https://phabricator.wikimedia.org/P52576 and previous config saved to /var/cache/conftool/dbconfig/20230922-063212-root.json
05:13 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
00:43 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
00:43 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
00:43 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1202 (T343198)', diff saved to https://phabricator.wikimedia.org/P52575 and previous config saved to /var/cache/conftool/dbconfig/20230922-004330-arnaudb.json
00:28 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1202', diff saved to https://phabricator.wikimedia.org/P52574 and previous config saved to /var/cache/conftool/dbconfig/20230922-002823-arnaudb.json
00:13 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1202', diff saved to https://phabricator.wikimedia.org/P52573 and previous config saved to /var/cache/conftool/dbconfig/20230922-001316-arnaudb.json

2023-09-21

23:58 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1202 (T343198)', diff saved to https://phabricator.wikimedia.org/P52572 and previous config saved to /var/cache/conftool/dbconfig/20230921-235810-arnaudb.json
22:02 ejegg: Standalone (listener) SmashPig upgraded from ca5b6218 to 2412df22
20:28 brennen: end of UTC late backport & config window
20:27 brennen@deploy2002: Finished scap: Backport for Update Reader Demographics 2 pilot survey (T345951) (duration: 21m 36s)
20:18 brennen@deploy2002: dani and brennen: Continuing with sync
20:17 brennen@deploy2002: dani and brennen: Backport for Update Reader Demographics 2 pilot survey (T345951) synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
20:06 brennen@deploy2002: Started scap: Backport for Update Reader Demographics 2 pilot survey (T345951)
20:04 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1202 (T343198)', diff saved to https://phabricator.wikimedia.org/P52570 and previous config saved to /var/cache/conftool/dbconfig/20230921-200439-arnaudb.json
20:04 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1202.eqiad.wmnet with reason: Maintenance
20:04 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1202.eqiad.wmnet with reason: Maintenance
20:04 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1194 (T343198)', diff saved to https://phabricator.wikimedia.org/P52569 and previous config saved to /var/cache/conftool/dbconfig/20230921-200417-arnaudb.json
20:01 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
20:00 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add records for codfw test servers - cmooney@cumin1001"
19:59 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add records for codfw test servers - cmooney@cumin1001"
19:49 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1194', diff saved to https://phabricator.wikimedia.org/P52568 and previous config saved to /var/cache/conftool/dbconfig/20230921-194911-arnaudb.json
19:47 cmooney@cumin1001: START - Cookbook sre.dns.netbox
19:34 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1194', diff saved to https://phabricator.wikimedia.org/P52567 and previous config saved to /var/cache/conftool/dbconfig/20230921-193404-arnaudb.json
19:19 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1194 (T343198)', diff saved to https://phabricator.wikimedia.org/P52566 and previous config saved to /var/cache/conftool/dbconfig/20230921-191858-arnaudb.json
19:17 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "add codfw new switches - cmooney@cumin1001"
19:13 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "add codfw new switches - cmooney@cumin1001"
18:54 ladsgroup@deploy2002: Finished scap: Backport for Enable Url shortener in sidebar in all wikis (T267921) (duration: 20m 47s)
18:47 ejegg: payments-wiki upgraded from 9cd3e4cd to 5596c7fd
18:45 ladsgroup@deploy2002: ladsgroup: Continuing with sync
18:45 ladsgroup@deploy2002: ladsgroup: Backport for Enable Url shortener in sidebar in all wikis (T267921) synced to the testservers mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
18:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2149 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P52565 and previous config saved to /var/cache/conftool/dbconfig/20230921-184000-ladsgroup.json
18:34 ladsgroup@deploy2002: Started scap: Backport for Enable Url shortener in sidebar in all wikis (T267921)
18:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2149 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P52564 and previous config saved to /var/cache/conftool/dbconfig/20230921-182455-ladsgroup.json
18:15 brennen@deploy2002: rebuilt and synchronized wikiversions files: group2 wikis to 1.41.0-wmf.27 refs T345888
18:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2149 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P52562 and previous config saved to /var/cache/conftool/dbconfig/20230921-180949-ladsgroup.json
18:05 brennen: train 1.41.0-wmf.27 (T345888): no current blockers, logs clean, rolling to group2 shortly.
18:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repool db1166 (T346365)', diff saved to https://phabricator.wikimedia.org/P52561 and previous config saved to /var/cache/conftool/dbconfig/20230921-180003-ladsgroup.json
17:59 xcollazo@deploy2002: Finished deploy [airflow-dags/analytics@ddcc518]: Deploy latest DAGs to analytics Airflow instance (duration: 00m 40s)
17:58 xcollazo@deploy2002: Started deploy [airflow-dags/analytics@ddcc518]: Deploy latest DAGs to analytics Airflow instance
17:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db1166 (T346365)', diff saved to https://phabricator.wikimedia.org/P52560 and previous config saved to /var/cache/conftool/dbconfig/20230921-175634-ladsgroup.json
17:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2149 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P52559 and previous config saved to /var/cache/conftool/dbconfig/20230921-175444-ladsgroup.json
17:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db2149 (T346365)', diff saved to https://phabricator.wikimedia.org/P52558 and previous config saved to /var/cache/conftool/dbconfig/20230921-174934-ladsgroup.json
17:41 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for restbase2014.codfw.wmnet
17:41 eevans@cumin1001: START - Cookbook sre.hosts.remove-downtime for restbase2014.codfw.wmnet
17:35 ejegg: re-enabled contribution tracking queue consumer
17:30 ejegg: civicrm upgraded from f0e9d3f6 to 9efea665
17:29 ejegg: disabled contribution_tracking queue consumer for Civi update
17:27 eoghan@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host apt-staging2001.codfw.wmnet
17:27 eoghan@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host apt-staging2001.codfw.wmnet with OS bookworm
17:21 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase2014.codfw.wmnet with OS bullseye
16:45 vriley@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1015.mgmt.eqiad.wmnet with reboot policy FORCED
16:45 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on restbase2014.codfw.wmnet with reason: host reimage
16:42 eevans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on restbase2014.codfw.wmnet with reason: host reimage
16:26 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2014.codfw.wmnet with OS bullseye
16:11 eoghan@cumin1001: START - Cookbook sre.hosts.reimage for host apt-staging2001.codfw.wmnet with OS bookworm
16:10 eoghan@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM apt-staging2001.codfw.wmnet - eoghan@cumin1001"
16:10 eoghan@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM apt-staging2001.codfw.wmnet - eoghan@cumin1001"
16:10 eoghan@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) apt-staging2001.codfw.wmnet on all recursors
16:09 eoghan@cumin1001: START - Cookbook sre.dns.wipe-cache apt-staging2001.codfw.wmnet on all recursors
16:09 eoghan@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:09 eoghan@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM apt-staging2001.codfw.wmnet - eoghan@cumin1001"
16:08 eoghan@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM apt-staging2001.codfw.wmnet - eoghan@cumin1001"
16:02 eoghan@cumin1001: START - Cookbook sre.dns.netbox
16:02 eoghan@cumin1001: START - Cookbook sre.ganeti.makevm for new host apt-staging2001.codfw.wmnet
15:34 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1194 (T343198)', diff saved to https://phabricator.wikimedia.org/P52557 and previous config saved to /var/cache/conftool/dbconfig/20230921-153428-arnaudb.json
15:34 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1194.eqiad.wmnet with reason: Maintenance
15:34 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1194.eqiad.wmnet with reason: Maintenance
15:34 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1191 (T343198)', diff saved to https://phabricator.wikimedia.org/P52556 and previous config saved to /var/cache/conftool/dbconfig/20230921-153406-arnaudb.json
15:33 elukey@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
15:25 jayme@deploy2002: Finished scap: (no justification provided) (duration: 02m 29s)
15:22 jayme@deploy2002: Started scap: (no justification provided)
15:20 moritzm: installing php7.3 security updates (as packaged in Debian Buster)
15:19 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1191', diff saved to https://phabricator.wikimedia.org/P52555 and previous config saved to /var/cache/conftool/dbconfig/20230921-151900-arnaudb.json
15:14 lucaswerkmeister-wmde@deploy2002: Finished scap: Backport for SpecialUndelete: Do not clone RequestContext (T346995) (duration: 34m 13s)
15:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on 22 hosts with reason: Schema change
15:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on 22 hosts with reason: Schema change
15:12 Amir1: dbmaint on s8@eqiad (T343198)
15:10 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on 18 hosts with reason: Schema change
15:10 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on 18 hosts with reason: Schema change
15:06 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for restbase2013.codfw.wmnet
15:06 eevans@cumin1001: START - Cookbook sre.hosts.remove-downtime for restbase2013.codfw.wmnet
15:05 kevinbazira@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
15:03 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1191', diff saved to https://phabricator.wikimedia.org/P52554 and previous config saved to /var/cache/conftool/dbconfig/20230921-150353-arnaudb.json
15:01 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Continuing with sync
15:00 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Backport for SpecialUndelete: Do not clone RequestContext (T346995) synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
14:48 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1191 (T343198)', diff saved to https://phabricator.wikimedia.org/P52553 and previous config saved to /var/cache/conftool/dbconfig/20230921-144847-arnaudb.json
14:41 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase2013.codfw.wmnet with OS bullseye
14:40 lucaswerkmeister-wmde@deploy2002: Started scap: Backport for SpecialUndelete: Do not clone RequestContext (T346995)
14:31 moritzm: imported cas 6.6.12+wmf11u1 to apt.wikimedia.org
14:31 kevinbazira@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
14:19 tchanders@deploy2002: Finished scap: Backport for Enable partial action blocks on mediawikiwiki (T332733) (duration: 34m 01s)
14:16 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on restbase2013.codfw.wmnet with reason: host reimage
14:14 eevans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on restbase2013.codfw.wmnet with reason: host reimage
14:07 kevinbazira@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
14:04 tchanders@deploy2002: tchanders: Continuing with sync
14:03 tchanders@deploy2002: tchanders: Backport for Enable partial action blocks on mediawikiwiki (T332733) synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
13:59 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host restbase2013.codfw.wmnet with OS bullseye
13:53 vriley@cumin1001: START - Cookbook sre.hosts.provision for host pc1015.mgmt.eqiad.wmnet with reboot policy FORCED
13:43 tchanders@deploy2002: Started scap: Backport for Enable partial action blocks on mediawikiwiki (T332733)
13:39 tchanders@deploy2002: Finished scap: Backport for Enable partial action blocks on commonswiki (T339878) (duration: 35m 04s)
13:37 vriley@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1015.mgmt.eqiad.wmnet with reboot policy FORCED
13:36 vriley@cumin1001: START - Cookbook sre.hosts.provision for host pc1015.mgmt.eqiad.wmnet with reboot policy FORCED
13:34 vriley@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host pc1015
13:34 vriley@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host pc1015
13:30 vriley@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1015.mgmt.eqiad.wmnet with reboot policy FORCED
13:28 vriley@cumin1001: START - Cookbook sre.hosts.provision for host pc1015.mgmt.eqiad.wmnet with reboot policy FORCED
13:27 vriley@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1015.mgmt.eqiad.wmnet with reboot policy FORCED
13:26 tchanders@deploy2002: tchanders: Continuing with sync
13:25 tchanders@deploy2002: tchanders: Backport for Enable partial action blocks on commonswiki (T339878) synced to the testservers mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
13:25 urbanecm: mwmaint2002: `mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki 'Private Incident Reporting System/Updates' 'Incident Reporting System/Updates' 'Martin Urbanec' --reason 'per request'` (T347019)
13:08 fabfur: disabled puppet on cp1090 for T346874
13:05 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes2028.codfw.wmnet with OS bullseye
13:04 tchanders@deploy2002: Started scap: Backport for Enable partial action blocks on commonswiki (T339878)
12:45 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2028.codfw.wmnet with reason: host reimage
12:42 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2028.codfw.wmnet with reason: host reimage
12:33 jmm@cumin2002: END (PASS) - Cookbook sre.aqs.roll-restart-reboot (exit_code=0) rolling restart_daemons on A:aqs-eqiad
12:31 milimetric@deploy2002: Finished deploy [analytics/aqs/deploy@041016f] (aqs): Enable etags on all AQS 1.0 endpoints (duration: 10m 23s)
12:25 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes2028.codfw.wmnet with OS bullseye
12:22 jmm@cumin2002: START - Cookbook sre.aqs.roll-restart-reboot rolling restart_daemons on A:aqs-eqiad
12:21 milimetric@deploy2002: Started deploy [analytics/aqs/deploy@041016f] (aqs): Enable etags on all AQS 1.0 endpoints
12:20 fabfur: depooled cp1090.eqiad.wmnet to test new purged package version (T346874)
12:10 jmm@cumin2002: END (PASS) - Cookbook sre.aqs.roll-restart-reboot (exit_code=0) rolling restart_daemons on A:aqs-codfw
12:03 effie: cordon kubernetes2028 to reimage
11:59 jmm@cumin2002: START - Cookbook sre.aqs.roll-restart-reboot rolling restart_daemons on A:aqs-codfw
11:57 ladsgroup@deploy2002: Finished scap: Backport for Turn on write both for pagelinks in largest s3 wikis (T345732) (duration: 36m 44s)
11:45 jmm@cumin2002: END (PASS) - Cookbook sre.maps.roll-restart-reboot (exit_code=0) rolling restart_daemons on A:maps-replica-eqiad
11:43 ladsgroup@deploy2002: ladsgroup: Continuing with sync
11:43 ladsgroup@deploy2002: ladsgroup: Backport for Turn on write both for pagelinks in largest s3 wikis (T345732) synced to the testservers mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
11:39 jmm@cumin2002: START - Cookbook sre.maps.roll-restart-reboot rolling restart_daemons on A:maps-replica-eqiad
11:33 jmm@cumin2002: END (PASS) - Cookbook sre.maps.roll-restart-reboot (exit_code=0) rolling restart_daemons on A:maps-replica-codfw
11:28 jmm@cumin2002: START - Cookbook sre.maps.roll-restart-reboot rolling restart_daemons on A:maps-replica-codfw
11:21 ladsgroup@deploy2002: Started scap: Backport for Turn on write both for pagelinks in largest s3 wikis (T345732)
11:20 jnuche@deploy2002: Finished deploy [releng/jenkins-deploy@c579111] (releasing): (no justification provided) (duration: 01m 05s)
11:19 jnuche@deploy2002: Started deploy [releng/jenkins-deploy@c579111] (releasing): (no justification provided)
11:08 arturo: merging homer CR firewall patch https://gerrit.wikimedia.org/r/c/operations/homer/public/+/959706 for T346948
10:57 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1191 (T343198)', diff saved to https://phabricator.wikimedia.org/P52550 and previous config saved to /var/cache/conftool/dbconfig/20230921-105723-arnaudb.json
10:57 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1191.eqiad.wmnet with reason: Maintenance
10:57 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1191.eqiad.wmnet with reason: Maintenance
10:54 moritzm: installing c-ares security updates
10:49 ladsgroup@deploy2002: Finished scap: Backport for Enable pagelinks write both in testwiki (T345732) (duration: 36m 27s)
10:48 moritzm: installing flac security updates
10:42 elukey@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
10:36 ladsgroup@deploy2002: ladsgroup: Continuing with sync
10:34 ladsgroup@deploy2002: ladsgroup: Backport for Enable pagelinks write both in testwiki (T345732) synced to the testservers mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
10:27 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti-test2003.codfw.wmnet
10:27 XioNoX: set max repeaters = 20 on asw2-a-eqiad - T346759
10:26 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti-test2003.codfw.wmnet
10:25 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti-test2002.codfw.wmnet
10:25 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti-test2002.codfw.wmnet
10:19 isaranto@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
10:18 cmooney@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox
10:17 cmooney@cumin1001: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
10:17 cmooney@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary
10:17 cmooney@cumin1001: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary
10:17 cmooney@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary
10:17 cmooney@cumin1001: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary
10:12 ladsgroup@deploy2002: Started scap: Backport for Enable pagelinks write both in testwiki (T345732)
10:10 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
10:10 ayounsi@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove eqsin-eqdfw tunnel - ayounsi@cumin1001"
10:09 ayounsi@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove eqsin-eqdfw tunnel - ayounsi@cumin1001"
10:07 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
09:55 kevinbazira@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
09:51 effie: disable puppet on kubernetes[2025-2053].codfw.wmnet
09:42 kevinbazira@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
09:41 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
09:40 cmooney@cumin1001: START - Cookbook sre.dns.netbox
09:40 cmooney@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
09:38 brouberol@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: apply
09:38 brouberol@deploy2002: helmfile [codfw] START helmfile.d/services/eventgate-analytics-external: apply
09:36 brouberol@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: apply
09:36 brouberol@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: apply
09:35 brouberol@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics: apply
09:34 brouberol@deploy2002: helmfile [codfw] START helmfile.d/services/eventgate-analytics: apply
09:33 brouberol@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics: apply
09:32 brouberol@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics: apply
09:32 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.resource-report (exit_code=0)
09:32 jmm@cumin2002: START - Cookbook sre.ganeti.resource-report
09:30 brouberol@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventstreams-internal: apply
09:30 brouberol@deploy2002: helmfile [codfw] START helmfile.d/services/eventstreams-internal: apply
09:28 brouberol@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventstreams-internal: apply
09:28 XioNoX: remove GRE tunnel between eqsin and eqdfw - T344888
09:27 brouberol@deploy2002: helmfile [eqiad] START helmfile.d/services/eventstreams-internal: apply
09:08 kevinbazira@deploy1002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
09:03 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ganeti2030.codfw.wmnet with reason: Fixup DRBD
09:03 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on ganeti2030.codfw.wmnet with reason: Fixup DRBD
09:00 taavi@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudcontrol1007.wikimedia.org
09:00 taavi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
09:00 taavi@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudcontrol1007.wikimedia.org decommissioned, removing all IPs except the asset tag one - taavi@cumin1001"
08:59 taavi@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudcontrol1007.wikimedia.org decommissioned, removing all IPs except the asset tag one - taavi@cumin1001"
08:57 taavi@cumin1001: START - Cookbook sre.dns.netbox
08:51 taavi@cumin1001: START - Cookbook sre.hosts.decommission for hosts cloudcontrol1007.wikimedia.org
08:14 brouberol@deploy2002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
08:14 brouberol@deploy2002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
08:14 brouberol: redeploying mw-page-content-change-enrich in staging T336041
08:13 brouberol@deploy2002: helmfile [staging] DONE helmfile.d/services/eventstreams-internal: apply
08:13 brouberol@deploy2002: helmfile [staging] START helmfile.d/services/eventstreams-internal: apply
08:13 brouberol: redeploying eventstreams-internal in staging T336041
08:12 brouberol@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics-external: apply
08:12 brouberol@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-analytics-external: apply
08:12 brouberol: redeploying eventgate-analytics-external in staging T336041
08:10 brouberol@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics: apply
08:10 brouberol@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-analytics: apply
07:52 kartik@deploy2002: Finished scap: Backport for Enable MinT translation service on Meta-Wiki - rollout #5 (T341445) (duration: 42m 01s)
07:38 kartik@deploy2002: kartik and abi: Continuing with sync
07:32 kartik@deploy2002: kartik and abi: Backport for Enable MinT translation service on Meta-Wiki - rollout #5 (T341445) synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
07:10 kartik@deploy2002: Started scap: Backport for Enable MinT translation service on Meta-Wiki - rollout #5 (T341445)
06:52 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 2915
06:51 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 2915
06:31 taavi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
06:31 taavi@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: fix cloudsw cloud-private records - taavi@cumin1001"
06:30 taavi@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: fix cloudsw cloud-private records - taavi@cumin1001"
06:28 taavi@cumin1001: START - Cookbook sre.dns.netbox
05:52 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
05:49 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
05:47 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
05:47 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
05:44 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
05:44 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
05:40 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
05:38 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
05:24 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
05:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
02:18 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db1248']
02:17 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db1249']
02:17 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db1246']
02:09 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db1245']
02:09 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db1242']
02:08 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1249']
02:08 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1248']
02:08 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1246']
02:08 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db1244']
02:07 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db1243']
02:07 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db1241']
02:07 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db1240']
02:00 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1245']
01:59 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db1239']
01:59 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1244']
01:59 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db1237']
01:59 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1243']
01:59 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1242']
01:59 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db1238']
01:59 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db1236']
01:58 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1241']
01:58 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db1235']
01:58 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1240']
01:58 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db1234']
01:51 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1239']
01:50 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1245.mgmt.eqiad.wmnet with reboot policy FORCED
01:50 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1235']
01:49 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1238']
01:49 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1237']
01:49 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1236']
01:49 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1234']
01:48 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1248.mgmt.eqiad.wmnet with reboot policy FORCED
01:48 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1246.mgmt.eqiad.wmnet with reboot policy FORCED
01:47 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1247.mgmt.eqiad.wmnet with reboot policy FORCED
01:35 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1249.mgmt.eqiad.wmnet with reboot policy FORCED
01:31 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1244.mgmt.eqiad.wmnet with reboot policy FORCED
01:30 jclark@cumin1001: START - Cookbook sre.hosts.provision for host db1246.mgmt.eqiad.wmnet with reboot policy FORCED
01:30 jclark@cumin1001: START - Cookbook sre.hosts.provision for host db1245.mgmt.eqiad.wmnet with reboot policy FORCED
01:29 jclark@cumin1001: START - Cookbook sre.hosts.provision for host db1248.mgmt.eqiad.wmnet with reboot policy FORCED
01:29 jclark@cumin1001: START - Cookbook sre.hosts.provision for host db1247.mgmt.eqiad.wmnet with reboot policy FORCED
01:29 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1243.mgmt.eqiad.wmnet with reboot policy FORCED
01:29 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1241.mgmt.eqiad.wmnet with reboot policy FORCED
01:28 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1242.mgmt.eqiad.wmnet with reboot policy FORCED
01:28 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1240.mgmt.eqiad.wmnet with reboot policy FORCED
01:17 jclark@cumin1001: START - Cookbook sre.hosts.provision for host db1249.mgmt.eqiad.wmnet with reboot policy FORCED
01:17 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1235.mgmt.eqiad.wmnet with reboot policy FORCED
01:11 jclark@cumin1001: START - Cookbook sre.hosts.provision for host db1244.mgmt.eqiad.wmnet with reboot policy FORCED
01:11 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1234.mgmt.eqiad.wmnet with reboot policy FORCED
01:10 jclark@cumin1001: START - Cookbook sre.hosts.provision for host db1243.mgmt.eqiad.wmnet with reboot policy FORCED
01:10 jclark@cumin1001: START - Cookbook sre.hosts.provision for host db1242.mgmt.eqiad.wmnet with reboot policy FORCED
01:10 jclark@cumin1001: START - Cookbook sre.hosts.provision for host db1241.mgmt.eqiad.wmnet with reboot policy FORCED
01:10 jclark@cumin1001: START - Cookbook sre.hosts.provision for host db1240.mgmt.eqiad.wmnet with reboot policy FORCED
01:09 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1237.mgmt.eqiad.wmnet with reboot policy FORCED
01:09 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1239.mgmt.eqiad.wmnet with reboot policy FORCED
01:09 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1238.mgmt.eqiad.wmnet with reboot policy FORCED
01:09 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1236.mgmt.eqiad.wmnet with reboot policy FORCED
00:50 jclark@cumin1001: START - Cookbook sre.hosts.provision for host db1239.mgmt.eqiad.wmnet with reboot policy FORCED
00:50 jclark@cumin1001: START - Cookbook sre.hosts.provision for host db1238.mgmt.eqiad.wmnet with reboot policy FORCED
00:50 jclark@cumin1001: START - Cookbook sre.hosts.provision for host db1237.mgmt.eqiad.wmnet with reboot policy FORCED
00:50 jclark@cumin1001: START - Cookbook sre.hosts.provision for host db1235.mgmt.eqiad.wmnet with reboot policy FORCED
00:50 jclark@cumin1001: START - Cookbook sre.hosts.provision for host db1234.mgmt.eqiad.wmnet with reboot policy FORCED
00:50 jclark@cumin1001: START - Cookbook sre.hosts.provision for host db1236.mgmt.eqiad.wmnet with reboot policy FORCED
00:49 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1249
00:48 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1248
00:47 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host db1249
00:47 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1246
00:47 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1247
00:46 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host db1248
00:46 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1245
00:46 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host db1247
00:46 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1244
00:46 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host db1246
00:46 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1243
00:46 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host db1245
00:46 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host db1244
00:46 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1242
00:46 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1240
00:45 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host db1243
00:45 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1241
00:44 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host db1242
00:44 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host db1241
00:44 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host db1240
00:44 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1238
00:44 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1236
00:44 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1239
00:43 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host db1239
00:43 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host db1238
00:43 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host db1236
00:42 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1234
00:42 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1235
00:41 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host db1235
00:41 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host db1234
00:39 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
00:39 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt db12[34-49] - jclark@cumin1001"
00:38 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt db12[34-49] - jclark@cumin1001"
00:36 jclark@cumin1001: START - Cookbook sre.dns.netbox
00:15 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['pc1016']
00:09 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1015.mgmt.eqiad.wmnet with reboot policy FORCED
00:08 jclark@cumin1001: START - Cookbook sre.hosts.provision for host pc1015.mgmt.eqiad.wmnet with reboot policy FORCED
00:08 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['pc1015']
00:07 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['pc1015']
00:07 jclark@cumin1001: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts ['pc1016']
00:07 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['pc1016']
00:06 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['pc1016']
00:05 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pc1016.mgmt.eqiad.wmnet with reboot policy FORCED

2023-09-20

23:55 jclark@cumin1001: START - Cookbook sre.hosts.provision for host pc1016.mgmt.eqiad.wmnet with reboot policy FORCED
23:54 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1016.mgmt.eqiad.wmnet with reboot policy FORCED
23:51 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1015.mgmt.eqiad.wmnet with reboot policy FORCED
23:51 jclark@cumin1001: START - Cookbook sre.hosts.provision for host pc1015.mgmt.eqiad.wmnet with reboot policy FORCED
23:50 jclark@cumin1001: START - Cookbook sre.hosts.provision for host pc1016.mgmt.eqiad.wmnet with reboot policy FORCED
23:50 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1015.mgmt.eqiad.wmnet with reboot policy FORCED
23:50 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host pc1016
23:50 jclark@cumin1001: START - Cookbook sre.hosts.provision for host pc1015.mgmt.eqiad.wmnet with reboot policy FORCED
23:49 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host pc1015
23:49 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host pc1015
23:49 jclark@cumin1001: END (ERROR) - Cookbook sre.network.configure-switch-interfaces (exit_code=97) for host pc1016
23:49 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host pc1016
23:48 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host pc1016
23:48 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
23:48 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt pc1016 - jclark@cumin1001"
23:47 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt pc1016 - jclark@cumin1001"
23:44 jclark@cumin1001: START - Cookbook sre.dns.netbox
19:26 bearloga@deploy2002: Finished deploy [airflow-dags/analytics_product@80496b8]: (no justification provided) (duration: 00m 05s)
19:26 bearloga@deploy2002: Started deploy [airflow-dags/analytics_product@80496b8]: (no justification provided)
19:25 bearloga@deploy2002: Finished deploy [airflow-dags/analytics_product@80496b8]: (no justification provided) (duration: 00m 09s)
19:24 bearloga@deploy2002: Started deploy [airflow-dags/analytics_product@80496b8]: (no justification provided)
18:21 brennen@deploy2002: Synchronized php: group1 wikis to 1.41.0-wmf.27 refs T345888 (duration: 07m 17s)
18:14 brennen@deploy2002: rebuilt and synchronized wikiversions files: group1 wikis to 1.41.0-wmf.27 refs T345888
18:02 brennen: train 1.41.0-wmf.27 (T345888): no current blockers, logs clean, rolling to group1
16:29 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
16:28 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
16:26 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
16:26 klausman: pushing revert of ORES TTL change
16:24 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
15:30 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
15:30 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/mw-web: apply
15:29 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
15:29 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
15:29 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
15:29 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
15:09 moritzm: added Taavi and Effie (new key) to pwstore
15:08 jgiannelos@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifeeds: apply
15:08 jgiannelos@deploy2002: helmfile [codfw] START helmfile.d/services/wikifeeds: apply
15:06 brouberol@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0) for Kafka A:kafka-jumbo-eqiad cluster: Roll restart of jvm daemons.
15:05 jgiannelos@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifeeds: apply
15:05 jgiannelos@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifeeds: apply
15:04 cmooney@cumin1001: START - Cookbook sre.dns.netbox
15:03 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Unbanning all hosts in search_codfw
15:03 bking@cumin1001: START - Cookbook sre.elasticsearch.ban Unbanning all hosts in search_codfw
15:02 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifeeds: apply
15:02 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/wikifeeds: apply
14:59 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifeeds: apply
14:58 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/wikifeeds: apply
14:45 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:45 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add new cloud-private records - cmooney@cumin1001"
14:44 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add new cloud-private records - cmooney@cumin1001"
14:41 cmooney@cumin1001: START - Cookbook sre.dns.netbox
14:35 kamila_: update maintenance.eqiad.wmnet to point to mwmaint2002
14:26 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Banning hosts: elastic2044.codfw.wmnet for high load - bking@cumin1001
14:26 bking@cumin1001: START - Cookbook sre.elasticsearch.ban Banning hosts: elastic2044.codfw.wmnet for high load - bking@cumin1001
14:25 bking@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.ban (exit_code=99) Banning hosts: elastic2044 for high load - bking@cumin1001
14:25 bking@cumin1001: START - Cookbook sre.elasticsearch.ban Banning hosts: elastic2044 for high load - bking@cumin1001
14:16 kamila@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.09-run-puppet-on-db-masters (exit_code=0)
14:10 kamila@cumin1001: START - Cookbook sre.switchdc.mediawiki.09-run-puppet-on-db-masters
14:09 kamila@deploy2002: Unlocked for deployment [ALL REPOSITORIES]: Datacenter Switchover: MediaWiki - T346474 (duration: 12m 54s)
14:07 kamila_: Phase 9.5 Update DNS records for new database masters - T346474
14:06 kamila@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.09-restore-ttl (exit_code=0)
14:06 kamila@cumin1001: START - Cookbook sre.switchdc.mediawiki.09-restore-ttl
14:06 kamila@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-start-maintenance (exit_code=0)
14:04 marostegui: Testing
14:03 kamila@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-start-maintenance
14:03 kamila@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-restart-envoy-on-jobrunners (exit_code=0)
14:03 kamila@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-restart-envoy-on-jobrunners
14:03 kamila@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.07-set-readwrite (exit_code=0)
14:03 kamila@cumin1001: MediaWiki read-only period ends at: 2023-09-20 14:02:59.798838
14:03 kamila@cumin1001: START - Cookbook sre.switchdc.mediawiki.07-set-readwrite
14:02 kamila@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.07-set-readwrite (exit_code=0)
14:02 kamila@cumin1001: MediaWiki read-only period ends at: 2023-09-20 14:02:53.790615
14:00 kamila@cumin1001: MediaWiki read-only period starts at: 2023-09-20 14:00:32.114116
14:00 kamila@cumin1001: START - Cookbook sre.switchdc.mediawiki.02-set-readonly
13:57 kamila@cumin1001: END (FAIL) - Cookbook sre.switchdc.mediawiki.01-stop-maintenance (exit_code=99)
13:57 kamila@cumin1001: START - Cookbook sre.switchdc.mediawiki.01-stop-maintenance
13:57 kamila@cumin1001: END (FAIL) - Cookbook sre.switchdc.mediawiki.01-stop-maintenance (exit_code=99)
13:57 kamila@cumin1001: START - Cookbook sre.switchdc.mediawiki.01-stop-maintenance
13:56 kamila@deploy2002: Locking from deployment [ALL REPOSITORIES]: Datacenter Switchover: MediaWiki - T346474
13:56 urbanecm@deploy2002: Finished scap: Backport for build: Update eslint-config-wikimedia to 0.25.1 (T346629), Change CSS selector for Minerva mobile menu icon (T346459) (duration: 34m 21s)
13:56 kamila@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-reduce-ttl (exit_code=0)
13:50 kamila@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-reduce-ttl
13:50 kamila@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-downtime-db-readonly-checks (exit_code=0)
13:50 kamila@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-downtime-db-readonly-checks
13:49 kamila@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-disable-puppet (exit_code=0)
13:49 kamila@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-disable-puppet
13:49 slyngshede@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host idm-test1001.wikimedia.org with OS bookworm
13:43 urbanecm@deploy2002: urbanecm and jdlrobson: Continuing with sync
13:42 urbanecm@deploy2002: urbanecm and jdlrobson: Backport for build: Update eslint-config-wikimedia to 0.25.1 (T346629), Change CSS selector for Minerva mobile menu icon (T346459) synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
13:31 slyngshede@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on idm-test1001.wikimedia.org with reason: host reimage
13:26 slyngshede@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on idm-test1001.wikimedia.org with reason: host reimage
13:21 urbanecm@deploy2002: Started scap: Backport for build: Update eslint-config-wikimedia to 0.25.1 (T346629), Change CSS selector for Minerva mobile menu icon (T346459)
13:12 slyngshede@cumin1001: START - Cookbook sre.hosts.reimage for host idm-test1001.wikimedia.org with OS bookworm
13:02 akosiaris@deploy2002: Finished deploy [restbase/deploy@e8a6ae4]: (no justification provided) (duration: 00m 27s)
13:02 akosiaris@deploy2002: Started deploy [restbase/deploy@e8a6ae4]: (no justification provided)
12:54 akosiaris@deploy2002: Finished deploy [restbase/deploy@e8a6ae4]: (no justification provided) (duration: 02m 10s)
12:52 akosiaris@deploy2002: Started deploy [restbase/deploy@e8a6ae4]: (no justification provided)
12:52 akosiaris@deploy2002: Finished deploy [restbase/deploy@e8a6ae4]: (no justification provided) (duration: 04m 43s)
12:47 akosiaris@deploy2002: Started deploy [restbase/deploy@e8a6ae4]: (no justification provided)
12:45 akosiaris@deploy2002: Finished deploy [restbase/deploy@e8a6ae4]: (no justification provided) (duration: 04m 34s)
12:41 akosiaris: T346354 deploy RESTBase after bug is fixed
12:40 akosiaris@deploy2002: Started deploy [restbase/deploy@e8a6ae4]: (no justification provided)
11:56 gmodena@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
11:56 gmodena@deploy2002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
11:49 gmodena@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
11:49 gmodena@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
11:40 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudweb1004.wikimedia.org
11:33 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host cloudweb1004.wikimedia.org
11:20 gmodena@deploy2002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
11:20 gmodena@deploy2002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
11:17 aborrero@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) openstack.eqiad1.wikimediacloud.org on all recursors
11:17 aborrero@cumin1001: START - Cookbook sre.dns.wipe-cache openstack.eqiad1.wikimediacloud.org on all recursors
11:14 aborrero@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
11:14 aborrero@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: openstack.eqiad1 - aborrero@cumin1001"
11:13 aborrero@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: openstack.eqiad1 - aborrero@cumin1001"
11:11 aborrero@cumin1001: START - Cookbook sre.dns.netbox
10:55 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudweb1003.wikimedia.org
10:48 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host cloudweb1003.wikimedia.org
10:30 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudweb2002-dev.wikimedia.org
10:23 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host cloudweb2002-dev.wikimedia.org
10:18 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host gitlab-runner2004.codfw.wmnet with OS bullseye
10:04 brouberol@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers for Kafka A:kafka-jumbo-eqiad cluster: Roll restart of jvm daemons.
10:03 klausman: Running authdns-update to activate change 957689 (T341696)
10:02 klausman: Merging change 957689 (T341696) to lower DNS TTL to 5m for ORES name.
10:01 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on gitlab-runner2004.codfw.wmnet with reason: host reimage
10:00 Emperor: ms-be10[61-75] swift package updates T346730
09:56 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on gitlab-runner2004.codfw.wmnet with reason: host reimage
09:55 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudservices1005.eqiad.wmnet with OS bullseye
09:55 aborrero@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - aborrero@cumin1001"
09:54 aborrero@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - aborrero@cumin1001"
09:48 brouberol@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on kafka-jumbo1003.eqiad.wmnet with reason: investigation by brouberol and elukey about kafka ACL issues that might be fixed by a broker restart
09:48 brouberol@cumin1001: START - Cookbook sre.hosts.downtime for 0:10:00 on kafka-jumbo1003.eqiad.wmnet with reason: investigation by brouberol and elukey about kafka ACL issues that might be fixed by a broker restart
09:41 jelto@cumin1001: START - Cookbook sre.hosts.reimage for host gitlab-runner2004.codfw.wmnet with OS bullseye
09:39 aborrero@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "cloudservices1005 - aborrero@cumin1001 - T346042"
09:38 aborrero@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "cloudservices1005 - aborrero@cumin1001 - T346042"
09:35 aborrero@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "cloudservices1005 - aborrero@cumin1001 - T346042"
09:34 aborrero@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "cloudservices1005 - aborrero@cumin1001 - T346042"
09:34 gmodena@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
09:34 gmodena@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
09:33 aborrero@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
09:32 aborrero@cumin1001: START - Cookbook sre.dns.netbox
09:29 klausman: Draining ml-serve1008 for kubelet partition increase (T339231)
09:24 klausman: Draining ml-serve1007 for kubelet partition increase (T339231)
09:15 klausman: Draining ml-serve1006 for kubelet partition increase (T339231)
09:13 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudservices1005.eqiad.wmnet with reason: host reimage
09:09 aborrero@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudservices1005.eqiad.wmnet with reason: host reimage
09:08 fabfur: applied patch https://gerrit.wikimedia.org/r/c/operations/puppet/+/957292 (T344175) to add new mobile redirect domains to Varnish. Changes will be applied automatically by puppet on all cp hosts
09:06 klausman: Draining ml-serve1005 for kubelet partition increase (T339231)
09:00 godog: restore benthos@webrequest_live running on both centrallog hosts - T346871
08:57 klausman: Draining ml-serve1004 for kubelet partition increase (T339231)
08:47 klausman: Draining ml-serve1003 for kubelet partition increase (T339231)
08:47 godog: temp bump threads to 15 for benthos@webrequest_live on centrallog2002 - T346871
08:40 aborrero@cumin1001: START - Cookbook sre.hosts.reimage for host cloudservices1005.eqiad.wmnet with OS bullseye
08:40 aborrero@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudservices1005.eqiad.wmnet with OS bullseye
08:40 klausman: Draining ml-serve1002 for kubelet partition increase (T339231)
08:36 godog: stop benthos@webrequest_live.service on centrallog1002 to test redundancy/capacity - T346871
08:33 aborrero@cumin1001: START - Cookbook sre.hosts.reimage for host cloudservices1005.eqiad.wmnet with OS bullseye
08:32 aborrero@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
08:31 aborrero@cumin1001: START - Cookbook sre.dns.netbox
08:31 aborrero@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudservices1005
08:31 aborrero@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cloudservices1005
08:30 aborrero@cumin1001: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host cloudservices1005
08:30 aborrero@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cloudservices1005
08:22 jmm@cumin2002: END (PASS) - Cookbook sre.misc-clusters.roll-restart-reboot-docker-registry (exit_code=0) rolling restart_daemons on A:docker-registry
08:20 jmm@cumin2002: START - Cookbook sre.misc-clusters.roll-restart-reboot-docker-registry rolling restart_daemons on A:docker-registry
08:17 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on puppetdb1002.eqiad.wmnet with reason: Disable puppetdb/postgres/nginx on old nodes to ensure nothing hits them anyway
08:16 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 5 days, 0:00:00 on puppetdb1002.eqiad.wmnet with reason: Disable puppetdb/postgres/nginx on old nodes to ensure nothing hits them anyway
08:16 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on puppetdb2002.codfw.wmnet with reason: Disable puppetdb/postgres/nginx on old nodes to ensure nothing hits them anyway
08:15 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 5 days, 0:00:00 on puppetdb2002.codfw.wmnet with reason: Disable puppetdb/postgres/nginx on old nodes to ensure nothing hits them anyway
08:10 moritzm: restarting FPM on mw* to pick up libwebp security updates
08:02 moritzm: installing libwebp security updates on buster
07:42 slyngshede@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host idm1001.wikimedia.org with OS bookworm
07:41 taavi@deploy2002: Finished scap: Backport for Set READ_NEW for Wikitech on OATHAuth multiple devices migration (T242031), Set WRITE_NEW for OATHAuth multiple devices on fishbowls/privates (T242031) (duration: 36m 09s)
07:39 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2014.codfw.wmnet
07:34 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2014.codfw.wmnet
07:33 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-test-ui1001.eqiad.wmnet
07:29 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host an-test-ui1001.eqiad.wmnet
07:28 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-web1001.eqiad.wmnet
07:28 taavi@deploy2002: taavi: Continuing with sync
07:26 taavi@deploy2002: taavi: Backport for Set READ_NEW for Wikitech on OATHAuth multiple devices migration (T242031), Set WRITE_NEW for OATHAuth multiple devices on fishbowls/privates (T242031) synced to the testservers mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental X
07:24 slyngshede@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on idm1001.wikimedia.org with reason: host reimage
07:22 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host an-web1001.eqiad.wmnet
07:22 slyngshede@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on idm1001.wikimedia.org with reason: host reimage
07:15 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-tool1005.eqiad.wmnet
07:15 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host an-tool1005.eqiad.wmnet
07:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-tool1007.eqiad.wmnet
07:09 slyngshede@cumin1001: START - Cookbook sre.hosts.reimage for host idm1001.wikimedia.org with OS bookworm
07:06 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host an-tool1007.eqiad.wmnet
07:05 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-tool1008.eqiad.wmnet
07:05 taavi@deploy2002: Started scap: Backport for Set READ_NEW for Wikitech on OATHAuth multiple devices migration (T242031), Set WRITE_NEW for OATHAuth multiple devices on fishbowls/privates (T242031)
07:00 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host an-tool1008.eqiad.wmnet
06:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-tool1009.eqiad.wmnet
06:52 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host an-tool1009.eqiad.wmnet
06:51 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-tool1010.eqiad.wmnet
06:45 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host an-tool1010.eqiad.wmnet
06:42 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-tool1011.eqiad.wmnet
06:40 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host an-tool1011.eqiad.wmnet
01:12 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pki1002.eqiad.wmnet with OS bullseye
01:12 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
01:08 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
00:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pki1002.eqiad.wmnet with reason: host reimage
00:51 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on pki1002.eqiad.wmnet with reason: host reimage
00:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host pki1002.eqiad.wmnet with OS bullseye
00:10 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db1232']
00:10 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db1233']
00:09 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db1231']
00:02 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db1229']
00:01 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1233']
00:01 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db1230']
00:01 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db1228']
00:01 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1232']
00:00 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1231']
00:00 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db1227']
00:00 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db1226']

2023-09-19

23:52 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1230']
23:52 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1229']
23:52 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1228']
23:51 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1227']
23:51 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db1226']
23:51 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1229.mgmt.eqiad.wmnet with reboot policy FORCED
23:31 jclark@cumin1001: START - Cookbook sre.hosts.provision for host db1229.mgmt.eqiad.wmnet with reboot policy FORCED
23:30 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
23:29 jclark@cumin1001: START - Cookbook sre.dns.netbox
23:26 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1232.mgmt.eqiad.wmnet with reboot policy FORCED
23:18 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1233.mgmt.eqiad.wmnet with reboot policy FORCED
23:18 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1231.mgmt.eqiad.wmnet with reboot policy FORCED
22:58 jclark@cumin1001: START - Cookbook sre.hosts.provision for host db1233.mgmt.eqiad.wmnet with reboot policy FORCED
22:58 jclark@cumin1001: START - Cookbook sre.hosts.provision for host db1232.mgmt.eqiad.wmnet with reboot policy FORCED
22:58 jclark@cumin1001: START - Cookbook sre.hosts.provision for host db1231.mgmt.eqiad.wmnet with reboot policy FORCED
22:57 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1228.mgmt.eqiad.wmnet with reboot policy FORCED
22:57 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1230.mgmt.eqiad.wmnet with reboot policy FORCED
22:56 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1227.mgmt.eqiad.wmnet with reboot policy FORCED
22:56 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1226.mgmt.eqiad.wmnet with reboot policy FORCED
22:54 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1229.mgmt.eqiad.wmnet with reboot policy FORCED
22:51 bearloga@deploy2002: Finished deploy [airflow-dags/analytics_product@b603e64]: (no justification provided) (duration: 00m 05s)
22:51 bearloga@deploy2002: Started deploy [airflow-dags/analytics_product@b603e64]: (no justification provided)
22:46 jclark@cumin1001: START - Cookbook sre.hosts.provision for host db1229.mgmt.eqiad.wmnet with reboot policy FORCED
22:45 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1229.mgmt.eqiad.wmnet with reboot policy FORCED
21:50 brennen@deploy2002: rebuilt and synchronized wikiversions files: group0 wikis to 1.41.0-wmf.27 refs T345888
21:49 jclark@cumin1001: START - Cookbook sre.hosts.provision for host db1227.mgmt.eqiad.wmnet with reboot policy FORCED
21:49 jclark@cumin1001: START - Cookbook sre.hosts.provision for host db1228.mgmt.eqiad.wmnet with reboot policy FORCED
21:49 jclark@cumin1001: START - Cookbook sre.hosts.provision for host db1230.mgmt.eqiad.wmnet with reboot policy FORCED
21:49 jclark@cumin1001: START - Cookbook sre.hosts.provision for host db1229.mgmt.eqiad.wmnet with reboot policy FORCED
21:49 jclark@cumin1001: START - Cookbook sre.hosts.provision for host db1226.mgmt.eqiad.wmnet with reboot policy FORCED
21:48 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1232
21:47 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host db1232
21:46 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
21:45 jclark@cumin1001: START - Cookbook sre.dns.netbox
21:41 brennen: train 1.41.0-wmf.27 (T345888): blockers resolved; rolling to group0
21:37 brennen@deploy2002: Finished scap: Backport for Disable client preferences by default (T345363) (duration: 40m 45s)
21:37 jclark@cumin1001: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host db1232
21:37 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host db1232
21:36 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1233
21:36 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1231
21:35 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host db1233
21:34 jclark@cumin1001: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host db1232
21:34 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host db1232
21:34 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host db1231
21:33 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
21:32 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1230
21:32 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1229
21:32 cmooney@cumin1001: START - Cookbook sre.dns.netbox
21:32 cmooney@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
21:31 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host db1230
21:31 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1226
21:31 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host db1229
21:31 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db1227
21:30 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host db1227
21:30 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host db1226
21:29 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
21:29 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt db12[26-33] - jclark@cumin1001"
21:29 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt db12[26-33] - jclark@cumin1001"
21:26 jclark@cumin1001: START - Cookbook sre.dns.netbox
21:25 brennen@deploy2002: jdlrobson and brennen: Continuing with sync
21:20 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudelastic1010']
21:17 brennen@deploy2002: jdlrobson and brennen: Backport for Disable client preferences by default (T345363) synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
21:16 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudelastic1008']
21:11 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudelastic1010']
21:11 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudelastic1009']
21:07 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudelastic1008']
21:07 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudelastic1007']
21:01 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudelastic1009']
21:01 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudelastic1010.mgmt.eqiad.wmnet with reboot policy FORCED
20:59 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudelastic1007']
20:57 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['pki1002']
20:57 brennen@deploy2002: Started scap: Backport for Disable client preferences by default (T345363)
20:55 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['pki1002']
20:55 brennen@deploy2002: Finished scap: Backport for Fixes cannot read properties of undefined (T342277) (duration: 37m 39s)
20:51 bearloga@deploy2002: Finished deploy [airflow-dags/analytics_product@b603e64]: (no justification provided) (duration: 00m 05s)
20:51 bearloga@deploy2002: Started deploy [airflow-dags/analytics_product@b603e64]: (no justification provided)
20:50 bearloga@deploy2002: Finished deploy [airflow-dags/analytics_product@b603e64]: (no justification provided) (duration: 00m 09s)
20:50 bearloga@deploy2002: Started deploy [airflow-dags/analytics_product@b603e64]: (no justification provided)
20:42 brennen@deploy2002: jdlrobson and brennen: Continuing with sync
20:38 brennen@deploy2002: jdlrobson and brennen: Backport for Fixes cannot read properties of undefined (T342277) synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
20:37 jclark@cumin1001: START - Cookbook sre.hosts.provision for host cloudelastic1010.mgmt.eqiad.wmnet with reboot policy FORCED
20:36 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudelastic1010
20:36 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cloudelastic1010
20:35 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
20:35 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt cloudelastic1007-10 - jclark@cumin1001"
20:34 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt cloudelastic1007-10 - jclark@cumin1001"
20:32 jclark@cumin1001: START - Cookbook sre.dns.netbox
20:18 brennen@deploy2002: Started scap: Backport for Fixes cannot read properties of undefined (T342277)
19:48 brennen@deploy2002: Finished scap: Backport for Revert "ResourceLoader: Set 'virtualFilePath' for startup.js" (T346800) (duration: 40m 46s)
19:31 brennen@deploy2002: jforrester and brennen: Continuing with sync
19:29 brennen@deploy2002: jforrester and brennen: Backport for Revert "ResourceLoader: Set 'virtualFilePath' for startup.js" (T346800) synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
19:24 vriley@cumin1001: START - Cookbook sre.hosts.provision for host pc1015.mgmt.eqiad.wmnet with reboot policy FORCED
19:21 vriley@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host pc1015
19:20 vriley@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host pc1015
19:07 brennen@deploy2002: Started scap: Backport for Revert "ResourceLoader: Set 'virtualFilePath' for startup.js" (T346800)
16:28 claime: Deployed https://gerrit.wikimedia.org/r/953344 - T345204
16:04 kamila_: DC Switchover: traffic - T346330
15:59 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifeeds: apply
15:59 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/wikifeeds: apply
15:58 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifeeds: apply
15:58 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifeeds: apply
15:57 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mathoid: apply
15:57 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/mathoid: apply
15:57 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
15:57 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
15:57 cgoubert@deploy2002: Finished scap: (no justification provided) (duration: 03m 12s)
15:56 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/similar-users: apply
15:56 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/similar-users: apply
15:56 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/similar-users: apply
15:55 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/similar-users: apply
15:55 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/mathoid: apply
15:55 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/mathoid: apply
15:54 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
15:54 akosiaris: scaling down mobileapps, wikifeeds, mathoid, similar-users
15:54 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
15:53 cgoubert@deploy2002: Started scap: (no justification provided)
15:53 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-misc: apply
15:52 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-misc: apply
15:52 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-misc: apply
15:52 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/mw-misc: apply
15:52 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
15:52 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
15:52 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
15:52 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
15:52 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
15:52 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
15:52 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
15:52 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
15:52 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
15:52 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
15:52 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
15:51 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/mw-web: apply
15:51 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
15:51 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
15:46 cgoubert@deploy2002: Finished scap: (no justification provided) (duration: 40m 44s)
15:45 cmooney@cumin1001: START - Cookbook sre.dns.netbox
15:45 cmooney@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
15:28 vriley@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudelastic1007.mgmt.eqiad.wmnet with reboot policy FORCED
15:26 claime: running puppet on 'A:cp-text and P{P:trafficserver::backend}' - T346330
15:25 claime: reduce mw-on-k8s traffic to 3% waiting on new nodes - T346330
15:18 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudelastic1008.mgmt.eqiad.wmnet with reboot policy FORCED
15:18 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudelastic1009.mgmt.eqiad.wmnet with reboot policy FORCED
15:06 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudelastic1010.mgmt.eqiad.wmnet with reboot policy FORCED
15:06 cgoubert@deploy2002: Started scap: (no justification provided)
15:05 kamila@deploy1002: Unlocked for deployment [ALL REPOSITORIES]: Datacenter Switchover: Services & Traffic - T346330 (duration: 34m 46s)
15:02 akosiaris: increase thumbor's pods in codfw to 48 to harmonize with eqiad
15:02 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
15:02 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/thumbor: apply
14:56 vriley@cumin1001: START - Cookbook sre.hosts.provision for host cloudelastic1007.mgmt.eqiad.wmnet with reboot policy FORCED
14:56 jclark@cumin1001: START - Cookbook sre.hosts.provision for host cloudelastic1010.mgmt.eqiad.wmnet with reboot policy FORCED
14:56 jclark@cumin1001: START - Cookbook sre.hosts.provision for host cloudelastic1009.mgmt.eqiad.wmnet with reboot policy FORCED
14:56 jclark@cumin1001: START - Cookbook sre.hosts.provision for host cloudelastic1008.mgmt.eqiad.wmnet with reboot policy FORCED
14:52 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudelastic1010
14:51 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cloudelastic1010
14:51 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudelastic1009
14:51 vriley@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudelastic1008
14:50 moritzm: installing python-werkzeug security updates
14:50 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cloudelastic1009
14:49 vriley@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cloudelastic1008
14:49 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudelastic1007
14:48 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cloudelastic1007
14:46 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:46 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt dbproxy1026-56} - jclark@cumin1001"
14:45 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt dbproxy1026-56} - jclark@cumin1001"
14:42 jclark@cumin1001: START - Cookbook sre.dns.netbox
14:36 oblivian@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=swift-rw,name=codfw
14:36 oblivian@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=swift-rw,name=eqiad
14:33 cgoubert@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=swift-ro
14:33 cgoubert@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=swift
14:32 kamila_: Switch deployment server - T346330
14:30 kamila@deploy1002: Locking from deployment [ALL REPOSITORIES]: Datacenter Switchover: Services & Traffic - T346330
14:28 kamila@cumin1001: END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0) depool all services in eqiad: Datacenter Switchover: Services - T346330
14:28 cgoubert@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=thumbor
14:25 oblivian@deploy1002: Finished scap: (no justification provided) (duration: 05m 44s)
14:20 oblivian@deploy1002: Started scap: (no justification provided)
14:20 kamila@deploy1002: Unlocked for deployment [ALL REPOSITORIES]: Datacenter Switchover: Services & Traffic - T346330 (duration: 19m 27s)
14:01 kamila@cumin1001: START - Cookbook sre.discovery.datacenter depool all services in eqiad: Datacenter Switchover: Services - T346330
14:00 kamila@deploy1002: Locking from deployment [ALL REPOSITORIES]: Datacenter Switchover: Services & Traffic - T346330
13:58 TheresNoTime: `[samtar@mwmaint1002 ~]$ mwscript namespaceDupes.php --wiki shwiki --fix` T346588
13:57 samtar@deploy1002: Finished scap: Backport for Add namespace aliases to shwiki (T346588) (duration: 51m 50s)
13:53 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
13:53 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts an-test-client1001.eqiad.wmnet
13:52 elukey: clean old puppet certs kafka_logging-{eqiad,codfw}_broker from the Puppet CA and from Puppet private - T300130
13:52 jhancock@cumin2002: START - Cookbook sre.dns.netbox
13:52 jhancock@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
13:52 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Updating DNS record of kubernetes2026 - jhancock@cumin2002"
13:51 stevemunene@cumin1001: START - Cookbook sre.dns.netbox
13:50 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Updating DNS record of kubernetes2026 - jhancock@cumin2002"
13:47 jhancock@cumin2002: START - Cookbook sre.dns.netbox
13:47 jebe@deploy1002: Finished deploy [airflow-dags/analytics@6b9855a]: (no justification provided) (duration: 00m 43s)
13:46 stevemunene@cumin1001: START - Cookbook sre.hosts.decommission for hosts an-test-client1001.eqiad.wmnet
13:46 jebe@deploy1002: Started deploy [airflow-dags/analytics@6b9855a]: (no justification provided)
13:34 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host flerovium.eqiad.wmnet
13:33 samtar@deploy1002: samtar and aleksandar: Continuing with sync
13:28 samtar@deploy1002: samtar and aleksandar: Backport for Add namespace aliases to shwiki (T346588) synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
13:27 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host flerovium.eqiad.wmnet
13:17 jebe@deploy1002: Finished deploy [analytics/refinery@2d9d6d0] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@2d9d6d0] (duration: 02m 06s)
13:15 Emperor: ms-be10[44-60] swift package updates T346730
13:15 jebe@deploy1002: Started deploy [analytics/refinery@2d9d6d0] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@2d9d6d0]
13:14 jebe@deploy1002: Finished deploy [analytics/refinery@2d9d6d0] (thin): Regular analytics weekly train THIN [analytics/refinery@2d9d6d0] (duration: 00m 04s)
13:14 jebe@deploy1002: Started deploy [analytics/refinery@2d9d6d0] (thin): Regular analytics weekly train THIN [analytics/refinery@2d9d6d0]
13:14 jebe@deploy1002: Finished deploy [analytics/refinery@2d9d6d0]: Regular analytics weekly train [analytics/refinery@2d9d6d0] (duration: 05m 52s)
13:08 jebe@deploy1002: Started deploy [analytics/refinery@2d9d6d0]: Regular analytics weekly train [analytics/refinery@2d9d6d0]
13:05 samtar@deploy1002: Started scap: Backport for Add namespace aliases to shwiki (T346588)
13:05 btullis@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0) for Kafka A:kafka-jumbo-eqiad cluster: Roll restart of jvm daemons.
12:44 Emperor: ms-be20[60-73] swift package updates T346730
12:22 Emperor: ms-be20[49-59] swift package updates T346730
12:19 jebe@deploy1002: Finished deploy [analytics/refinery@91bb4a0] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@91bb4a0] (duration: 02m 03s)
12:18 Emperor: ms-be2048 swift package updates T346730
12:17 jebe@deploy1002: Started deploy [analytics/refinery@91bb4a0] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@91bb4a0]
12:17 jebe@deploy1002: Finished deploy [analytics/refinery@91bb4a0] (thin): Regular analytics weekly train THIN [analytics/refinery@91bb4a0] (duration: 00m 05s)
12:17 jebe@deploy1002: Started deploy [analytics/refinery@91bb4a0] (thin): Regular analytics weekly train THIN [analytics/refinery@91bb4a0]
12:14 Emperor: ms-be2047 swift package updates T346730
12:12 Emperor: ms-be204{5,6} swift package updates T346730
12:10 jebe@deploy1002: Finished deploy [analytics/refinery@91bb4a0]: Regular analytics weekly train [analytics/refinery@91bb4a0] (duration: 06m 53s)
12:08 cmooney@cumin1001: START - Cookbook sre.dns.netbox
12:03 jebe@deploy1002: Started deploy [analytics/refinery@91bb4a0]: Regular analytics weekly train [analytics/refinery@91bb4a0]
11:51 ayounsi@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox
11:51 ayounsi@cumin1001: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
11:50 ayounsi@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary
11:48 ayounsi@cumin1001: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary
11:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 100%: Repooling after recloning db1128', diff saved to https://phabricator.wikimedia.org/P52530 and previous config saved to /var/cache/conftool/dbconfig/20230919-112156-root.json
11:09 Emperor: eqiad swift front-end swift package updates T346730
11:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 75%: Repooling after recloning db1128', diff saved to https://phabricator.wikimedia.org/P52529 and previous config saved to /var/cache/conftool/dbconfig/20230919-110651-root.json
10:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 50%: Repooling after recloning db1128', diff saved to https://phabricator.wikimedia.org/P52528 and previous config saved to /var/cache/conftool/dbconfig/20230919-105147-root.json
10:38 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1148.eqiad.wmnet with OS bullseye
10:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 25%: Repooling after recloning db1128', diff saved to https://phabricator.wikimedia.org/P52527 and previous config saved to /var/cache/conftool/dbconfig/20230919-103642-root.json
10:34 Emperor: codfw swift front-end swift package updates T346730
10:25 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1147.eqiad.wmnet with OS bullseye
10:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 10%: Repooling after recloning db1128', diff saved to https://phabricator.wikimedia.org/P52526 and previous config saved to /var/cache/conftool/dbconfig/20230919-102137-root.json
10:15 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1148.eqiad.wmnet with reason: host reimage
10:11 stevemunene@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1148.eqiad.wmnet with reason: host reimage
10:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 5%: Repooling after recloning db1128', diff saved to https://phabricator.wikimedia.org/P52525 and previous config saved to /var/cache/conftool/dbconfig/20230919-100632-root.json
10:01 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1147.eqiad.wmnet with reason: host reimage
09:58 stevemunene@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1148.eqiad.wmnet with OS bullseye
09:56 stevemunene@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1147.eqiad.wmnet with reason: host reimage
09:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 3%: Repooling after recloning db1128', diff saved to https://phabricator.wikimedia.org/P52524 and previous config saved to /var/cache/conftool/dbconfig/20230919-095127-root.json
09:48 slyngshede@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host idm2001.wikimedia.org with OS bookworm
09:42 stevemunene@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1147.eqiad.wmnet with OS bullseye
09:40 btullis@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers for Kafka A:kafka-jumbo-eqiad cluster: Roll restart of jvm daemons.
09:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 1%: Repooling after recloning db1128', diff saved to https://phabricator.wikimedia.org/P52523 and previous config saved to /var/cache/conftool/dbconfig/20230919-093622-root.json
09:12 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-staging-etcd2001.codfw.wmnet
09:08 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM ml-staging-etcd2001.codfw.wmnet
09:03 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-staging-etcd2002.codfw.wmnet
08:59 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM ml-staging-etcd2002.codfw.wmnet
08:47 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-staging-etcd2003.codfw.wmnet
08:44 godog: bounce benthos@webrequest_live to clear out old metrics
08:43 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM ml-staging-etcd2003.codfw.wmnet
08:41 godog: remove MediaWiki.*.growthexperiments.taskcount.link_recommendation.* from graphite - T346371
08:39 jmm@cumin2002: END (PASS) - Cookbook sre.maps.roll-restart-reboot (exit_code=0) rolling restart_daemons on A:maps-replica-eqiad
08:36 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1146.eqiad.wmnet with OS bullseye
08:34 jmm@cumin2002: START - Cookbook sre.maps.roll-restart-reboot rolling restart_daemons on A:maps-replica-eqiad
08:30 jmm@cumin2002: END (PASS) - Cookbook sre.maps.roll-restart-reboot (exit_code=0) rolling restart_daemons on A:maps-replica-codfw
08:29 slyngshede@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on idm2001.wikimedia.org with reason: host reimage
08:26 brouberol@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
08:26 brouberol@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
08:26 slyngshede@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on idm2001.wikimedia.org with reason: host reimage
08:26 brouberol: redeploying mw-page-content-change-enrich in codfw T336041
08:26 brouberol@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
08:25 brouberol@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
08:25 jmm@cumin2002: START - Cookbook sre.maps.roll-restart-reboot rolling restart_daemons on A:maps-replica-codfw
08:25 brouberol: redeploying mw-page-content-change-enrich in eqiad T336041
08:24 brouberol@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventstreams-internal: apply
08:24 brouberol@deploy1002: helmfile [eqiad] START helmfile.d/services/eventstreams-internal: apply
08:24 brouberol: redeploying eventstreams-internal in eqiad T336041
08:23 brouberol@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventstreams-internal: apply
08:23 brouberol@deploy1002: helmfile [codfw] START helmfile.d/services/eventstreams-internal: apply
08:23 brouberol: redeploying eventstreams-internal in codfw T336041
08:22 brouberol@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: apply
08:21 brouberol@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-analytics-external: apply
08:21 brouberol: redeploying eventgate-analytics-external in codfw T336041
08:21 brouberol@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: apply
08:20 brouberol@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: apply
08:20 brouberol: redeploying eventgate-analytics-external in eqiad T336041
08:19 brouberol@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics: apply
08:18 brouberol@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-analytics: apply
08:18 brouberol: redeploying eventgate-analytics in codfw T336041
08:18 brouberol@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics: apply
08:17 brouberol@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics: apply
08:13 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1146.eqiad.wmnet with reason: host reimage
08:11 slyngshede@cumin1001: START - Cookbook sre.hosts.reimage for host idm2001.wikimedia.org with OS bookworm
08:10 stevemunene@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1146.eqiad.wmnet with reason: host reimage
08:05 brouberol@deploy1002: helmfile [staging] DONE helmfile.d/services/eventstreams-internal: apply
08:05 brouberol@deploy1002: helmfile [staging] START helmfile.d/services/eventstreams-internal: apply
08:05 moritzm: restarting FPM on mw canaries to pick up libwebp updates
08:04 marostegui@cumin1001: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db1134.eqiad.wmnet onto db1128.eqiad.wmnet
08:02 brouberol@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics-external: apply
08:02 brouberol@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-analytics-external: apply
08:00 brouberol@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics: apply
07:59 brouberol@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-analytics: apply
07:56 stevemunene@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1146.eqiad.wmnet with OS bullseye
07:51 moritzm: installing libwebp security updates on buster
07:43 kartik@deploy1002: Finished scap: Backport for Disable Special:Contribute on bnwiki (T345772) (duration: 38m 49s)
07:27 kartik@deploy1002: kartik: Continuing with sync
07:26 kartik@deploy1002: kartik: Backport for Disable Special:Contribute on bnwiki (T345772) synced to the testservers mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
07:11 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
07:04 kartik@deploy1002: Started scap: Backport for Disable Special:Contribute on bnwiki (T345772)
06:35 denisse: updating PCC facts
06:09 XioNoX: push new pfw policy - T346705
05:48 fnegri@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudservices2004-dev.codfw.wmnet with OS bookworm
05:46 marostegui@cumin1001: START - Cookbook sre.mysql.clone of db1134.eqiad.wmnet onto db1128.eqiad.wmnet
05:45 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1134', diff saved to https://phabricator.wikimedia.org/P52522 and previous config saved to /var/cache/conftool/dbconfig/20230919-054539-root.json
04:35 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
04:06 mwpresync@deploy1002: Pruned MediaWiki: 1.41.0-wmf.25 (duration: 02m 10s)
04:03 mwpresync@deploy1002: Finished scap: testwikis wikis to 1.41.0-wmf.27 refs T345888 (duration: 61m 05s)
03:02 mwpresync@deploy1002: Started scap: testwikis wikis to 1.41.0-wmf.27 refs T345888
00:56 eileen: civicrm upgraded from 0a36997d to f0e9d3f6

2023-09-18

22:09 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts wdqs1004.eqiad.wmnet
22:09 ryankemper@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
22:08 ryankemper@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: wdqs1004.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - ryankemper@cumin1001"
22:07 ryankemper@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: wdqs1004.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - ryankemper@cumin1001"
21:59 ryankemper@cumin1001: START - Cookbook sre.dns.netbox
21:51 ryankemper@cumin1001: START - Cookbook sre.hosts.decommission for hosts wdqs1004.eqiad.wmnet
21:49 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts wdqs1003.eqiad.wmnet
21:49 ryankemper@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
21:49 ryankemper@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: wdqs1003.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - ryankemper@cumin1001"
21:45 ryankemper@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: wdqs1003.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - ryankemper@cumin1001"
21:40 ryankemper@cumin1001: START - Cookbook sre.dns.netbox
21:19 maryum: Deployed patch for T344359
21:13 ryankemper@cumin1001: START - Cookbook sre.hosts.decommission for hosts wdqs1003.eqiad.wmnet
20:49 cjming: end of UTC late backport window
20:36 cjming@deploy1002: Finished scap: Backport for Link recommendations: prevent too large offsets in cirrus queries (T345713) (duration: 11m 40s)
20:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dbstore1008.eqiad.wmnet with OS bullseye
20:30 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
20:29 cjming@deploy1002: urbanecm and cjming: Continuing with sync
20:29 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
20:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dbstore1009.eqiad.wmnet with OS bullseye
20:27 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
20:26 cjming@deploy1002: urbanecm and cjming: Backport for Link recommendations: prevent too large offsets in cirrus queries (T345713) synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
20:24 cjming@deploy1002: Started scap: Backport for Link recommendations: prevent too large offsets in cirrus queries (T345713)
20:24 cjming@deploy1002: Finished scap: Backport for clienthints: Enable purging of data on all wikis (T257893) (duration: 09m 24s)
20:24 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
20:16 cjming@deploy1002: cjming and dreamyjazz: Continuing with sync
20:16 cjming@deploy1002: cjming and dreamyjazz: Backport for clienthints: Enable purging of data on all wikis (T257893) synced to the testservers mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
20:16 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbstore1008.eqiad.wmnet with reason: host reimage
20:15 cjming@deploy1002: Started scap: Backport for clienthints: Enable purging of data on all wikis (T257893)
20:13 cjming@deploy1002: Finished scap: Backport for clienthints: Pin wgCheckUserDisplayClientHints to false (T337942) (duration: 08m 18s)
20:12 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dbstore1008.eqiad.wmnet with reason: host reimage
20:09 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbstore1009.eqiad.wmnet with reason: host reimage
20:06 cjming@deploy1002: cjming and dreamyjazz: Continuing with sync
20:06 cjming@deploy1002: cjming and dreamyjazz: Backport for clienthints: Pin wgCheckUserDisplayClientHints to false (T337942) synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
20:06 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dbstore1009.eqiad.wmnet with reason: host reimage
20:05 cjming@deploy1002: Started scap: Backport for clienthints: Pin wgCheckUserDisplayClientHints to false (T337942)
19:46 isaranto@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
19:43 isaranto@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
19:22 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host dbstore1009.eqiad.wmnet with OS bullseye
19:22 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host dbstore1008.eqiad.wmnet with OS bullseye
18:02 ejegg: re-enabled donor thank you mail send jobs
17:50 ejegg: civicrm upgraded from 0c2853aa to 0a36997d
17:48 ejegg: disabled donor thank you mail send jobs for Civi update
16:41 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1145.eqiad.wmnet with OS bullseye
16:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['dbstore1009']
16:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['dbstore1008']
16:25 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1144.eqiad.wmnet with OS bullseye
16:24 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['dbstore1009']
16:23 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['dbstore1009']
16:23 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['dbstore1009']
16:23 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['dbstore1008']
16:17 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1145.eqiad.wmnet with reason: host reimage
16:15 stevemunene@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1145.eqiad.wmnet with reason: host reimage
16:14 jnuche@deploy1002: Installation of scap version "4.61.1" completed for 601 hosts
16:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1036.eqiad.wmnet with OS bullseye
16:13 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
16:12 jnuche@deploy1002: Installing scap version "4.61.1" for 601 hosts
16:11 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
16:03 stevemunene@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1145.eqiad.wmnet with OS bullseye
16:01 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1144.eqiad.wmnet with reason: host reimage
15:57 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1047.eqiad.wmnet with OS bullseye
15:57 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
15:57 stevemunene@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1144.eqiad.wmnet with reason: host reimage
15:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1036.eqiad.wmnet with reason: host reimage
15:53 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
15:53 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1036.eqiad.wmnet with reason: host reimage
15:53 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1038.eqiad.wmnet with OS bullseye
15:53 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
15:53 jdrewniak@deploy1002: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 08m 31s)
15:51 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes1036.eqiad.wmnet with OS bullseye
15:51 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
15:44 jdrewniak@deploy1002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 08m 45s)
15:43 stevemunene@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1144.eqiad.wmnet with OS bullseye
15:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1047.eqiad.wmnet with reason: host reimage
15:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1038.eqiad.wmnet with reason: host reimage
15:30 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1047.eqiad.wmnet with reason: host reimage
15:30 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1038.eqiad.wmnet with reason: host reimage
15:29 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kubernetes1036
15:29 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes1047.eqiad.wmnet with OS bullseye
15:29 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes1038.eqiad.wmnet with OS bullseye
15:28 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host kubernetes1036
15:27 Emperor: install new swift packages on ms-be2044
15:26 Emperor: repool ms-fe2009 with new swift packages
15:25 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1143.eqiad.wmnet with OS bullseye
15:18 Emperor: depool ms-fe2009 to install new swift packages
15:13 Emperor: upload swift_2.26.0-10+deb11u1+wmf1_amd64.changes to apt1001
15:11 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1142.eqiad.wmnet with OS bullseye
15:04 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1143.eqiad.wmnet with reason: host reimage
15:01 stevemunene@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1143.eqiad.wmnet with reason: host reimage
14:54 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes1038.eqiad.wmnet with OS bullseye
14:54 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1038.eqiad.wmnet with OS bullseye
14:47 stevemunene@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1143.eqiad.wmnet with OS bullseye
14:45 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1142.eqiad.wmnet with reason: host reimage
14:45 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes1036.eqiad.wmnet with OS bullseye
14:45 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1036.eqiad.wmnet with OS bullseye
14:42 stevemunene@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1142.eqiad.wmnet with reason: host reimage
14:41 fnegri@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudservices2004-dev.codfw.wmnet with reason: host reimage
14:38 fnegri@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudservices2004-dev.codfw.wmnet with reason: host reimage
14:32 jelto: use certmanager instead of certgen in miscweb namespace - T300033
14:29 jelto@deploy1002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
14:29 stevemunene@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1142.eqiad.wmnet with OS bullseye
14:26 jelto@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
14:24 jelto@deploy1002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
14:21 jelto@deploy1002: helmfile [codfw] START helmfile.d/services/miscweb: apply
14:20 fnegri@cumin1001: START - Cookbook sre.hosts.reimage for host cloudservices2004-dev.codfw.wmnet with OS bookworm
14:18 fnegri@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudbackup1001-dev.eqiad.wmnet with OS bookworm
14:15 jelto@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
14:05 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-etcd2003.codfw.wmnet
14:04 bblack: lvs1020, lvs1018: restarting pybal to re-enable healthchecks for wikireplicas ( T337446 -> https://gerrit.wikimedia.org/r/924508 )
14:01 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM ml-etcd2003.codfw.wmnet
14:01 fnegri@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudbackup1001-dev.eqiad.wmnet with reason: host reimage
14:01 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-etcd2002.codfw.wmnet
13:57 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM ml-etcd2002.codfw.wmnet
13:56 fnegri@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudbackup1001-dev.eqiad.wmnet with reason: host reimage
13:51 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-etcd2001.codfw.wmnet
13:47 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM ml-etcd2001.codfw.wmnet
13:46 fnegri@cumin1001: START - Cookbook sre.hosts.reimage for host cloudbackup1001-dev.eqiad.wmnet with OS bookworm
13:38 godog: force-set max-repeaters to 20 for cr2-eqsin and cr3-eqsin - T346606
13:26 jmm@cumin2002: END (PASS) - Cookbook sre.wdqs.restart-nginx (exit_code=0) rolling restart_daemons on A:wcqs-public
13:24 jmm@cumin2002: START - Cookbook sre.wdqs.restart-nginx rolling restart_daemons on A:wcqs-public
13:16 taavi@deploy1002: Finished scap: Backport for Disable UploadWizard CTA for MachineVision (T345187) (duration: 11m 16s)
13:11 vgutierrez: depool cp4052 for bookworm testing - T342154
13:09 taavi@deploy1002: taavi and cparle: Continuing with sync
13:06 taavi@deploy1002: taavi and cparle: Backport for Disable UploadWizard CTA for MachineVision (T345187) synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
13:04 taavi@deploy1002: Started scap: Backport for Disable UploadWizard CTA for MachineVision (T345187)
13:04 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
13:04 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
13:04 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
13:03 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
13:02 godog: set max-repeaters to 30 for cr3-eqsin in librenms - T346606
13:01 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
13:01 jayme@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
12:48 jmm@cumin2002: END (PASS) - Cookbook sre.wdqs.restart-nginx (exit_code=0) rolling restart_daemons on A:wdqs-all
12:47 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1141.eqiad.wmnet with OS bullseye
12:37 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest1001.eqiad.wmnet with OS bullseye
12:32 jmm@cumin2002: START - Cookbook sre.wdqs.restart-nginx rolling restart_daemons on A:wdqs-all
12:27 jmm@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: furud.codfw.wmnet
12:27 jmm@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: furud.codfw.wmnet
12:24 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1141.eqiad.wmnet with reason: host reimage
12:24 stevemunene@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1140.eqiad.wmnet with OS bullseye
12:23 moritzm: installing libwebp security updates on bullseye
12:21 stevemunene@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1141.eqiad.wmnet with reason: host reimage
12:21 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1001.eqiad.wmnet with reason: host reimage
12:18 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1001.eqiad.wmnet with reason: host reimage
12:10 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1140.eqiad.wmnet with reason: host reimage
12:08 stevemunene@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1141.eqiad.wmnet with OS bullseye
12:07 stevemunene@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1140.eqiad.wmnet with reason: host reimage
12:03 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1001.eqiad.wmnet with OS bullseye
11:54 jmm@cumin2002: END (PASS) - Cookbook sre.maps.roll-restart-reboot (exit_code=0) rolling reboot on A:maps-replica-eqiad
11:53 stevemunene@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1140.eqiad.wmnet with OS bullseye
11:46 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudservices1005.wikimedia.org
11:46 aborrero@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
11:46 aborrero@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
11:46 aborrero@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudservices1005 - aborrero@cumin1001"
11:45 aborrero@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudservices1005 - aborrero@cumin1001"
11:44 aborrero@cumin1001: START - Cookbook sre.dns.netbox
11:44 jayme: removed cergen certs from the list of trusted service account token signers on all kubernetes clusters - T329826
11:43 aborrero@cumin1001: START - Cookbook sre.dns.netbox
11:37 aborrero@cumin1001: START - Cookbook sre.hosts.decommission for hosts cloudservices1005.wikimedia.org
11:14 jmm@cumin2002: START - Cookbook sre.maps.roll-restart-reboot rolling reboot on A:maps-replica-eqiad
11:13 jmm@cumin2002: END (PASS) - Cookbook sre.maps.roll-restart-reboot (exit_code=0) rolling reboot on P{maps201[0].codfw.wmnet} and (A:maps-replica or A:maps-replica-codfw or A:maps-replica-eqiad)
11:05 jmm@cumin2002: START - Cookbook sre.maps.roll-restart-reboot rolling reboot on P{maps201[0].codfw.wmnet} and (A:maps-replica or A:maps-replica-codfw or A:maps-replica-eqiad)
11:01 jmm@cumin2002: END (PASS) - Cookbook sre.maps.roll-restart-reboot (exit_code=0) rolling reboot on P{maps200[7,8].codfw.wmnet} and (A:maps-replica or A:maps-replica-codfw or A:maps-replica-eqiad)
10:48 stevemunene@deploy1002: helmfile [eqiad] DONE helmfile.d/services/datahub: sync on main
10:46 jmm@cumin2002: START - Cookbook sre.maps.roll-restart-reboot rolling reboot on P{maps200[7,8].codfw.wmnet} and (A:maps-replica or A:maps-replica-codfw or A:maps-replica-eqiad)
10:44 stevemunene@deploy1002: helmfile [eqiad] START helmfile.d/services/datahub: apply on main
10:44 stevemunene@deploy1002: helmfile [codfw] DONE helmfile.d/services/datahub: sync on main
10:42 jmm@cumin2002: END (PASS) - Cookbook sre.maps.roll-restart-reboot (exit_code=0) rolling reboot on P{maps200[5,6].codfw.wmnet} and (A:maps-replica or A:maps-replica-codfw or A:maps-replica-eqiad)
10:40 stevemunene@deploy1002: helmfile [codfw] START helmfile.d/services/datahub: apply on main
10:33 godog: set max-repeaters to 20 for cr3-eqsin using "force save" - T346606
10:28 jmm@cumin2002: START - Cookbook sre.maps.roll-restart-reboot rolling reboot on P{maps200[5,6].codfw.wmnet} and (A:maps-replica or A:maps-replica-codfw or A:maps-replica-eqiad)
09:59 elukey: remove ores-cache stream from changeprop (side effects - higher ORES client latencies, no mediawiki.revision-score event stream published) - https://phabricator.wikimedia.org/T342116
09:56 stevemunene@deploy1002: helmfile [eqiad] DONE helmfile.d/services/datahub: sync on main
09:54 elukey@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync
09:54 elukey@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop: sync
09:50 kamila@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.06-set-db-readwrite (exit_code=0)
09:50 kamila@cumin1001: START - Cookbook sre.switchdc.mediawiki.06-set-db-readwrite
09:50 stevemunene@deploy1002: helmfile [eqiad] START helmfile.d/services/datahub: apply on main
09:50 kamila@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.03-set-db-readonly (exit_code=0)
09:50 elukey@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop: sync
09:49 kamila@cumin1001: START - Cookbook sre.switchdc.mediawiki.03-set-db-readonly
09:49 elukey@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop: sync
09:49 stevemunene@deploy1002: helmfile [codfw] DONE helmfile.d/services/datahub: sync on main
09:46 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: sync
09:46 elukey@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: sync
09:44 fabfur: enabled puppet on cp4050 for T346602
09:43 stevemunene@deploy1002: helmfile [codfw] START helmfile.d/services/datahub: apply on main
09:42 isaranto@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
09:40 fabfur: disabled puppet on cp4050 for T346602
09:39 fabfur: enabled puppet on cp4052 for T346602
09:38 stevemunene@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
09:34 hashar@deploy1002: Finished scap: Backport for tests: Do not assume UTSysop exists (T346253) (duration: 09m 06s)
09:32 fabfur: disabled puppet on cp4052 for T346602
09:28 godog: set max-repeaters to 20 for cr3-eqsin in librenms - T346606
09:28 godog: set max-repeaters for cr3-eqsin in librenms - T346606
09:27 hashar@deploy1002: hashar and urbanecm: Continuing with sync
09:26 hashar@deploy1002: hashar and urbanecm: Backport for tests: Do not assume UTSysop exists (T346253) synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
09:25 isaranto@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
09:25 hashar@deploy1002: Started scap: Backport for tests: Do not assume UTSysop exists (T346253)
09:25 stevemunene@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
09:06 jelto@deploy1002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
09:05 jelto@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
09:03 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
09:03 jelto@deploy1002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
09:03 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
09:03 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
09:03 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
09:03 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
09:03 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
09:02 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
09:02 jelto@deploy1002: helmfile [codfw] START helmfile.d/services/miscweb: apply
08:56 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
08:47 jelto@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
08:46 jelto@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply
07:58 Amir1: running db checksum run in s3 eqiad replicas (T207253)
07:26 taavi@deploy1002: Finished scap: Backport for robots.txt: Disable indexing user (talk) pages and draft (talk) pages on shwiki (T346589) (duration: 22m 24s)
07:17 taavi@deploy1002: aleksandar and taavi: Continuing with sync
07:15 moritzm: installing clamav security updates
07:13 taavi@deploy1002: aleksandar and taavi: Backport for robots.txt: Disable indexing user (talk) pages and draft (talk) pages on shwiki (T346589) synced to the testservers mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
07:03 taavi@deploy1002: Started scap: Backport for robots.txt: Disable indexing user (talk) pages and draft (talk) pages on shwiki (T346589)

2023-09-16

13:52 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply
13:52 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop: apply
13:52 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop: apply
13:52 akosiaris: re-enable changeprop
13:52 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop: apply
12:57 akosiaris: stop changeprop in eqiad
01:44 krinkle@deploy1002: Finished deploy [integration/docroot@9a1fb37]: (no justification provided) (duration: 00m 06s)
01:44 krinkle@deploy1002: Started deploy [integration/docroot@9a1fb37]: (no justification provided)

2023-09-15

21:03 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes1047.eqiad.wmnet with OS bullseye
21:03 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1047.eqiad.wmnet with OS bullseye
20:59 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes1036.eqiad.wmnet with OS bullseye
20:59 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1036.eqiad.wmnet with OS bullseye
20:59 tzatziki: removing 6 files for legal compliance
20:58 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes1038.eqiad.wmnet with OS bullseye
20:58 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1038.eqiad.wmnet with OS bullseye
20:58 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes1036.eqiad.wmnet with OS bullseye
20:58 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1036.eqiad.wmnet with OS bullseye
17:56 urandom: stopping Cassandra bootstrap, restbase1030-a — T331713
17:43 urandom: initiate Cassandra bootstrap, restbase1030-a — T331713
17:11 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt2002-dev.codfw.wmnet with OS bookworm
17:08 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt2003-dev.codfw.wmnet with OS bookworm
16:51 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
16:47 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
16:46 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
16:44 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
16:37 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt2003-dev.codfw.wmnet with reason: host reimage
16:35 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt2002-dev.codfw.wmnet with reason: host reimage
16:32 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt2003-dev.codfw.wmnet with reason: host reimage
16:32 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt2002-dev.codfw.wmnet with reason: host reimage
16:12 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt2003-dev.codfw.wmnet with OS bookworm
16:12 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt2002-dev.codfw.wmnet with OS bookworm
16:09 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
15:51 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
15:51 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
15:51 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
15:50 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
15:50 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
15:50 claime: raising mw-api-int replicas to 12+2 to cope with wdqs backfill
15:43 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
15:43 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
15:42 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
15:42 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:42 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove old esams ranges and includes - cmooney@cumin1001"
15:42 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
15:41 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove old esams ranges and includes - cmooney@cumin1001"
15:39 cmooney@cumin1001: START - Cookbook sre.dns.netbox
15:33 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:33 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove old esams ranges and includes - cmooney@cumin1001"
15:32 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove old esams ranges and includes - cmooney@cumin1001"
15:30 cmooney@cumin1001: START - Cookbook sre.dns.netbox
15:24 eevans@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching restbase10[18,25-27,33].eqiad.wmnet: Maybe pickup missed topology changes — T331713 - eevans@cumin1001
15:03 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
15:03 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
14:58 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
14:57 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
14:38 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
14:38 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
14:35 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching restbase10[18,25-27,33].eqiad.wmnet: Maybe pickup missed topology changes — T331713 - eevans@cumin1001
14:35 urandom: rolling Cassandra restart, RESTBase/eqiad/row-D — T331713
14:34 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
14:33 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
14:33 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
14:33 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt2006-dev.codfw.wmnet with OS bookworm
14:32 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
14:32 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
14:31 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt2005-dev.codfw.wmnet with OS bookworm
14:29 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt2004-dev.codfw.wmnet with OS bookworm
14:27 aborrero@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudvirt2006-dev
14:27 aborrero@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt2006-dev
14:26 aborrero@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudvirt2005-dev
14:26 aborrero@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt2005-dev
14:25 aborrero@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudvirt2004-dev
14:24 aborrero@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt2004-dev
14:06 cgoubert@cumin1001: conftool action : set/pooled=yes; selector: name=mw2444.codfw.wmnet
14:05 claime: repooling mw2444.codfw.wmnet - T345884
13:51 slyngshede@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM idm-test1001.wikimedia.org
13:47 slyngshede@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM idm-test1001.wikimedia.org
13:46 slyngshede@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM idm-test1001.wikimedia.org
13:41 slyngshede@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM idm-test1001.wikimedia.org
13:41 slyngshede@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host idm-test1001.wikimedia.org with OS bookworm
13:40 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt2004-dev.codfw.wmnet with reason: host reimage
13:38 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt2006-dev.codfw.wmnet with reason: host reimage
13:38 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cloudvirt2005-dev.codfw.wmnet with reason: host reimage
13:35 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt2005-dev.codfw.wmnet with reason: host reimage
13:35 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt2004-dev.codfw.wmnet with reason: host reimage
13:35 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt2006-dev.codfw.wmnet with reason: host reimage
13:19 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt2004-dev.codfw.wmnet with OS bookworm
13:19 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt2005-dev.codfw.wmnet with OS bookworm
13:19 slyngshede@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on idm-test1001.wikimedia.org with reason: host reimage
13:19 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt2006-dev.codfw.wmnet with OS bookworm
13:16 slyngshede@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on idm-test1001.wikimedia.org with reason: host reimage
13:03 slyngshede@cumin1001: START - Cookbook sre.hosts.reimage for host idm-test1001.wikimedia.org with OS bookworm
13:01 akosiaris@deploy1002: Synchronized docroot: (no justification provided) (duration: 08m 20s)
12:50 topranks: changing ECMP hashing algorithm on drmrs, esams and cloud switches T339852
12:27 topranks: changing ECMP hashing algorithm on asw1-b12-drmrs T339852
11:54 _joe_: updated etcd-mirror to 0.0.10 everywhere
11:37 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1138.eqiad.wmnet with OS bullseye
11:12 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1138.eqiad.wmnet with reason: host reimage
11:09 stevemunene@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1138.eqiad.wmnet with reason: host reimage
10:56 stevemunene@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1138.eqiad.wmnet with OS bullseye
10:07 aborrero@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
10:07 aborrero@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudvirts in codfw - aborrero@cumin1001"
09:22 aborrero@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudvirts in codfw - aborrero@cumin1001"
09:20 aborrero@cumin1001: START - Cookbook sre.dns.netbox
09:10 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.resource-report (exit_code=0)
09:10 jmm@cumin2002: START - Cookbook sre.ganeti.resource-report
08:57 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host ldap-replica2008.wikimedia.org
08:57 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ldap-replica2008.wikimedia.org with OS bookworm
08:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ldap-replica2008.wikimedia.org with reason: host reimage
08:47 elukey@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
08:47 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ldap-replica2008.wikimedia.org with reason: host reimage
08:46 elukey@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
08:40 elukey@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
08:39 elukey@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
08:39 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
08:39 elukey@deploy1002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
08:30 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ldap-replica2008.wikimedia.org with OS bookworm
08:27 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM ldap-replica2008.wikimedia.org - jmm@cumin2002"
08:27 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM ldap-replica2008.wikimedia.org - jmm@cumin2002"
08:26 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ldap-replica2008.wikimedia.org on all recursors
08:26 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache ldap-replica2008.wikimedia.org on all recursors
08:26 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
08:26 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM ldap-replica2008.wikimedia.org - jmm@cumin2002"
08:12 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM ldap-replica2008.wikimedia.org - jmm@cumin2002"
08:10 jmm@cumin2002: START - Cookbook sre.dns.netbox
08:10 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host ldap-replica2008.wikimedia.org
08:08 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host ldap-replica2007.wikimedia.org
08:08 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ldap-replica2007.wikimedia.org with OS bookworm
07:54 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ldap-replica2007.wikimedia.org with reason: host reimage
07:51 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ldap-replica2007.wikimedia.org with reason: host reimage
07:38 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ldap-replica2007.wikimedia.org with OS bookworm
07:26 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM ldap-replica2007.wikimedia.org - jmm@cumin2002"
07:25 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM ldap-replica2007.wikimedia.org - jmm@cumin2002"
07:25 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ldap-replica2007.wikimedia.org on all recursors
07:25 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache ldap-replica2007.wikimedia.org on all recursors
07:25 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
07:25 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM ldap-replica2007.wikimedia.org - jmm@cumin2002"
07:24 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM ldap-replica2007.wikimedia.org - jmm@cumin2002"
07:22 jmm@cumin2002: START - Cookbook sre.dns.netbox
07:22 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host ldap-replica2007.wikimedia.org
07:21 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.resource-report (exit_code=0)
07:21 jmm@cumin2002: START - Cookbook sre.ganeti.resource-report
07:15 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast4005.wikimedia.org
07:09 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host bast4005.wikimedia.org
07:04 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
07:03 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
06:25 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host bast2003.wikimedia.org
06:06 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host puppetboard1003.eqiad.wmnet
06:01 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host puppetboard1003.eqiad.wmnet
06:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host puppetboard2003.codfw.wmnet
05:56 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host puppetboard2003.codfw.wmnet
05:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast5004.wikimedia.org
05:44 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host bast5004.wikimedia.org
02:43 eevans@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching restbase10[17,22-24,29,32].eqiad.wmnet: Maybe pickup missed topology changes — T331713 - eevans@cumin1001
01:44 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching restbase10[17,22-24,29,32].eqiad.wmnet: Maybe pickup missed topology changes — T331713 - eevans@cumin1001
01:44 urandom: rolling Cassandra restart, RESTBase/eqiad/row-B — T331713
01:20 krinkle@deploy1002: Finished scap: Backport for Remove old origin-with-crossorigin referrer policy (T338183) (duration: 08m 16s)
01:14 krinkle@deploy1002: krinkle and hartman: Continuing with sync
01:13 krinkle@deploy1002: krinkle and hartman: Backport for Remove old origin-with-crossorigin referrer policy (T338183) synced to the testservers mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
01:12 krinkle@deploy1002: Started scap: Backport for Remove old origin-with-crossorigin referrer policy (T338183)
01:09 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt2005-dev.codfw.wmnet with OS bookworm
01:09 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt2006-dev.codfw.wmnet with OS bookworm
01:09 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt2004-dev.codfw.wmnet with OS bookworm
00:12 eevans@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching restbase10[16,19-21,28,31].eqiad.wmnet: Maybe pickup missed topology changes — T331713 - eevans@cumin1001

2023-09-14

23:49 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1056.eqiad.wmnet with OS bullseye
23:34 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1056.eqiad.wmnet with reason: host reimage
23:32 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt2006-dev.codfw.wmnet with reason: host reimage
23:30 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt2005-dev.codfw.wmnet with reason: host reimage
23:27 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt2004-dev.codfw.wmnet with reason: host reimage
23:27 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt2006-dev.codfw.wmnet with reason: host reimage
23:26 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt2005-dev.codfw.wmnet with reason: host reimage
23:26 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1056.eqiad.wmnet with reason: host reimage
23:24 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt2004-dev.codfw.wmnet with reason: host reimage
23:19 eileen: civicrm upgraded from 9d34ed9b to 0c2853aa - big vendor update - roll back if issues
23:18 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
23:17 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
23:15 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
23:13 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching restbase10[16,19-21,28,31].eqiad.wmnet: Maybe pickup missed topology changes — T331713 - eevans@cumin1001
23:12 urandom: rolling Cassandra restart, RESTBase/eqiad/row-A — T331713
23:10 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt2006-dev.codfw.wmnet with OS bookworm
23:10 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1056.eqiad.wmnet with OS bullseye
23:09 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt2005-dev.codfw.wmnet with OS bookworm
23:08 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt2004-dev.codfw.wmnet with OS bookworm
23:03 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1031.eqiad.wmnet with OS bullseye
23:03 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
22:51 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
22:47 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
22:45 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1030.eqiad.wmnet with OS bullseye
22:45 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
22:45 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1034.eqiad.wmnet with OS bullseye
22:45 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
22:44 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt2001-dev.codfw.wmnet with OS bookworm
22:43 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
22:40 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1031.eqiad.wmnet with reason: host reimage
22:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1034.eqiad.wmnet with reason: host reimage
22:25 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1030.eqiad.wmnet with reason: host reimage
22:21 eevans@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching restbase20[12,17-18,23,26-27].codfw.wmnet: Maybe pickup missed topology changes — T331713 - eevans@cumin1001
22:21 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1034.eqiad.wmnet with reason: host reimage
22:21 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1031.eqiad.wmnet with reason: host reimage
22:20 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1030.eqiad.wmnet with reason: host reimage
22:14 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt2001-dev.codfw.wmnet with reason: host reimage
22:11 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt2001-dev.codfw.wmnet with reason: host reimage
22:06 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes1034.eqiad.wmnet with OS bullseye
22:06 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes1031.eqiad.wmnet with OS bullseye
22:05 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes1030.eqiad.wmnet with OS bullseye
21:55 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1032.eqiad.wmnet with OS bullseye
21:55 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
21:54 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
21:51 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt2001-dev.codfw.wmnet with OS bookworm
21:50 andrew@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudvirt2001-dev.codfw.wmnet with OS bullseye
21:42 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
21:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1035.eqiad.wmnet with OS bullseye
21:42 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
21:41 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
21:41 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
21:41 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
21:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1039.eqiad.wmnet with OS bullseye
21:40 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
21:39 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
21:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1037.eqiad.wmnet with OS bullseye
21:38 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
21:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1032.eqiad.wmnet with reason: host reimage
21:37 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
21:35 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes1034.eqiad.wmnet with OS bullseye
21:35 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes1034.eqiad.wmnet with OS bullseye
21:34 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1032.eqiad.wmnet with reason: host reimage
21:34 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes1034.eqiad.wmnet with OS bullseye
21:34 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes1034.eqiad.wmnet with OS bullseye
21:34 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt2001-dev.codfw.wmnet with OS bullseye
21:33 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes1034.eqiad.wmnet with OS bullseye
21:32 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1033.eqiad.wmnet with OS bullseye
21:32 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
21:31 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
21:27 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching restbase20[12,17-18,23,26-27].codfw.wmnet: Maybe pickup missed topology changes — T331713 - eevans@cumin1001
21:26 urandom: rolling Cassandra restart, RESTBase/row-D — T331713
21:24 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1035.eqiad.wmnet with reason: host reimage
21:24 eevans@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching restbase20[16,20,22,25].codfw.wmnet: Maybe pickup missed topology changes — T331713 - eevans@cumin1001
21:22 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1039.eqiad.wmnet with reason: host reimage
21:19 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1037.eqiad.wmnet with reason: host reimage
21:17 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1039.eqiad.wmnet with reason: host reimage
21:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1033.eqiad.wmnet with reason: host reimage
21:15 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1037.eqiad.wmnet with reason: host reimage
21:15 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1035.eqiad.wmnet with reason: host reimage
21:14 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1033.eqiad.wmnet with reason: host reimage
21:13 ryankemper: T345475 Beginning process to bring 3 new hosts `wdqs202[3-5]` into service. Merged https://gerrit.wikimedia.org/r/957802 and running puppet on hosts
21:06 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
21:03 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes1039.eqiad.wmnet with OS bullseye
21:02 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes1038.eqiad.wmnet with OS bullseye
21:02 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes1037.eqiad.wmnet with OS bullseye
21:01 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes1036.eqiad.wmnet with OS bullseye
21:00 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes1035.eqiad.wmnet with OS bullseye
20:59 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes1034.eqiad.wmnet with OS bullseye
20:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes1033.eqiad.wmnet with OS bullseye
20:57 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes1032.eqiad.wmnet with OS bullseye
20:47 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching restbase20[16,20,22,25].codfw.wmnet: Maybe pickup missed topology changes — T331713 - eevans@cumin1001
20:45 thcipriani@deploy1002: Finished scap: Backport for Don't offer visual diffs for non-wikitext pages (T346252), ThreadItemStore: Add details to row insertion exceptions (T343859) (duration: 12m 35s)
20:38 thcipriani@deploy1002: thcipriani and matmarex: Continuing with sync
20:34 eevans@cumin1001: END (FAIL) - Cookbook sre.cassandra.roll-restart (exit_code=99) for nodes matching restbase20[15-16,20,22,25].codfw.wmnet: Maybe pickup missed topology changes — T331713 - eevans@cumin1001
20:34 thcipriani@deploy1002: thcipriani and matmarex: Backport for Don't offer visual diffs for non-wikitext pages (T346252), ThreadItemStore: Add details to row insertion exceptions (T343859) synced to the testservers mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
20:32 thcipriani@deploy1002: Started scap: Backport for Don't offer visual diffs for non-wikitext pages (T346252), ThreadItemStore: Add details to row insertion exceptions (T343859)
20:20 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching restbase20[15-16,20,22,25].codfw.wmnet: Maybe pickup missed topology changes — T331713 - eevans@cumin1001
20:20 urandom: rolling Cassandra restart, RESTBase/row-C — T331713
20:05 eevans@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching restbase20[13-14,19,21,24].codfw.wmnet: Maybe pickup missed topology changes — T331713 - eevans@cumin1001
19:20 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching restbase20[13-14,19,21,24].codfw.wmnet: Maybe pickup missed topology changes — T331713 - eevans@cumin1001
19:20 urandom: rolling Cassandra restart, RESTBase/row-B — T331713
19:14 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1050.eqiad.wmnet with OS bullseye
19:11 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1052.eqiad.wmnet with OS bullseye
19:10 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1051.eqiad.wmnet with OS bullseye
18:58 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1050.eqiad.wmnet with reason: host reimage
18:58 urandom: initiating `removenode`, ID=627fe8e9-d298-43b3-a1a2-7c8a3f01370b (restbase1030-c) — T331713
18:56 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1052.eqiad.wmnet with reason: host reimage
18:54 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1050.eqiad.wmnet with reason: host reimage
18:53 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1051.eqiad.wmnet with reason: host reimage
18:52 urandom: stopping bootstrap of restbase1030-c — T331713
18:51 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1052.eqiad.wmnet with reason: host reimage
18:51 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1051.eqiad.wmnet with reason: host reimage
18:45 urandom: retrying Cassandra bootstrap of restbase1030-c — T331713
18:39 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1050.eqiad.wmnet with OS bullseye
18:38 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
18:38 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1050.eqiad.wmnet with OS bullseye
18:38 jclark@cumin1001: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
18:37 cmooney@cumin1001: START - Cookbook sre.dns.netbox
18:35 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1052.eqiad.wmnet with OS bullseye
18:35 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1051.eqiad.wmnet with OS bullseye
18:34 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1051.eqiad.wmnet with OS bullseye
18:34 jclark@cumin1001: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
18:27 xcollazo@deploy1002: Finished deploy [airflow-dags/analytics@7160e27]: Deploy latest DAGs to analytics Airflow instance T340861 (duration: 00m 40s)
18:27 xcollazo@deploy1002: Started deploy [airflow-dags/analytics@7160e27]: Deploy latest DAGs to analytics Airflow instance T340861
18:24 bblack: cp107[56],cp202[78],cp600[19]: (one host from each cluster, at 3 sites): restarting varnish-frontend spaced out over the next ~hour for memory tweaks.
18:18 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1053.eqiad.wmnet with OS bullseye
18:03 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1046.eqiad.wmnet with OS bullseye
18:03 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
18:02 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1053.eqiad.wmnet with reason: host reimage
18:02 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
17:59 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1053.eqiad.wmnet with reason: host reimage
17:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1045.eqiad.wmnet with OS bullseye
17:56 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
17:55 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
17:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1046.eqiad.wmnet with reason: host reimage
17:44 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1053.eqiad.wmnet with OS bullseye
17:43 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1046.eqiad.wmnet with reason: host reimage
17:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes1046.eqiad.wmnet with OS bullseye
17:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1048.eqiad.wmnet with OS bullseye
17:41 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
17:40 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
17:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1045.eqiad.wmnet with reason: host reimage
17:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1049.eqiad.wmnet with OS bullseye
17:39 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
17:38 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
17:38 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1052.eqiad.wmnet with OS bullseye
17:36 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1045.eqiad.wmnet with reason: host reimage
17:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1044.eqiad.wmnet with OS bullseye
17:34 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
17:32 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
17:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1043.eqiad.wmnet with OS bullseye
17:31 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
17:30 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
17:29 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes1046.eqiad.wmnet with OS bullseye
17:24 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1048.eqiad.wmnet with reason: host reimage
17:22 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1049.eqiad.wmnet with reason: host reimage
17:20 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1052.eqiad.wmnet with reason: host reimage
17:18 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1049.eqiad.wmnet with reason: host reimage
17:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1044.eqiad.wmnet with reason: host reimage
17:16 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1052.eqiad.wmnet with reason: host reimage
17:15 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1048.eqiad.wmnet with reason: host reimage
17:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1043.eqiad.wmnet with reason: host reimage
17:12 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1044.eqiad.wmnet with reason: host reimage
17:12 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1043.eqiad.wmnet with reason: host reimage
17:05 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on search-loader2002.codfw.wmnet,search-loader1002.eqiad.wmnet with reason: T346039
17:05 bking@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on search-loader2002.codfw.wmnet,search-loader1002.eqiad.wmnet with reason: T346039
17:03 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes1049.eqiad.wmnet with OS bullseye
17:03 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes1048.eqiad.wmnet with OS bullseye
17:02 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes1047.eqiad.wmnet with OS bullseye
17:00 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1052.eqiad.wmnet with OS bullseye
17:00 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1052.eqiad.wmnet with OS bullseye
17:00 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes1046.eqiad.wmnet with OS bullseye
17:00 jclark@cumin1001: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
16:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes1045.eqiad.wmnet with OS bullseye
16:57 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes1044.eqiad.wmnet with OS bullseye
16:56 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes1043.eqiad.wmnet with OS bullseye
16:53 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1040.eqiad.wmnet with OS bullseye
16:53 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
16:51 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
16:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1041.eqiad.wmnet with OS bullseye
16:48 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
16:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1042.eqiad.wmnet with OS bullseye
16:48 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
16:47 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
16:46 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
16:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1040.eqiad.wmnet with reason: host reimage
16:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1041.eqiad.wmnet with reason: host reimage
16:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1042.eqiad.wmnet with reason: host reimage
16:31 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1040.eqiad.wmnet with reason: host reimage
16:30 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes1040.eqiad.wmnet with OS bullseye
16:27 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1041.eqiad.wmnet with reason: host reimage
16:27 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1042.eqiad.wmnet with reason: host reimage
16:26 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes1040.eqiad.wmnet with OS bullseye
16:23 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes1056.eqiad.wmnet with OS bullseye
16:23 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
16:23 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
16:21 denisse: Failing over from netmon2002 (codfw) to netmon1003 (eqiad).
16:20 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
16:17 volans@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "update - volans@cumin1001"
16:17 volans@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "update - volans@cumin1001"
16:16 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1056.eqiad.wmnet with reason: host reimage
16:14 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes1042.eqiad.wmnet with OS bullseye
16:14 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes1041.eqiad.wmnet with OS bullseye
16:14 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes1040.eqiad.wmnet with OS bullseye
16:13 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1056.eqiad.wmnet with reason: host reimage
16:12 cmooney@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
16:12 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1053.eqiad.wmnet with OS bullseye
16:12 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
16:09 cmooney@cumin1001: START - Cookbook sre.dns.netbox
16:08 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1051.eqiad.wmnet with reason: host reimage
16:06 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1052.eqiad.wmnet with reason: host reimage
16:04 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
16:04 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1050.eqiad.wmnet with reason: host reimage
16:04 robh@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "failed in reimage script said manually run it - robh@cumin1001 - T342533"
16:03 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes1030.eqiad.wmnet with OS bullseye
16:03 robh@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "failed in reimage script said manually run it - robh@cumin1001 - T342533"
16:03 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes1031.eqiad.wmnet with OS bullseye
16:03 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1051.eqiad.wmnet with reason: host reimage
16:03 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1052.eqiad.wmnet with reason: host reimage
16:01 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:01 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1050.eqiad.wmnet with reason: host reimage
16:00 jclark@cumin1001: START - Cookbook sre.dns.netbox
15:57 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1056.eqiad.wmnet with OS bullseye
15:55 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1055.eqiad.wmnet with OS bullseye
15:55 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
15:55 urbanecm@deploy1002: Finished scap: Backport for listTaskCounts: Push total task counts to statsd for all tasks (T345204), linkTaskCounts: Stop producing per-topic statsd data (T345210) (duration: 07m 37s)
15:55 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1054.eqiad.wmnet with OS bullseye
15:55 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
15:54 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1056.eqiad.wmnet with OS bullseye
15:54 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
15:54 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
15:53 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
15:53 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
15:53 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1139.eqiad.wmnet with OS bullseye
15:52 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for conf2006.codfw.wmnet
15:52 jayme@cumin1001: START - Cookbook sre.hosts.remove-downtime for conf2006.codfw.wmnet
15:52 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for conf2004.codfw.wmnet
15:52 jayme@cumin1001: START - Cookbook sre.hosts.remove-downtime for conf2004.codfw.wmnet
15:52 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for conf2005.codfw.wmnet
15:51 jayme@cumin1001: START - Cookbook sre.hosts.remove-downtime for conf2005.codfw.wmnet
15:49 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host moss-be2003.codfw.wmnet with OS bullseye
15:49 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1053.eqiad.wmnet with reason: host reimage
15:48 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1051.eqiad.wmnet with OS bullseye
15:48 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1052.eqiad.wmnet with OS bullseye
15:48 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1050.eqiad.wmnet with OS bullseye
15:48 bking@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host search-loader1002.eqiad.wmnet
15:47 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host search-loader1002.eqiad.wmnet with OS bullseye
15:47 urbanecm@deploy1002: Started scap: Backport for listTaskCounts: Push total task counts to statsd for all tasks (T345204), linkTaskCounts: Stop producing per-topic statsd data (T345210)
15:46 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1053.eqiad.wmnet with reason: host reimage
15:44 jayme: restarting primary lvs in codfw, eqsin, ulsfo
15:42 jayme: restarting secondary lvs in codfw, eqsin, ulsfo
15:38 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1054.eqiad.wmnet with reason: host reimage
15:37 jayme: running puppet on lvs[2011-2014].codfw.wmnet,lvs[5004-5006].eqsin.wmnet,lvs[4008-4010].ulsfo.wmnet
15:36 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1055.eqiad.wmnet with reason: host reimage
15:36 bking@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host search-loader2002.codfw.wmnet
15:36 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host search-loader2002.codfw.wmnet with OS bullseye
15:36 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on kubernetes1056.eqiad.wmnet with reason: host reimage
15:05 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host testreduce1002.eqiad.wmnet
15:01 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
15:01 bking@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM search-loader1002.eqiad.wmnet - bking@cumin1001"
15:00 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
14:58 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host conf2005.codfw.wmnet with OS bullseye
14:58 bking@cumin1001: START - Cookbook sre.dns.netbox
14:58 bking@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) search-loader1002.eqiad.wmnet on all recursors
14:58 bking@cumin1001: START - Cookbook sre.dns.wipe-cache search-loader1002.eqiad.wmnet on all recursors
14:58 bking@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:58 bking@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM search-loader1002.eqiad.wmnet - bking@cumin1001"
14:58 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM testvm2002.codfw.wmnet
14:55 bking@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM search-loader1002.eqiad.wmnet - bking@cumin1001"
14:55 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
14:55 bking@cumin1001: START - Cookbook sre.hosts.reimage for host search-loader2002.codfw.wmnet with OS bullseye
14:54 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM testvm2002.codfw.wmnet
14:53 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM testvm2005.codfw.wmnet
14:52 bking@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM search-loader2002.codfw.wmnet - bking@cumin1001"
14:52 bking@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM search-loader2002.codfw.wmnet - bking@cumin1001"
14:51 bking@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) search-loader2002.codfw.wmnet on all recursors
14:51 bking@cumin1001: START - Cookbook sre.dns.wipe-cache search-loader2002.codfw.wmnet on all recursors
14:51 bking@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:51 bking@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM search-loader2002.codfw.wmnet - bking@cumin1001"
14:51 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudservices1005.wikimedia.org with reason: test before full decom
14:51 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudservices1005.wikimedia.org with reason: test before full decom
14:50 bking@cumin1001: START - Cookbook sre.dns.netbox
14:50 bking@cumin1001: START - Cookbook sre.ganeti.makevm for new host search-loader1002.eqiad.wmnet
14:50 bking@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM search-loader2002.codfw.wmnet - bking@cumin1001"
14:50 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM testvm2005.codfw.wmnet
14:47 bking@cumin1001: START - Cookbook sre.dns.netbox
14:47 bking@cumin1001: START - Cookbook sre.ganeti.makevm for new host search-loader2002.codfw.wmnet
14:46 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1029.eqiad.wmnet with reason: host reimage
14:45 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM testvm2004.codfw.wmnet
14:44 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1028.eqiad.wmnet with reason: host reimage
14:43 vgutierrez: varnish: decrease max_connections to 10k per backend server globally
14:41 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM testvm2004.codfw.wmnet
14:41 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1029.eqiad.wmnet with reason: host reimage
14:41 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1028.eqiad.wmnet with reason: host reimage
14:40 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1027.eqiad.wmnet with reason: host reimage
14:37 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1027.eqiad.wmnet with reason: host reimage
14:36 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on conf2005.codfw.wmnet with reason: host reimage
14:33 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on conf2005.codfw.wmnet with reason: host reimage
14:32 moritzm: installing qemu security updates on ganeti-test cluster
14:27 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1031.eqiad.wmnet with OS bullseye
14:27 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1030.eqiad.wmnet with OS bullseye
14:27 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1029.eqiad.wmnet with OS bullseye
14:27 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1028.eqiad.wmnet with OS bullseye
14:23 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1027.eqiad.wmnet with OS bullseye
14:19 jforrester@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mathoid: apply
14:18 jforrester@deploy1002: helmfile [eqiad] START helmfile.d/services/mathoid: apply
14:18 jforrester@deploy1002: helmfile [codfw] DONE helmfile.d/services/mathoid: apply
14:18 jayme@cumin1001: START - Cookbook sre.hosts.reimage for host conf2005.codfw.wmnet with OS bullseye
14:17 jforrester@deploy1002: helmfile [codfw] START helmfile.d/services/mathoid: apply
14:17 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/mathoid: apply
14:16 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/mathoid: apply
13:58 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host conf2006.codfw.wmnet with OS bullseye
13:57 stevemunene@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1138.eqiad.wmnet with OS bullseye
13:56 filippo@deploy1002: Finished deploy [librenms/librenms@f049593]: (no justification provided) (duration: 00m 11s)
13:55 filippo@deploy1002: Started deploy [librenms/librenms@f049593]: (no justification provided)
13:39 godog: issue test alertmanager librenms alert - T346318
13:38 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on conf2006.codfw.wmnet with reason: host reimage
13:35 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on conf2006.codfw.wmnet with reason: host reimage
13:32 slyngshede@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM idm-test1001.wikimedia.org
13:31 moritzm: installing libwebp security updates on bookworm
13:28 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1139.eqiad.wmnet with reason: host reimage
13:28 slyngshede@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM idm-test1001.wikimedia.org
13:25 stevemunene@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1139.eqiad.wmnet with reason: host reimage
13:19 jayme@cumin1001: START - Cookbook sre.hosts.reimage for host conf2006.codfw.wmnet with OS bullseye
13:14 moritzm: installing aom security updates
13:13 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for conf2004.codfw.wmnet
13:13 jayme@cumin1001: START - Cookbook sre.hosts.remove-downtime for conf2004.codfw.wmnet
13:12 stevemunene@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1139.eqiad.wmnet with OS bullseye
13:10 stevemunene@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1138.eqiad.wmnet with OS bullseye
12:56 slyngshede@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host idm-test1001.wikimedia.org with OS bookworm
12:17 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow6001.drmrs.wmnet
12:17 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: sync
12:16 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/mobileapps: sync
12:14 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow6001.drmrs.wmnet
12:11 hnowlan@cumin1001: END (FAIL) - Cookbook sre.misc-clusters.roll-restart-restbase (exit_code=1) rolling restart_daemons on A:restbase-canary
12:10 slyngshede@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on idm-test1001.wikimedia.org with reason: host reimage
12:06 slyngshede@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on idm-test1001.wikimedia.org with reason: host reimage
12:04 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow5002.eqsin.wmnet
12:03 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host conf2004.codfw.wmnet with OS bullseye
12:01 hnowlan@cumin1001: START - Cookbook sre.misc-clusters.roll-restart-restbase rolling restart_daemons on A:restbase-canary
11:56 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow5002.eqsin.wmnet
11:54 slyngshede@cumin1001: START - Cookbook sre.hosts.reimage for host idm-test1001.wikimedia.org with OS bookworm
11:49 hnowlan@deploy1002: Finished deploy [restbase/deploy@8eb62f2]: Revert "Disable wikifeeds announcements healthcheck" (duration: 06m 12s)
11:43 hnowlan@deploy1002: Started deploy [restbase/deploy@8eb62f2]: Revert "Disable wikifeeds announcements healthcheck"
11:37 brouberol@cumin2002: END (PASS) - Cookbook sre.opensearch.roll-restart-reboot (exit_code=0) rolling reboot on A:datahubsearch
11:36 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow4002.ulsfo.wmnet
11:35 hnowlan@deploy1002: Finished deploy [restbase/deploy@e8a6ae4]: Disable wikifeeds announcements healthcheck (duration: 10m 08s)
11:34 slyngshede@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on idm-test1001.wikimedia.org with reason: upgrade to Bookwork
11:34 slyngshede@cumin1001: START - Cookbook sre.hosts.downtime for 5 days, 0:00:00 on idm-test1001.wikimedia.org with reason: upgrade to Bookwork
11:31 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow4002.ulsfo.wmnet
11:28 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow3003.esams.wmnet
11:25 hnowlan@deploy1002: Started deploy [restbase/deploy@e8a6ae4]: Disable wikifeeds announcements healthcheck
11:24 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow3003.esams.wmnet
11:24 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow2003.codfw.wmnet
11:21 brouberol@cumin2002: START - Cookbook sre.opensearch.roll-restart-reboot rolling reboot on A:datahubsearch
11:17 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow2003.codfw.wmnet
11:17 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow1002.eqiad.wmnet
11:13 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow1002.eqiad.wmnet
11:12 brouberol@cumin2002: END (PASS) - Cookbook sre.opensearch.roll-restart-reboot (exit_code=0) rolling restart_daemons on A:datahubsearch
11:08 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1137.eqiad.wmnet with OS bullseye
11:04 brouberol: brouberol@cumin2002 START - Cookbook sre.opensearch.roll-restart-reboot rolling restart_daemons on A:datahubsearch - T344798
11:02 brouberol@cumin2002: START - Cookbook sre.opensearch.roll-restart-reboot rolling restart_daemons on A:datahubsearch
10:43 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1137.eqiad.wmnet with reason: host reimage
10:41 stevemunene@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1137.eqiad.wmnet with reason: host reimage
10:28 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on conf2004.codfw.wmnet with reason: host reimage
10:27 stevemunene@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1137.eqiad.wmnet with OS bullseye
10:25 jayme@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on conf2004.codfw.wmnet with reason: host reimage
10:24 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dborch1001.wikimedia.org
10:20 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host dborch1001.wikimedia.org
10:18 jmm@cumin2002: END (PASS) - Cookbook sre.aqs.roll-restart-reboot (exit_code=0) rolling reboot on A:aqs-eqiad
10:10 jayme@cumin1001: START - Cookbook sre.hosts.reimage for host conf2004.codfw.wmnet with OS bullseye
10:07 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ldap-replica1006.wikimedia.org
10:06 elukey@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventstreams: sync
10:06 elukey@deploy1002: helmfile [eqiad] START helmfile.d/services/eventstreams: sync
10:03 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ldap-replica1006.wikimedia.org
10:03 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ldap-replica1005.wikimedia.org
09:59 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ldap-replica1005.wikimedia.org
09:55 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cumin1001.eqiad.wmnet
09:52 elukey: remove the 'mediawiki.revision-score' stream from eventstreams public API - T342116
09:51 elukey@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventstreams: sync
09:51 elukey@deploy1002: helmfile [codfw] START helmfile.d/services/eventstreams: sync
09:50 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/eventstreams: sync
09:49 elukey@deploy1002: helmfile [staging] START helmfile.d/services/eventstreams: sync
09:49 jayme: restarted navtiming on webperf2003 to pick up changed etcd service records
09:44 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host cumin1001.eqiad.wmnet
09:43 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ldap-rw2001.wikimedia.org
09:39 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ldap-rw2001.wikimedia.org
09:36 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ldap-rw1001.wikimedia.org
09:32 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ldap-rw1001.wikimedia.org
09:22 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling restart_daemons on A:swift-fe-eqiad
09:17 mvernon@cumin2002: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on A:swift-fe-eqiad
09:16 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling restart_daemons on A:swift-fe-codfw
09:10 mvernon@cumin2002: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on A:swift-fe-codfw
09:07 moritzm: installing qemu security updates on ganeti-test
08:59 btullis: running build-production-images on build2001 for T344910
08:53 godog: +50 to prometheus eqiad k8s-staging
08:45 jayme: restarting confd fleet wide
08:45 jmm@cumin2002: START - Cookbook sre.aqs.roll-restart-reboot rolling reboot on A:aqs-eqiad
08:43 jayme: restarting primary lvs in codfw, eqsin, ulsfo
08:38 jayme: restarted secondary lvs in codfw, eqsin, ulsfo
08:24 jnuche@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.41.0-wmf.26 refs T343728
07:57 jayme@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) _etcd._tcp.eqsin.wmnet on all recursors
07:56 jayme@cumin1001: START - Cookbook sre.dns.wipe-cache _etcd._tcp.eqsin.wmnet on all recursors
07:56 jayme@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) _etcd._tcp.ulsfo.wmnet on all recursors
07:56 jayme@cumin1001: START - Cookbook sre.dns.wipe-cache _etcd._tcp.ulsfo.wmnet on all recursors
07:56 jayme@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) _etcd._tcp.codfw.wmnet on all recursors
07:56 jayme@cumin1001: START - Cookbook sre.dns.wipe-cache _etcd._tcp.codfw.wmnet on all recursors
07:44 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host debmonitor2003.codfw.wmnet
07:35 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host debmonitor2003.codfw.wmnet
07:32 hashar: Backport & config deployment window completed.
07:21 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host debmonitor1003.eqiad.wmnet
07:17 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host debmonitor1003.eqiad.wmnet
07:13 kartik@deploy1002: Finished scap: Backport for Enable MinT translation service on MediaWiki - rollout #4 (T341445) (duration: 10m 17s)
07:09 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host apt1001.wikimedia.org
07:06 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host apt1001.wikimedia.org
07:06 kartik@deploy1002: abi and kartik: Continuing with sync
07:04 kartik@deploy1002: abi and kartik: Backport for Enable MinT translation service on MediaWiki - rollout #4 (T341445) synced to the testservers mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
07:02 kartik@deploy1002: Started scap: Backport for Enable MinT translation service on MediaWiki - rollout #4 (T341445)
06:27 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 26 hosts with reason: Pre swichover tasks
06:26 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 26 hosts with reason: Pre swichover tasks
06:22 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 26 hosts with reason: Pre swichover tasks
06:22 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 26 hosts with reason: Pre swichover tasks
05:23 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 6 hosts with reason: Pre swichover tasks
05:23 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 6 hosts with reason: Pre swichover tasks
05:22 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 6 hosts with reason: Pre swichover tasks
05:21 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 6 hosts with reason: Pre swichover tasks
05:11 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 11 hosts with reason: Pre swichover tasks
05:10 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 11 hosts with reason: Pre swichover tasks
05:06 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on pc2013.codfw.wmnet,pc[1013-1014].eqiad.wmnet with reason: Pre swichover tasks
05:06 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on pc2013.codfw.wmnet,pc[1013-1014].eqiad.wmnet with reason: Pre swichover tasks
05:06 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on pc2012.codfw.wmnet,pc1012.eqiad.wmnet with reason: Pre swichover tasks
05:05 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on pc2012.codfw.wmnet,pc1012.eqiad.wmnet with reason: Pre swichover tasks
05:05 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on pc[2011,2014].codfw.wmnet,pc1011.eqiad.wmnet with reason: Pre swichover tasks
05:05 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on pc[2011,2014].codfw.wmnet,pc1011.eqiad.wmnet with reason: Pre swichover tasks
03:04 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
03:03 rzl@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
03:03 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
02:58 rzl@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
02:58 rzl@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
02:57 rzl@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
02:56 rzl@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
02:54 rzl@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
01:36 urandom: starting RESTBase/Cassandra node rebuilds, cassandra-c/row D — T331713

2023-09-13

23:06 urandom: starting Cassandra node rebuilds, restbase/row D — T331713
22:57 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4052.ulsfo.wmnet with OS bookworm
21:50 denisse@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db1128.eqiad.wmnet with reason: HW issues
21:50 denisse@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db1128.eqiad.wmnet with reason: HW issues
21:50 denisse: downtiming db1128
21:49 denisse@cumin1001: dbctl commit (dc=all): 'Depool db1128', diff saved to https://phabricator.wikimedia.org/P52504 and previous config saved to /var/cache/conftool/dbconfig/20230913-214930-denisse.json
21:48 denisse: depooling db1128
21:35 bking@deploy1002: Finished deploy [wdqs/wdqs@3e0a913]: 0.3.129 use allowlist T344284 (duration: 11m 27s)
21:28 eileen: civicrm upgraded from 6b247288 to 9d34ed9b
21:24 bking@deploy1002: Started deploy [wdqs/wdqs@3e0a913]: 0.3.129 use allowlist T344284
21:22 bking@deploy1002: Finished deploy [wdqs/wdqs@16e3dcf]: 0.3.129 use allowlist T344284 (duration: 00m 59s)
21:21 bking@deploy1002: Started deploy [wdqs/wdqs@16e3dcf]: 0.3.129 use allowlist T344284
19:44 denisse@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host netmon1003.wikimedia.org with OS bookworm
19:43 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4052.ulsfo.wmnet with reason: host reimage
19:40 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4052.ulsfo.wmnet with reason: host reimage
19:34 eileen: civicrm upgraded from 80aee570 to 6b247288
19:24 denisse@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on netmon1003.wikimedia.org with reason: host reimage
19:21 denisse@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on netmon1003.wikimedia.org with reason: host reimage
19:14 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp4052.ulsfo.wmnet with OS bookworm
19:09 denisse@cumin1001: START - Cookbook sre.hosts.reimage for host netmon1003.wikimedia.org with OS bookworm
19:09 urandom: initiating rebuild of restbase1027-a & restbase1033-a
19:08 urandom: initiating rebuild of restbase1026-a
19:00 urandom: initiating rebuild of restbase1025-a
18:51 urandom: initiating rebuild of restbase1018-a
18:49 brett@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp4052.ulsfo.wmnet with OS bookworm
18:42 urandom: stopping bootstrap of restbase1030-c — T331713
18:38 godog: run schema migrations for librenms on m1 (backdated, started ~1h ago)
18:33 urandom: restarting restbase service (restbase1031) — T331713
18:19 urandom: resuming bootstrap of restbase1030-c —
18:05 fnegri@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudnet2006-dev.codfw.wmnet with OS bookworm
17:45 fnegri@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudnet2006-dev.codfw.wmnet with reason: host reimage
17:42 fnegri@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudnet2006-dev.codfw.wmnet with reason: host reimage
17:28 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp4052.ulsfo.wmnet with OS bookworm
17:22 fnegri@cumin1001: START - Cookbook sre.hosts.reimage for host cloudnet2006-dev.codfw.wmnet with OS bookworm
16:34 denisse@deploy1002: Finished deploy [librenms/librenms@f049593]: Upgrade LibreNMS to 23.8.2 - T344136 (duration: 00m 16s)
16:34 denisse@deploy1002: Started deploy [librenms/librenms@f049593]: Upgrade LibreNMS to 23.8.2 - T344136
16:04 aokoth@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on vrts1002.eqiad.wmnet with reason: Testing
16:04 aokoth@cumin1001: START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on vrts1002.eqiad.wmnet with reason: Testing
16:04 denisse@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host netmon2002.wikimedia.org with OS bookworm
15:41 denisse@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on netmon2002.wikimedia.org with reason: host reimage
15:38 denisse@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on netmon2002.wikimedia.org with reason: host reimage
15:34 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
15:34 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
15:31 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
15:29 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
15:26 jayme: re-enabled puppet on all k8s control planes
15:26 jmm@cumin2002: END (PASS) - Cookbook sre.aqs.roll-restart-reboot (exit_code=0) rolling reboot on A:aqs-codfw
15:19 denisse@cumin1001: START - Cookbook sre.hosts.reimage for host netmon2002.wikimedia.org with OS bookworm
15:19 denisse: Start reimage of netmon2002
15:17 denisse: Starting LibreNMS upgrade in codfw.
15:14 jgiannelos@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifeeds: apply
15:04 jayme: stopped puppet on all k8s control planes for 956842 rollout
15:01 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply
15:01 hnowlan: repooling cp2037 and enabling puppet on A:cp
14:56 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply
14:55 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply
14:52 hnowlan: disable puppet on A:cp
14:51 hnowlan: depooled service=ats-be,name=cp2037.codfw.wmnet
14:51 jayme: updated kubernetes-* packages fleet wide to 1.23.14-3 - T329826
14:50 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply
14:41 akosiaris@deploy1002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
14:39 akosiaris@deploy1002: helmfile [staging] START helmfile.d/services/machinetranslation: apply
14:36 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on sretest1001.eqiad.wmnet with reason: WIP towards puppetised nftables firewall
14:36 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on sretest1001.eqiad.wmnet with reason: WIP towards puppetised nftables firewall
14:31 ayounsi@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest1001.eqiad.wmnet with OS bullseye
14:29 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
14:29 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
14:26 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
14:25 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
14:17 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
14:17 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
14:10 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
14:10 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
14:08 hnowlan: stopping cassandra on restbase1030-c
13:52 jmm@cumin2002: START - Cookbook sre.aqs.roll-restart-reboot rolling reboot on A:aqs-codfw
13:34 Lucas_WMDE: UTC afternoon backport+config window done
13:34 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for rdbms: Use `debugSql` instead of `debugDumpSql` which is unuset (T318272) (duration: 15m 42s)
13:27 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde and d3r1ck01: Continuing with sync
13:20 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde and d3r1ck01: Backport for rdbms: Use `debugSql` instead of `debugDumpSql` which is unuset (T318272) synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
13:18 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for rdbms: Use `debugSql` instead of `debugDumpSql` which is unuset (T318272)
12:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P52499 and previous config saved to /var/cache/conftool/dbconfig/20230913-122323-ladsgroup.json
12:17 godog: pool only titan hosts for thanos-web and thanos-query services - T341488
12:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P52498 and previous config saved to /var/cache/conftool/dbconfig/20230913-120818-ladsgroup.json
11:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1174 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P52497 and previous config saved to /var/cache/conftool/dbconfig/20230913-115314-ladsgroup.json
11:30 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
11:29 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
11:27 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
11:26 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: apply
11:25 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
11:24 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/thumbor: apply
11:19 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
11:18 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
11:18 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T343198)', diff saved to https://phabricator.wikimedia.org/P52495 and previous config saved to /var/cache/conftool/dbconfig/20230913-111834-arnaudb.json
11:17 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
11:16 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
11:15 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
11:15 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
11:11 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host titan2002.codfw.wmnet with OS bookworm
10:54 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on titan2002.codfw.wmnet with reason: host reimage
10:51 filippo@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on titan2002.codfw.wmnet with reason: host reimage
10:49 jayme: imported kubernetes_1.23.14-3 to bullseye-wikimedia component/kubernetes123 - T329826
10:46 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2105.codfw.wmnet with reason: Maintenance
10:46 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2105.codfw.wmnet with reason: Maintenance
10:40 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host titan1002.eqiad.wmnet with OS bookworm
10:34 filippo@cumin1001: START - Cookbook sre.hosts.reimage for host titan2002.codfw.wmnet with OS bookworm
10:34 filippo@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host titan2002.codfw.wmnet with OS bookworm
10:29 klausman@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'readability' for release 'main' .
10:28 klausman@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
10:28 klausman@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
10:27 filippo@cumin1001: START - Cookbook sre.hosts.reimage for host titan2002.codfw.wmnet with OS bookworm
10:26 filippo@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host titan2002.codfw.wmnet with OS bookworm
10:24 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on titan1002.eqiad.wmnet with reason: host reimage
10:21 filippo@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on titan1002.eqiad.wmnet with reason: host reimage
10:11 claime: set/pooled=no; selector: name=mw2444.codfw.wmnet - T345884
10:10 cgoubert@cumin1001: conftool action : set/pooled=no; selector: name=mw2444.codfw.wmnet
10:10 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'readability' for release 'main' .
10:06 filippo@cumin1001: START - Cookbook sre.hosts.reimage for host titan1002.eqiad.wmnet with OS bookworm
10:06 filippo@cumin1001: START - Cookbook sre.hosts.reimage for host titan2002.codfw.wmnet with OS bookworm
10:06 aklapper@deploy1002: Finished scap: Backport for Revert "EntityId: Hard-deprecate Serializable methods" (duration: 08m 49s)
10:06 filippo@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host titan1002.eqiad.wmnet with OS bookworm
10:06 filippo@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host titan2002.codfw.wmnet with OS bookworm
09:59 aklapper@deploy1002: aklapper and jnuche: Continuing with sync
09:59 aklapper@deploy1002: aklapper and jnuche: Backport for Revert "EntityId: Hard-deprecate Serializable methods" synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
09:57 aklapper@deploy1002: Started scap: Backport for Revert "EntityId: Hard-deprecate Serializable methods"
09:51 klausman@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
09:48 klausman@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'readability' for release 'main' .
09:35 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
09:35 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
09:34 klausman@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
09:34 klausman@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
09:16 filippo@cumin1001: START - Cookbook sre.hosts.reimage for host titan1002.eqiad.wmnet with OS bookworm
09:14 aklapper@deploy1002: backport Cancelled
09:14 filippo@cumin1001: START - Cookbook sre.hosts.reimage for host titan2002.codfw.wmnet with OS bookworm
09:11 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host titan2001.codfw.wmnet with OS bookworm
08:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1157.eqiad.wmnet with reason: Maintenance
08:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1157.eqiad.wmnet with reason: Maintenance
08:47 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host titan1001.eqiad.wmnet with OS bookworm
08:46 claime: Running puppet on cp-text P:trafficserver::backend - T290536
08:33 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on titan2001.codfw.wmnet with reason: host reimage
08:30 filippo@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on titan2001.codfw.wmnet with reason: host reimage
08:25 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on titan1001.eqiad.wmnet with reason: host reimage
08:25 aklapper@deploy1002: Synchronized php: group1 wikis to 1.41.0-wmf.26 refs T343728 (duration: 06m 46s)
08:22 filippo@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on titan1001.eqiad.wmnet with reason: host reimage
08:18 aklapper@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.41.0-wmf.26 refs T343728
08:14 filippo@cumin1001: START - Cookbook sre.hosts.reimage for host titan2001.codfw.wmnet with OS bookworm
08:14 filippo@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host titan2001.codfw.wmnet with OS bookworm
08:08 filippo@cumin1001: START - Cookbook sre.hosts.reimage for host titan1001.eqiad.wmnet with OS bookworm
08:07 filippo@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host titan1001.eqiad.wmnet with OS bookworm
08:01 ayounsi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1001.eqiad.wmnet with reason: host reimage
07:56 ayounsi@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1001.eqiad.wmnet with reason: host reimage
07:54 filippo@cumin1001: START - Cookbook sre.hosts.reimage for host titan1001.eqiad.wmnet with OS bookworm
07:53 vgutierrez: repool cp1075 && cp1076
07:51 filippo@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host titan1001.eqiad.wmnet with OS bookworm
07:51 filippo@cumin1001: conftool action : set/pooled=no; selector: name=titan2001.codfw.wmnet,service=thanos-web
07:46 filippo@cumin1001: START - Cookbook sre.hosts.reimage for host titan2001.codfw.wmnet with OS bookworm
07:46 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1174 (T343198)', diff saved to https://phabricator.wikimedia.org/P52491 and previous config saved to /var/cache/conftool/dbconfig/20230913-074602-arnaudb.json
07:45 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1174.eqiad.wmnet with reason: Maintenance
07:45 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1174.eqiad.wmnet with reason: Maintenance
07:44 filippo@cumin1001: conftool action : set/pooled=yes; selector: name=thanos-fe2004.codfw.wmnet,service=thanos-web
07:43 filippo@cumin1001: conftool action : set/pooled=yes; selector: name=thanos-fe2004.codfdw.wmnet,service=thanos-web
07:43 filippo@cumin1001: conftool action : set/pooled=no; selector: name=titan2001.codfdw.wmnet,service=thanos-web
07:43 filippo@cumin1001: conftool action : set/pooled=no; selector: name=titan1001.eqiad.wmnet,service=thanos-web
07:43 filippo@cumin1001: conftool action : set/pooled=yes; selector: name=thanos-fe1004.eqiad.wmnet,service=thanos-web
07:42 ayounsi@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1001.eqiad.wmnet with OS bullseye
07:39 filippo@cumin1001: START - Cookbook sre.hosts.reimage for host titan1001.eqiad.wmnet with OS bookworm
06:06 tstarling@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Running again following connection refused errors from kubemaster (duration: 07m 24s)
05:55 tstarling@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Enable source maps on group0 wikis attempt 2 (duration: 07m 37s)
05:40 tstarling@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Enable source maps on group0 wikis T47514 (duration: 07m 14s)
05:15 tstarling@deploy1002: Synchronized wmf-config/etcd.php: Remove PHP 7.2 fallback for array_key_first g 956364 (duration: 07m 03s)
04:35 hmonroy@deploy1002: Finished scap: Backport for Do not enable entire OOUI in PHP on page load (T345414) (duration: 07m 58s)
04:29 hmonroy@deploy1002: hmonroy and jdlrobson: Continuing with sync
04:29 hmonroy@deploy1002: hmonroy and jdlrobson: Backport for Do not enable entire OOUI in PHP on page load (T345414) synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
04:27 hmonroy@deploy1002: Started scap: Backport for Do not enable entire OOUI in PHP on page load (T345414)
04:26 hmonroy@deploy1002: Finished scap: Backport for Do not enable entire OOUI in PHP on page load (T345414) (duration: 09m 56s)
04:19 hmonroy@deploy1002: hmonroy and jdlrobson: Continuing with sync
04:17 hmonroy@deploy1002: hmonroy and jdlrobson: Backport for Do not enable entire OOUI in PHP on page load (T345414) synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
04:16 hmonroy@deploy1002: Started scap: Backport for Do not enable entire OOUI in PHP on page load (T345414)

2023-09-12

23:14 brett: Upload trafficserver_9.2.1-1wm2_amd64 to bookworm-wikimedia
23:09 eileen: config revision changed from 2efd8142 to eb7931ca add is_create_activities to bounce fetch job
21:11 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance
21:11 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance
21:11 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T343198)', diff saved to https://phabricator.wikimedia.org/P52486 and previous config saved to /var/cache/conftool/dbconfig/20230912-211128-arnaudb.json
21:08 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host search-loader1001.eqiad.wmnet
21:04 bking@cumin1001: START - Cookbook sre.hosts.reboot-single for host search-loader1001.eqiad.wmnet
20:56 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P52485 and previous config saved to /var/cache/conftool/dbconfig/20230912-205621-arnaudb.json
20:46 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host search-loader2001.codfw.wmnet
20:43 cjming: end of UTC late backport window
20:43 bking@cumin1001: START - Cookbook sre.hosts.reboot-single for host search-loader2001.codfw.wmnet
20:42 inflatador: rebooting search-loader2001.codfw.wmnet T344671
20:41 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P52484 and previous config saved to /var/cache/conftool/dbconfig/20230912-204115-arnaudb.json
20:39 cjming@deploy1002: Finished scap: Backport for Make the new stream name consistent with convention (duration: 09m 24s)
20:33 cjming@deploy1002: sharvaniharan and cjming: Continuing with sync
20:31 cjming@deploy1002: sharvaniharan and cjming: Backport for Make the new stream name consistent with convention synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
20:30 cjming@deploy1002: Started scap: Backport for Make the new stream name consistent with convention
20:26 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T343198)', diff saved to https://phabricator.wikimedia.org/P52483 and previous config saved to /var/cache/conftool/dbconfig/20230912-202609-arnaudb.json
20:25 cjming@deploy1002: Finished scap: Backport for Reduce initial payload of Phonos styles (T345414) (duration: 12m 06s)
20:22 eileen: civicrm upgraded from 5b7b2b3e to 80aee570
20:19 cjming@deploy1002: cjming and samtar: Continuing with sync
20:15 cjming@deploy1002: cjming and samtar: Backport for Reduce initial payload of Phonos styles (T345414) synced to the testservers mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
20:13 cjming@deploy1002: Started scap: Backport for Reduce initial payload of Phonos styles (T345414)
19:43 eileen: civicrm upgraded from 771fcde3 to 5b7b2b3e
19:32 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
19:32 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove ssw1 old irb int dns - cmooney@cumin1001"
19:28 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove ssw1 old irb int dns - cmooney@cumin1001"
19:23 cmooney@cumin1001: START - Cookbook sre.dns.netbox
19:17 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
19:15 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
17:50 sukhe: run authdns-update to remove nsa.wikimedia.org
16:28 fnegri@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudnet2005-dev.codfw.wmnet with OS bookworm
15:47 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host people1004.eqiad.wmnet
15:43 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host people1004.eqiad.wmnet
15:43 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host people2003.codfw.wmnet
15:39 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host people2003.codfw.wmnet
15:30 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2023.codfw.wmnet
15:27 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2023.codfw.wmnet
15:27 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2023.codfw.wmnet
15:25 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2023.codfw.wmnet
15:22 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2023.codfw.wmnet
15:19 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2023.codfw.wmnet
15:19 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2023.codfw.wmnet
15:19 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2023.codfw.wmnet
15:18 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2023.codfw.wmnet
15:13 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubernetes1056.eqiad.wmnet']
15:12 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2023.codfw.wmnet
15:09 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubernetes1055.eqiad.wmnet']
15:01 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes1056.eqiad.wmnet']
15:00 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes1055.eqiad.wmnet']
14:59 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubernetes1051.eqiad.wmnet']
14:59 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubernetes1050.eqiad.wmnet']
14:59 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubernetes1054.eqiad.wmnet']
14:59 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubernetes1053.eqiad.wmnet']
14:58 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubernetes1052.eqiad.wmnet']
14:58 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubernetes1049.eqiad.wmnet']
14:57 godog: add 30G to prometheus@services and 300G to prometheus@ops (codfw)
14:57 dancy@deploy1002: Installation of scap version "4.61.0" completed for 595 hosts
14:56 dancy@deploy1002: Installing scap version "4.61.0" for 595 hosts
14:55 dancy@deploy1002: Installing scap version "4.61.0" for 596 hosts
14:50 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes1051.eqiad.wmnet']
14:50 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes1050.eqiad.wmnet']
14:50 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes1052.eqiad.wmnet']
14:50 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes1053.eqiad.wmnet']
14:50 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes1054.eqiad.wmnet']
14:49 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubernetes1044.eqiad.wmnet']
14:49 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes1049.eqiad.wmnet']
14:49 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubernetes1046.eqiad.wmnet']
14:49 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubernetes1047.eqiad.wmnet']
14:49 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubernetes1043.eqiad.wmnet']
14:49 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubernetes1048.eqiad.wmnet']
14:49 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubernetes1045.eqiad.wmnet']
14:46 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host scandium.eqiad.wmnet
14:42 moritzm: installing Linux 6.1.52 on Bookworm hosts
14:41 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes1048.eqiad.wmnet']
14:40 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes1047.eqiad.wmnet']
14:40 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes1046.eqiad.wmnet']
14:40 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes1045.eqiad.wmnet']
14:40 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes1044.eqiad.wmnet']
14:40 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes1043.eqiad.wmnet']
14:39 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-single for host scandium.eqiad.wmnet
14:39 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubernetes1042.eqiad.wmnet']
14:39 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubernetes1041.eqiad.wmnet']
14:38 jmm@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: furud.codfw.wmnet
14:38 jmm@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: furud.codfw.wmnet
14:36 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubernetes1040.eqiad.wmnet']
14:35 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubernetes1039.eqiad.wmnet']
14:34 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubernetes1038.eqiad.wmnet']
14:33 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubernetes1037.eqiad.wmnet']
14:31 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes1042.eqiad.wmnet']
14:31 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes1041.eqiad.wmnet']
14:30 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubernetes1036.eqiad.wmnet']
14:30 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubernetes1029.eqiad.wmnet']
14:30 moritzm: installing libssh2 security updates
14:27 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes1040.eqiad.wmnet']
14:27 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubernetes1035.eqiad.wmnet']
14:27 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes1039.eqiad.wmnet']
14:27 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubernetes1034.eqiad.wmnet']
14:25 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes1038.eqiad.wmnet']
14:24 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes1037.eqiad.wmnet']
14:24 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['kubernetes1036.eqiad.wmnet']
14:24 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubernetes1033.eqiad.wmnet']
14:24 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes1036.eqiad.wmnet']
14:23 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubernetes1032.eqiad.wmnet']
14:22 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes1036.eqiad.wmnet']
14:22 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes1029.eqiad.wmnet']
14:21 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubernetes1031.eqiad.wmnet']
14:21 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes1029.mgmt.eqiad.wmnet with reboot policy FORCED
14:19 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes1035.eqiad.wmnet']
14:18 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes1034.eqiad.wmnet']
14:18 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubernetes1028.eqiad.wmnet']
14:18 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubernetes1030.eqiad.wmnet']
14:16 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes1033.eqiad.wmnet']
14:15 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubernetes1027.eqiad.wmnet']
14:15 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes1032.eqiad.wmnet']
14:11 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes1031.eqiad.wmnet']
14:10 sukhe: enable puppet on dns-rec to progressively roll out nsa->ns2 updates
14:10 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes1030.eqiad.wmnet']
14:10 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['kubernetes1029.eqiad.wmnet']
14:10 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes1029.eqiad.wmnet']
14:09 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes1028.eqiad.wmnet']
14:07 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes1027.eqiad.wmnet']
14:02 sukhe: [correction] enable puppet on dns6001 to test nsa removal
14:02 sukhe: enable puppet on doh6001 to test nsa removal
14:00 jclark@cumin1001: START - Cookbook sre.hosts.provision for host kubernetes1029.mgmt.eqiad.wmnet with reboot policy FORCED
14:00 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kubernetes1029.mgmt.eqiad.wmnet with reboot policy FORCED
13:57 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
13:56 jclark@cumin1001: START - Cookbook sre.dns.netbox
13:50 sukhe: disable puppet on A:dns-rec
13:48 jclark@cumin1001: START - Cookbook sre.hosts.provision for host kubernetes1029.mgmt.eqiad.wmnet with reboot policy FORCED
13:46 TheresNoTime: UTC afternoon backport window closed
13:45 samtar@deploy1002: Finished scap: Backport for Reduce initial payload of Phonos styles (T345414) (duration: 08m 59s)
13:44 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T343198)', diff saved to https://phabricator.wikimedia.org/P52477 and previous config saved to /var/cache/conftool/dbconfig/20230912-134451-arnaudb.json
13:40 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
13:39 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
13:39 samtar@deploy1002: samtar: Continuing with sync
13:38 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
13:38 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
13:38 samtar@deploy1002: samtar: Backport for Reduce initial payload of Phonos styles (T345414) synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
13:36 samtar@deploy1002: Started scap: Backport for Reduce initial payload of Phonos styles (T345414)
13:36 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
13:35 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
13:31 taavi@deploy1002: Finished scap: Backport for Enable Parsoid support for Kartographer on enwiki (T342871) (duration: 26m 05s)
13:29 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P52476 and previous config saved to /var/cache/conftool/dbconfig/20230912-132944-arnaudb.json
13:19 taavi@deploy1002: ihurbain and taavi: Continuing with sync
13:14 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P52475 and previous config saved to /var/cache/conftool/dbconfig/20230912-131438-arnaudb.json
13:10 moritzm: installing grub2 updates from Bullseye point release
13:06 taavi@deploy1002: ihurbain and taavi: Backport for Enable Parsoid support for Kartographer on enwiki (T342871) synced to the testservers mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
13:05 taavi@deploy1002: Started scap: Backport for Enable Parsoid support for Kartographer on enwiki (T342871)
12:59 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T343198)', diff saved to https://phabricator.wikimedia.org/P52474 and previous config saved to /var/cache/conftool/dbconfig/20230912-125932-arnaudb.json
12:40 brouberol@cumin1001: END (FAIL) - Cookbook sre.hadoop.roll-restart-masters (exit_code=99) restart masters for Hadoop analytics cluster: Restart of jvm daemons.
12:24 brouberol@cumin1001: START - Cookbook sre.hadoop.roll-restart-masters restart masters for Hadoop analytics cluster: Restart of jvm daemons.
12:15 jgiannelos@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifeeds: apply
12:15 jgiannelos@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifeeds: apply
12:15 jgiannelos@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifeeds: apply
12:14 jgiannelos@deploy1002: helmfile [codfw] START helmfile.d/services/wikifeeds: apply
12:12 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudservices1004.wikimedia.org
12:11 aborrero@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
12:11 aborrero@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudservices1004.wikimedia.org decommissioned, removing all IPs except the asset tag one - aborrero@cumin1001"
12:09 aborrero@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudservices1004.wikimedia.org decommissioned, removing all IPs except the asset tag one - aborrero@cumin1001"
12:07 aborrero@cumin1001: START - Cookbook sre.dns.netbox
11:59 aborrero@cumin1001: START - Cookbook sre.hosts.decommission for hosts cloudservices1004.wikimedia.org
11:57 godog: pool thanos[12]001 for thanos.w.o - T341999
11:47 marostegui@cumin1001: dbctl commit (dc=all): 'db2158 (re)pooling @ 100%: Repooling after cloning another host', diff saved to https://phabricator.wikimedia.org/P52473 and previous config saved to /var/cache/conftool/dbconfig/20230912-114711-root.json
11:43 godog: pool titan hosts alongside thanos-fe for thanos-query / thanos-web services - T341999
11:42 filippo@cumin1001: conftool action : set/pooled=no; selector: name=titan1001.eqiad.wmnet,service=thanos-web
11:42 filippo@cumin1001: conftool action : set/pooled=no; selector: name=titan1002.eqiad.wmnet,service=thanos-web
11:41 brouberol@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 7 hosts with reason: Mute initial failures of hadoop-hdfs-datanode.service
11:41 brouberol@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on 7 hosts with reason: Mute initial failures of hadoop-hdfs-datanode.service
11:40 filippo@cumin1001: conftool action : set/pooled=no; selector: name=titan1002.eqiad.wmnet,service=thanos-web
11:40 filippo@cumin1001: conftool action : set/pooled=yes; selector: name=titan1002.eqiad.wmnet,service=thanos-web
11:39 filippo@cumin1001: conftool action : set/pooled=no; selector: name=titan1001.eqiad.wmnet,service=thanos-web
11:37 filippo@cumin1001: conftool action : set/weight=100; selector: name=titan2002.codfw.wmnet
11:37 filippo@cumin1001: conftool action : set/weight=100; selector: name=titan2001.codfw.wmnet
11:37 filippo@cumin1001: conftool action : set/weight=100; selector: name=titan1002.eqiad.wmnet
11:37 filippo@cumin1001: conftool action : set/weight=100; selector: name=titan1001.eqiad.wmnet
11:36 filippo@cumin1001: conftool action : set/weight=100; selector: name=titan*
11:35 filippo@cumin1001: conftool action : set/weight=10; selector: name=titan2002.codfw.wmnet
11:35 filippo@cumin1001: conftool action : set/weight=10; selector: name=titan2001.codfw.wmnet
11:35 filippo@cumin1001: conftool action : set/weight=10; selector: name=titan1002.eqiad.wmnet
11:35 filippo@cumin1001: conftool action : set/weight=10; selector: name=titan1001.eqiad.wmnet
11:32 marostegui@cumin1001: dbctl commit (dc=all): 'db2158 (re)pooling @ 75%: Repooling after cloning another host', diff saved to https://phabricator.wikimedia.org/P52472 and previous config saved to /var/cache/conftool/dbconfig/20230912-113207-root.json
11:18 aborrero@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host cloudservices1004.wikimedia.org
11:17 marostegui@cumin1001: dbctl commit (dc=all): 'db2158 (re)pooling @ 50%: Repooling after cloning another host', diff saved to https://phabricator.wikimedia.org/P52471 and previous config saved to /var/cache/conftool/dbconfig/20230912-111702-root.json
11:03 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
11:03 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
11:03 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
11:02 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-web: apply
11:01 marostegui@cumin1001: dbctl commit (dc=all): 'db2158 (re)pooling @ 25%: Repooling after cloning another host', diff saved to https://phabricator.wikimedia.org/P52470 and previous config saved to /var/cache/conftool/dbconfig/20230912-110157-root.json
10:54 aborrero@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudservices1004.wikimedia.org
10:46 marostegui@cumin1001: dbctl commit (dc=all): 'db2158 (re)pooling @ 10%: Repooling after cloning another host', diff saved to https://phabricator.wikimedia.org/P52468 and previous config saved to /var/cache/conftool/dbconfig/20230912-104652-root.json
10:45 moritzm: rebalance Ganeti cluster in eqiad/C following node reboots
10:39 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1028.eqiad.wmnet
10:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1028.eqiad.wmnet
10:37 taavi@cumin1001: conftool action : set/pooled=yes:weight=10; selector: cluster=cloudweb
10:32 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1028.eqiad.wmnet
10:31 marostegui@cumin1001: dbctl commit (dc=all): 'db2158 (re)pooling @ 5%: Repooling after cloning another host', diff saved to https://phabricator.wikimedia.org/P52467 and previous config saved to /var/cache/conftool/dbconfig/20230912-103148-root.json
10:25 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1028.eqiad.wmnet
10:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host pki-root1002.eqiad.wmnet
10:21 jgiannelos@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifeeds: apply
10:21 jgiannelos@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifeeds: apply
10:16 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host pki-root1002.eqiad.wmnet
10:16 marostegui@cumin1001: dbctl commit (dc=all): 'db2158 (re)pooling @ 3%: Repooling after cloning another host', diff saved to https://phabricator.wikimedia.org/P52466 and previous config saved to /var/cache/conftool/dbconfig/20230912-101643-root.json
10:13 jmm@cumin2002: END (PASS) - Cookbook sre.pki.restart-reboot (exit_code=0) rolling reboot on A:pki
10:13 moritzm: disabled nginx/puppetdb/postgresql/microservice on puppetdb1002/2002 to ensure nothing hits the old endpoints anymore
10:09 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on puppetdb2002.codfw.wmnet with reason: Disable puppetdb/postgres/nginx on old nodes to ensure nothing hits them anyway
10:09 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on puppetdb2002.codfw.wmnet with reason: Disable puppetdb/postgres/nginx on old nodes to ensure nothing hits them anyway
10:09 jgiannelos@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifeeds: apply
10:08 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/wikifeeds: apply
10:05 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on puppetdb1002.eqiad.wmnet with reason: Disable puppetdb/postgres on old nodes to ensure nothing hits them anyway
10:05 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on puppetdb1002.eqiad.wmnet with reason: Disable puppetdb/postgres on old nodes to ensure nothing hits them anyway
10:02 hnowlan: enabling puppet on A:cp
10:01 marostegui@cumin1001: dbctl commit (dc=all): 'db2158 (re)pooling @ 1%: Repooling after cloning another host', diff saved to https://phabricator.wikimedia.org/P52465 and previous config saved to /var/cache/conftool/dbconfig/20230912-100138-root.json
09:59 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) pki.discovery.wmnet. on all recursors
09:59 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache pki.discovery.wmnet. on all recursors
09:53 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) pki.discovery.wmnet. on all recursors
09:52 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache pki.discovery.wmnet. on all recursors
09:52 jmm@cumin2002: START - Cookbook sre.pki.restart-reboot rolling reboot on A:pki
09:32 hnowlan: disabled puppet on A:cp
09:26 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3317 (T343198)', diff saved to https://phabricator.wikimedia.org/P52464 and previous config saved to /var/cache/conftool/dbconfig/20230912-092639-arnaudb.json
09:26 jmm@cumin2002: END (FAIL) - Cookbook sre.pki.restart-reboot (exit_code=99) rolling reboot on A:pki
09:26 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1170.eqiad.wmnet with reason: Maintenance
09:26 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1170.eqiad.wmnet with reason: Maintenance
09:26 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T343198)', diff saved to https://phabricator.wikimedia.org/P52463 and previous config saved to /var/cache/conftool/dbconfig/20230912-092618-arnaudb.json
09:26 jmm@cumin2002: START - Cookbook sre.pki.restart-reboot rolling reboot on A:pki
09:15 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2115.codfw.wmnet with reason: Maintenance
09:15 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2115.codfw.wmnet with reason: Maintenance
09:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1179.eqiad.wmnet with reason: Maintenance
09:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1179.eqiad.wmnet with reason: Maintenance
09:11 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P52461 and previous config saved to /var/cache/conftool/dbconfig/20230912-091112-arnaudb.json
08:58 claime: Running puppet on cp-text P:trafficserver::backend - T341780
08:58 claime: Sending 5% of global traffic to mw-on-k8s - T341780
08:56 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P52460 and previous config saved to /var/cache/conftool/dbconfig/20230912-085606-arnaudb.json
08:51 claime: mw-api-ext, mw-web: Raise total replicas to 14 - T341780
08:51 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
08:51 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
08:51 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
08:50 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-web: apply
08:50 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
08:50 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
08:50 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
08:50 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
08:49 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
08:49 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
08:44 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1028.eqiad.wmnet
08:41 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T343198)', diff saved to https://phabricator.wikimedia.org/P52459 and previous config saved to /var/cache/conftool/dbconfig/20230912-084059-arnaudb.json
08:39 jnuche@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.41.0-wmf.26 refs T343728
08:38 moritzm: rebalance Ganeti cluster in codfw/C following node replacement
08:24 oblivian@deploy1002: Finished scap: Backport for Replace calls to wfHostname with clusterconfig ones (duration: 09m 16s)
08:22 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1028.eqiad.wmnet
08:18 oblivian@deploy1002: oblivian: Continuing with sync
08:17 oblivian@deploy1002: oblivian: Backport for Replace calls to wfHostname with clusterconfig ones synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
08:15 oblivian@deploy1002: Started scap: Backport for Replace calls to wfHostname with clusterconfig ones
08:13 oblivian@deploy1002: Finished scap: Backport for ClusterConfig: also allow to return hostname, Enable PageNotice on enwiktionary beta (T61245) (duration: 45m 23s)
08:12 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti2014.codfw.wmnet to cluster codfw and group C
08:10 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2014.codfw.wmnet to cluster codfw and group C
08:06 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2014.codfw.wmnet
07:58 brouberol@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker1156.eqiad.wmnet
07:58 oblivian@deploy1002: tto and oblivian: Continuing with sync
07:58 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2014.codfw.wmnet
07:56 brouberol@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker1156.eqiad.wmnet
07:56 oblivian@deploy1002: tto and oblivian: Backport for ClusterConfig: also allow to return hostname, Enable PageNotice on enwiktionary beta (T61245) synced to the testservers mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
07:53 brouberol@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker1155.eqiad.wmnet
07:51 brouberol@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker1155.eqiad.wmnet
07:51 brouberol@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker1154.eqiad.wmnet
07:49 brouberol@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker1154.eqiad.wmnet
07:45 brouberol@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker1153.eqiad.wmnet
07:43 brouberol@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker1153.eqiad.wmnet
07:36 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host netmon1003.wikimedia.org
07:28 oblivian@deploy1002: Started scap: Backport for ClusterConfig: also allow to return hostname, Enable PageNotice on enwiktionary beta (T61245)
07:25 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netmon1003.wikimedia.org
07:23 oblivian@deploy1002: Finished scap: Backport for update noc README, Use ClusterConfig (duration: 13m 46s)
07:17 oblivian@deploy1002: oblivian: Continuing with sync
07:11 oblivian@deploy1002: oblivian: Backport for update noc README, Use ClusterConfig synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
07:09 oblivian@deploy1002: Started scap: Backport for update noc README, Use ClusterConfig
07:09 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host krb1001.eqiad.wmnet
06:23 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2177 (T343198)', diff saved to https://phabricator.wikimedia.org/P52456 and previous config saved to /var/cache/conftool/dbconfig/20230912-062353-arnaudb.json
06:23 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2177.codfw.wmnet with reason: Maintenance
06:23 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2177.codfw.wmnet with reason: Maintenance
06:23 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T343198)', diff saved to https://phabricator.wikimedia.org/P52455 and previous config saved to /var/cache/conftool/dbconfig/20230912-062332-arnaudb.json
06:08 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P52454 and previous config saved to /var/cache/conftool/dbconfig/20230912-060825-arnaudb.json
05:53 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P52453 and previous config saved to /var/cache/conftool/dbconfig/20230912-055319-arnaudb.json
05:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2014.codfw.wmnet with OS bullseye
05:38 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T343198)', diff saved to https://phabricator.wikimedia.org/P52452 and previous config saved to /var/cache/conftool/dbconfig/20230912-053813-arnaudb.json
05:32 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti2014.codfw.wmnet with reason: host reimage
05:27 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti2014.codfw.wmnet with reason: host reimage
05:17 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1119 with Debian Bookworm in s1 with just 10% T339185', diff saved to https://phabricator.wikimedia.org/P52450 and previous config saved to /var/cache/conftool/dbconfig/20230912-051753-marostegui.json
05:17 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2158', diff saved to https://phabricator.wikimedia.org/P52449 and previous config saved to /var/cache/conftool/dbconfig/20230912-051725-root.json
05:11 moritzm: installing aom security updates
05:08 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2014.codfw.wmnet with OS bullseye
05:00 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti2014.codfw.wmnet to cluster codfw and group C
05:00 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1158 (T343198)', diff saved to https://phabricator.wikimedia.org/P52448 and previous config saved to /var/cache/conftool/dbconfig/20230912-050033-arnaudb.json
05:00 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
05:00 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
05:00 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1158.eqiad.wmnet with reason: Maintenance
04:59 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1158.eqiad.wmnet with reason: Maintenance
04:59 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136 (T343198)', diff saved to https://phabricator.wikimedia.org/P52447 and previous config saved to /var/cache/conftool/dbconfig/20230912-045944-arnaudb.json
04:56 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2014.codfw.wmnet to cluster codfw and group C
04:44 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136', diff saved to https://phabricator.wikimedia.org/P52446 and previous config saved to /var/cache/conftool/dbconfig/20230912-044437-arnaudb.json
04:29 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136', diff saved to https://phabricator.wikimedia.org/P52445 and previous config saved to /var/cache/conftool/dbconfig/20230912-042931-arnaudb.json
04:14 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136 (T343198)', diff saved to https://phabricator.wikimedia.org/P52444 and previous config saved to /var/cache/conftool/dbconfig/20230912-041425-arnaudb.json
03:58 mwpresync@deploy1002: Pruned MediaWiki: 1.41.0-wmf.23, 1.41.0-wmf.24 (duration: 02m 30s)
03:56 mwpresync@deploy1002: Finished scap: testwikis wikis to 1.41.0-wmf.26 refs T343728 (duration: 53m 18s)
03:02 mwpresync@deploy1002: Started scap: testwikis wikis to 1.41.0-wmf.26 refs T343728
02:49 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host titan1002.eqiad.wmnet with OS bookworm
02:49 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
02:48 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
02:32 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on titan1002.eqiad.wmnet with reason: host reimage
02:28 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on titan1002.eqiad.wmnet with reason: host reimage
02:06 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host titan1002.eqiad.wmnet with OS bookworm
01:55 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host titan1001.eqiad.wmnet with OS bookworm
01:55 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
01:54 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
01:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on titan1001.eqiad.wmnet with reason: host reimage
01:34 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on titan1001.eqiad.wmnet with reason: host reimage
01:06 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host titan1001.eqiad.wmnet with OS bookworm
00:17 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1136 (T343198)', diff saved to https://phabricator.wikimedia.org/P52443 and previous config saved to /var/cache/conftool/dbconfig/20230912-001715-arnaudb.json
00:17 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1136.eqiad.wmnet with reason: Maintenance
00:16 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1136.eqiad.wmnet with reason: Maintenance
00:16 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 (T343198)', diff saved to https://phabricator.wikimedia.org/P52442 and previous config saved to /var/cache/conftool/dbconfig/20230912-001654-arnaudb.json
00:01 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P52441 and previous config saved to /var/cache/conftool/dbconfig/20230912-000148-arnaudb.json

2023-09-11

23:46 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P52440 and previous config saved to /var/cache/conftool/dbconfig/20230911-234641-arnaudb.json
23:31 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 (T343198)', diff saved to https://phabricator.wikimedia.org/P52439 and previous config saved to /var/cache/conftool/dbconfig/20230911-233135-arnaudb.json
23:11 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2156 (T343198)', diff saved to https://phabricator.wikimedia.org/P52438 and previous config saved to /var/cache/conftool/dbconfig/20230911-231131-arnaudb.json
23:11 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
23:11 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
23:11 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2156.codfw.wmnet with reason: Maintenance
23:11 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2156.codfw.wmnet with reason: Maintenance
23:10 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T343198)', diff saved to https://phabricator.wikimedia.org/P52437 and previous config saved to /var/cache/conftool/dbconfig/20230911-231054-arnaudb.json
22:55 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P52436 and previous config saved to /var/cache/conftool/dbconfig/20230911-225548-arnaudb.json
22:53 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host titan1002.eqiad.wmnet with OS bookworm
22:52 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host titan1001.eqiad.wmnet with OS bookworm
22:40 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P52435 and previous config saved to /var/cache/conftool/dbconfig/20230911-224042-arnaudb.json
22:25 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T343198)', diff saved to https://phabricator.wikimedia.org/P52434 and previous config saved to /var/cache/conftool/dbconfig/20230911-222536-arnaudb.json
21:33 cwhite: update grafana to 9.4.14 on grafana1002 T345362
21:32 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host titan1002.eqiad.wmnet with OS bookworm
21:32 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host titan1001.eqiad.wmnet with OS bookworm
21:19 sbassett: Deployed security fix for T345693
20:48 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host titan1002.mgmt.eqiad.wmnet with reboot policy FORCED
20:41 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['titan1001.eqiad.wmnet']
20:41 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['titan1001.eqiad.wmnet']
20:41 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['titan1001.eqiad.wmnet']
20:41 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['titan1001.eqiad.wmnet']
20:41 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['titan1001.eqiad.wmnet']
20:39 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['titan1001.eqiad.wmnet']
20:39 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host titan1001.mgmt.eqiad.wmnet with reboot policy FORCED
20:18 jclark@cumin1001: START - Cookbook sre.hosts.provision for host titan1002.mgmt.eqiad.wmnet with reboot policy FORCED
20:18 jclark@cumin1001: START - Cookbook sre.hosts.provision for host titan1001.mgmt.eqiad.wmnet with reboot policy FORCED
20:17 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host titan1001
20:17 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host titan1001
20:17 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host titan1002
20:14 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kubernetes1029.mgmt.eqiad.wmnet with reboot policy FORCED
20:13 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host titan1002
20:13 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host titan1001
20:13 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host titan1001
20:13 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
20:13 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt titan1001 - jclark@cumin1001"
20:12 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt titan1001 - jclark@cumin1001"
20:10 jclark@cumin1001: START - Cookbook sre.dns.netbox
20:10 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
20:09 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt titan1001 - jclark@cumin1001"
20:09 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt titan1001 - jclark@cumin1001"
20:05 jclark@cumin1001: START - Cookbook sre.dns.netbox
19:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2131 (T337310)', diff saved to https://phabricator.wikimedia.org/P52432 and previous config saved to /var/cache/conftool/dbconfig/20230911-194332-ladsgroup.json
19:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2131', diff saved to https://phabricator.wikimedia.org/P52431 and previous config saved to /var/cache/conftool/dbconfig/20230911-192826-ladsgroup.json
19:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2131', diff saved to https://phabricator.wikimedia.org/P52430 and previous config saved to /var/cache/conftool/dbconfig/20230911-191320-ladsgroup.json
18:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2131 (T337310)', diff saved to https://phabricator.wikimedia.org/P52429 and previous config saved to /var/cache/conftool/dbconfig/20230911-185813-ladsgroup.json
18:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2131 (T337310)', diff saved to https://phabricator.wikimedia.org/P52428 and previous config saved to /var/cache/conftool/dbconfig/20230911-184231-ladsgroup.json
18:42 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2131.codfw.wmnet with reason: Maintenance
18:42 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2131.codfw.wmnet with reason: Maintenance
18:33 denisse@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host netmon2002.wikimedia.org with OS bullseye
18:13 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2101.codfw.wmnet with reason: Maintenance
18:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2101.codfw.wmnet with reason: Maintenance
18:11 denisse@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on netmon2002.wikimedia.org with reason: host reimage
18:08 denisse@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on netmon2002.wikimedia.org with reason: host reimage
18:00 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase1030.eqiad.wmnet with OS bullseye
17:59 eevans@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - eevans@cumin1001"
17:58 eevans@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - eevans@cumin1001"
17:53 denisse@cumin1001: START - Cookbook sre.hosts.reimage for host netmon2002.wikimedia.org with OS bullseye
17:43 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2097.codfw.wmnet with reason: Maintenance
17:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2097.codfw.wmnet with reason: Maintenance
17:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2096 (T337310)', diff saved to https://phabricator.wikimedia.org/P52427 and previous config saved to /var/cache/conftool/dbconfig/20230911-174321-ladsgroup.json
17:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2096', diff saved to https://phabricator.wikimedia.org/P52426 and previous config saved to /var/cache/conftool/dbconfig/20230911-172815-ladsgroup.json
17:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2096', diff saved to https://phabricator.wikimedia.org/P52425 and previous config saved to /var/cache/conftool/dbconfig/20230911-171309-ladsgroup.json
17:09 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on restbase1030.eqiad.wmnet with reason: host reimage
17:06 eevans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on restbase1030.eqiad.wmnet with reason: host reimage
16:59 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kubernetes1027.mgmt.eqiad.wmnet with reboot policy FORCED
16:59 jclark@cumin1001: START - Cookbook sre.hosts.provision for host kubernetes1029.mgmt.eqiad.wmnet with reboot policy FORCED
16:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2096 (T337310)', diff saved to https://phabricator.wikimedia.org/P52424 and previous config saved to /var/cache/conftool/dbconfig/20230911-165802-ladsgroup.json
16:57 jclark@cumin1001: START - Cookbook sre.hosts.provision for host kubernetes1027.mgmt.eqiad.wmnet with reboot policy FORCED
16:48 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes1055.mgmt.eqiad.wmnet with reboot policy FORCED
16:48 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes1056.mgmt.eqiad.wmnet with reboot policy FORCED
16:48 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes1054.mgmt.eqiad.wmnet with reboot policy FORCED
16:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2096 (T337310)', diff saved to https://phabricator.wikimedia.org/P52423 and previous config saved to /var/cache/conftool/dbconfig/20230911-164249-ladsgroup.json
16:46 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2096.codfw.wmnet with reason: Maintenance
16:42 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2096.codfw.wmnet with reason: Maintenance
16:41 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host restbase1030.eqiad.wmnet with OS bullseye
16:32 fnegri@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudnet2005-dev.codfw.wmnet with reason: host reimage
16:31 denisse@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host netmon2002.wikimedia.org with OS bookworm
16:28 fnegri@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudnet2005-dev.codfw.wmnet with reason: host reimage
16:18 jclark@cumin1001: START - Cookbook sre.hosts.provision for host kubernetes1056.mgmt.eqiad.wmnet with reboot policy FORCED
16:17 jclark@cumin1001: START - Cookbook sre.hosts.provision for host kubernetes1054.mgmt.eqiad.wmnet with reboot policy FORCED
16:17 jclark@cumin1001: START - Cookbook sre.hosts.provision for host kubernetes1055.mgmt.eqiad.wmnet with reboot policy FORCED
16:16 eevans@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host restbase1030.eqiad.wmnet with OS bullseye
16:12 brouberol@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker1152.eqiad.wmnet
16:10 brouberol@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker1152.eqiad.wmnet
16:10 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host restbase1030.eqiad.wmnet with OS bullseye
16:08 fnegri@cumin1001: START - Cookbook sre.hosts.reimage for host cloudnet2005-dev.codfw.wmnet with OS bookworm
16:07 brouberol@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker1151.eqiad.wmnet
16:06 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes1047.mgmt.eqiad.wmnet with reboot policy FORCED
16:06 denisse@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on netmon2002.wikimedia.org with reason: host reimage
16:05 brouberol@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker1151.eqiad.wmnet
16:04 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes1050.mgmt.eqiad.wmnet with reboot policy FORCED
16:04 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes1052.mgmt.eqiad.wmnet with reboot policy FORCED
16:04 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes1051.mgmt.eqiad.wmnet with reboot policy FORCED
16:04 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes1049.mgmt.eqiad.wmnet with reboot policy FORCED
16:04 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes1046.mgmt.eqiad.wmnet with reboot policy FORCED
16:04 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes1053.mgmt.eqiad.wmnet with reboot policy FORCED
16:04 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes1048.mgmt.eqiad.wmnet with reboot policy FORCED
16:04 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes1045.mgmt.eqiad.wmnet with reboot policy FORCED
16:03 denisse@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on netmon2002.wikimedia.org with reason: host reimage
16:01 brouberol@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker1150.eqiad.wmnet
16:00 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
16:00 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
15:59 brouberol@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker1150.eqiad.wmnet
15:48 jclark@cumin1001: START - Cookbook sre.hosts.provision for host kubernetes1047.mgmt.eqiad.wmnet with reboot policy FORCED
15:48 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:48 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt kubernetes1047 - jclark@cumin1001"
15:47 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt kubernetes1047 - jclark@cumin1001"
15:45 jclark@cumin1001: START - Cookbook sre.dns.netbox
15:44 brouberol@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker1149.eqiad.wmnet
15:43 denisse@cumin1001: START - Cookbook sre.hosts.reimage for host netmon2002.wikimedia.org with OS bookworm
15:43 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2149 (T343198)', diff saved to https://phabricator.wikimedia.org/P52421 and previous config saved to /var/cache/conftool/dbconfig/20230911-154327-arnaudb.json
15:43 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2149.codfw.wmnet with reason: Maintenance
15:43 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2149.codfw.wmnet with reason: Maintenance
15:41 brouberol@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker1149.eqiad.wmnet
15:41 jclark@cumin1001: START - Cookbook sre.hosts.provision for host kubernetes1048.mgmt.eqiad.wmnet with reboot policy FORCED
15:41 jclark@cumin1001: START - Cookbook sre.hosts.provision for host kubernetes1046.mgmt.eqiad.wmnet with reboot policy FORCED
15:41 jclark@cumin1001: START - Cookbook sre.hosts.provision for host kubernetes1045.mgmt.eqiad.wmnet with reboot policy FORCED
15:41 jclark@cumin1001: START - Cookbook sre.hosts.provision for host kubernetes1050.mgmt.eqiad.wmnet with reboot policy FORCED
15:41 jclark@cumin1001: START - Cookbook sre.hosts.provision for host kubernetes1049.mgmt.eqiad.wmnet with reboot policy FORCED
15:41 jclark@cumin1001: START - Cookbook sre.hosts.provision for host kubernetes1053.mgmt.eqiad.wmnet with reboot policy FORCED
15:41 jclark@cumin1001: START - Cookbook sre.hosts.provision for host kubernetes1051.mgmt.eqiad.wmnet with reboot policy FORCED
15:40 jclark@cumin1001: START - Cookbook sre.hosts.provision for host kubernetes1052.mgmt.eqiad.wmnet with reboot policy FORCED
15:36 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes1040.mgmt.eqiad.wmnet with reboot policy FORCED
15:28 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes1043.mgmt.eqiad.wmnet with reboot policy FORCED
15:28 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes1037.mgmt.eqiad.wmnet with reboot policy FORCED
15:27 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes1044.mgmt.eqiad.wmnet with reboot policy FORCED
15:26 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes1039.mgmt.eqiad.wmnet with reboot policy FORCED
15:25 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes1041.mgmt.eqiad.wmnet with reboot policy FORCED
15:25 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes1042.mgmt.eqiad.wmnet with reboot policy FORCED
15:25 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes1038.mgmt.eqiad.wmnet with reboot policy FORCED
15:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1225.eqiad.wmnet with reason: Maintenance
15:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1225.eqiad.wmnet with reason: Maintenance
15:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1220 (T337310)', diff saved to https://phabricator.wikimedia.org/P52420 and previous config saved to /var/cache/conftool/dbconfig/20230911-152456-ladsgroup.json
15:23 fnegri@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudnet2005-dev.codfw.wmnet with OS bookworm
15:21 jnuche@deploy1002: Installation of scap version "4.60.0" completed for 595 hosts
15:20 jnuche@deploy1002: Installing scap version "4.60.0" for 595 hosts
15:19 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kubernetes1029.mgmt.eqiad.wmnet with reboot policy FORCED
15:18 jnuche@deploy1002: Installing scap version "4.60.0" for 595 hosts
15:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1220', diff saved to https://phabricator.wikimedia.org/P52419 and previous config saved to /var/cache/conftool/dbconfig/20230911-150950-ladsgroup.json
15:08 jclark@cumin1001: START - Cookbook sre.hosts.provision for host kubernetes1044.mgmt.eqiad.wmnet with reboot policy FORCED
15:08 jclark@cumin1001: START - Cookbook sre.hosts.provision for host kubernetes1043.mgmt.eqiad.wmnet with reboot policy FORCED
15:08 jclark@cumin1001: START - Cookbook sre.hosts.provision for host kubernetes1042.mgmt.eqiad.wmnet with reboot policy FORCED
15:08 jclark@cumin1001: START - Cookbook sre.hosts.provision for host kubernetes1041.mgmt.eqiad.wmnet with reboot policy FORCED
15:07 jclark@cumin1001: START - Cookbook sre.hosts.provision for host kubernetes1040.mgmt.eqiad.wmnet with reboot policy FORCED
15:07 jclark@cumin1001: START - Cookbook sre.hosts.provision for host kubernetes1039.mgmt.eqiad.wmnet with reboot policy FORCED
15:07 jclark@cumin1001: START - Cookbook sre.hosts.provision for host kubernetes1038.mgmt.eqiad.wmnet with reboot policy FORCED
15:07 jclark@cumin1001: START - Cookbook sre.hosts.provision for host kubernetes1029.mgmt.eqiad.wmnet with reboot policy FORCED
15:07 jclark@cumin1001: START - Cookbook sre.hosts.provision for host kubernetes1037.mgmt.eqiad.wmnet with reboot policy FORCED
15:05 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kubernetes1029.mgmt.eqiad.wmnet with reboot policy FORCED
15:04 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
15:03 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
15:02 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes1031.mgmt.eqiad.wmnet with reboot policy FORCED
14:56 brouberol@cumin1001: END (FAIL) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=99) for hosts an-worker1149.eqiad.wmnet
14:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1220', diff saved to https://phabricator.wikimedia.org/P52418 and previous config saved to /var/cache/conftool/dbconfig/20230911-145443-ladsgroup.json
14:54 brouberol@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker1149.eqiad.wmnet
14:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1220 (T337310)', diff saved to https://phabricator.wikimedia.org/P52417 and previous config saved to /var/cache/conftool/dbconfig/20230911-143937-ladsgroup.json
14:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1220 (T337310)', diff saved to https://phabricator.wikimedia.org/P52416 and previous config saved to /var/cache/conftool/dbconfig/20230911-143102-ladsgroup.json
14:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1220.eqiad.wmnet with reason: Maintenance
14:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1220.eqiad.wmnet with reason: Maintenance
14:19 fnegri@cumin1001: START - Cookbook sre.hosts.reimage for host cloudnet2005-dev.codfw.wmnet with OS bookworm
13:59 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-wf2002.codfw.wmnet
13:55 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc-wf2002.codfw.wmnet
13:55 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-wf1002.eqiad.wmnet
13:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1216.eqiad.wmnet with reason: Maintenance
13:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1216.eqiad.wmnet with reason: Maintenance
13:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1137 (T337310)', diff saved to https://phabricator.wikimedia.org/P52414 and previous config saved to /var/cache/conftool/dbconfig/20230911-135520-ladsgroup.json
13:49 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc-wf1002.eqiad.wmnet
13:49 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-wf1001.eqiad.wmnet
13:43 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc-wf1001.eqiad.wmnet
13:43 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2045.codfw.wmnet
13:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1137', diff saved to https://phabricator.wikimedia.org/P52413 and previous config saved to /var/cache/conftool/dbconfig/20230911-134013-ladsgroup.json
13:40 kartik@deploy1002: Finished scap: Backport for Enable MinT translation service in more wikis - rollout #3 (T341445) (duration: 11m 18s)
13:36 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2045.codfw.wmnet
13:33 kartik@deploy1002: kartik and abi: Continuing with sync
13:32 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-wf2001.codfw.wmnet
13:30 kartik@deploy1002: kartik and abi: Backport for Enable MinT translation service in more wikis - rollout #3 (T341445) synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
13:28 kartik@deploy1002: Started scap: Backport for Enable MinT translation service in more wikis - rollout #3 (T341445)
13:26 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for Revert "Reapply "Remove $wgNamespacesWithSubpages overrides on the MediaWiki namespace"" (duration: 08m 04s)
13:26 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc-wf2001.codfw.wmnet
13:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1137', diff saved to https://phabricator.wikimedia.org/P52412 and previous config saved to /var/cache/conftool/dbconfig/20230911-132507-ladsgroup.json
13:22 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1119', diff saved to https://phabricator.wikimedia.org/P52411 and previous config saved to /var/cache/conftool/dbconfig/20230911-132210-root.json
13:20 ayounsi@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host atlas3001.wikimedia.org
13:20 ayounsi@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM atlas3001.wikimedia.org - ayounsi@cumin1001"
13:20 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde: Continuing with sync
13:19 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde: Backport for Revert "Reapply "Remove $wgNamespacesWithSubpages overrides on the MediaWiki namespace"" synced to the testservers mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
13:19 ayounsi@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM atlas3001.wikimedia.org - ayounsi@cumin1001"
13:19 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) atlas3001.wikimedia.org on all recursors
13:19 ayounsi@cumin1001: START - Cookbook sre.dns.wipe-cache atlas3001.wikimedia.org on all recursors
13:19 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
13:19 ayounsi@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM atlas3001.wikimedia.org - ayounsi@cumin1001"
13:18 ayounsi@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM atlas3001.wikimedia.org - ayounsi@cumin1001"
13:18 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for Revert "Reapply "Remove $wgNamespacesWithSubpages overrides on the MediaWiki namespace""
13:16 lucaswerkmeister-wmde@deploy1002: Sync cancelled.
13:16 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
13:16 ayounsi@cumin1001: START - Cookbook sre.ganeti.makevm for new host atlas3001.wikimedia.org
13:11 lucaswerkmeister-wmde@deploy1002: func and lucaswerkmeister-wmde: Backport for Reapply "Remove $wgNamespacesWithSubpages overrides on the MediaWiki namespace" (T340697) synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
13:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1137 (T337310)', diff saved to https://phabricator.wikimedia.org/P52409 and previous config saved to /var/cache/conftool/dbconfig/20230911-131001-ladsgroup.json
13:09 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for Reapply "Remove $wgNamespacesWithSubpages overrides on the MediaWiki namespace" (T340697)
13:06 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1037.eqiad.wmnet
13:05 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2055.codfw.wmnet
12:59 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1037.eqiad.wmnet
12:59 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2055.codfw.wmnet
12:38 aborrero@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
12:37 aborrero@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudservices1006 - aborrero@cumin1001"
12:37 aborrero@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudservices1006 - aborrero@cumin1001"
12:30 aborrero@cumin1001: START - Cookbook sre.dns.netbox
12:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1137 (T337310)', diff saved to https://phabricator.wikimedia.org/P52408 and previous config saved to /var/cache/conftool/dbconfig/20230911-122535-ladsgroup.json
12:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1137.eqiad.wmnet with reason: Maintenance
12:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1137.eqiad.wmnet with reason: Maintenance
12:22 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netmon2002.wikimedia.org
12:21 moritzm: restarting apache/FPM on mediawiki canaries
12:18 moritzm: installing libssh2 security updates
12:18 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2054.codfw.wmnet
12:17 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1054.eqiad.wmnet
12:15 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netmon2002.wikimedia.org
12:11 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2054.codfw.wmnet
12:11 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1054.eqiad.wmnet
11:51 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti2014.codfw.wmnet to cluster codfw and group C
11:51 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti2014.codfw.wmnet to cluster codfw and group C
11:45 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host poolcounter2003.codfw.wmnet
11:42 Amir1: setting binlog format to STATEMENT in x1 eqiad and codfw masters (T337310)
11:42 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2014.codfw.wmnet
11:42 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-single for host poolcounter2003.codfw.wmnet
11:41 claime: Rebooting poolcounter2003.codfw.wmnet
11:35 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2014.codfw.wmnet
11:32 isaranto@deploy1002: Finished scap: Backport for ores-extension: enable lw in enwiki and wikidata (T342115) (duration: 23m 46s)
11:30 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host poolcounter2004.codfw.wmnet
11:26 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-single for host poolcounter2004.codfw.wmnet
11:26 isaranto@deploy1002: isaranto: Continuing with sync
11:26 claime: Rebooting poolcounter2004.codfw.wmnet
11:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host seaborgium.wikimedia.org
11:20 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host seaborgium.wikimedia.org
11:10 isaranto@deploy1002: isaranto: Backport for ores-extension: enable lw in enwiki and wikidata (T342115) synced to the testservers mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
11:09 isaranto@deploy1002: Started scap: Backport for ores-extension: enable lw in enwiki and wikidata (T342115)
11:06 volans: installed spicerack v7.2.2 on both cumin hosts
10:59 volans: uploaded spicerack_7.2.2 to apt.wikimedia.org bullseye-wikimedia
10:39 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host gitlab1003.wikimedia.org with OS bullseye
10:27 aikochou@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
10:18 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on gitlab1003.wikimedia.org with reason: host reimage
10:16 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on gitlab1003.wikimedia.org with reason: host reimage
10:14 elukey@deploy1002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
10:03 jelto@cumin1001: START - Cookbook sre.hosts.reimage for host gitlab1003.wikimedia.org with OS bullseye
09:57 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader2003.wikimedia.org
09:53 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host urldownloader2003.wikimedia.org
09:53 elukey@deploy1002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
09:48 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader1004.wikimedia.org
09:43 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host urldownloader1004.wikimedia.org
09:32 claime: rearmed keyholder on deploy2002.codfw.wmnet
09:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 100%: Repooling after cloning another host', diff saved to https://phabricator.wikimedia.org/P52405 and previous config saved to /var/cache/conftool/dbconfig/20230911-092650-root.json
09:26 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host deploy2002.codfw.wmnet
09:25 elukey@deploy1002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
09:24 gehel@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on wdqs1009.eqiad.wmnet with reason: T342361 - testing blazegraph startup script refactor
09:24 gehel@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on wdqs1009.eqiad.wmnet with reason: T342361 - testing blazegraph startup script refactor
09:18 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-single for host deploy2002.codfw.wmnet
09:18 claime: rebooting deploy2002.codfw.wmnet
09:18 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance
09:18 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance
09:18 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127 (T343198)', diff saved to https://phabricator.wikimedia.org/P52404 and previous config saved to /var/cache/conftool/dbconfig/20230911-091817-arnaudb.json
09:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 75%: Repooling after cloning another host', diff saved to https://phabricator.wikimedia.org/P52403 and previous config saved to /var/cache/conftool/dbconfig/20230911-091145-root.json
09:08 elukey@deploy1002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
09:03 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127', diff saved to https://phabricator.wikimedia.org/P52402 and previous config saved to /var/cache/conftool/dbconfig/20230911-090310-arnaudb.json
08:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 50%: Repooling after cloning another host', diff saved to https://phabricator.wikimedia.org/P52401 and previous config saved to /var/cache/conftool/dbconfig/20230911-085640-root.json
08:52 urbanecm@deploy1002: Finished scap: Backport for Revert "Growth: Disable Add an image on all wikis" (T345188) (duration: 10m 27s)
08:51 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1127 (T343198)', diff saved to https://phabricator.wikimedia.org/P52400 and previous config saved to /var/cache/conftool/dbconfig/20230911-085129-arnaudb.json
08:51 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1127.eqiad.wmnet with reason: Maintenance
08:51 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1127.eqiad.wmnet with reason: Maintenance
08:48 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127', diff saved to https://phabricator.wikimedia.org/P52399 and previous config saved to /var/cache/conftool/dbconfig/20230911-084804-arnaudb.json
08:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1128 (re)pooling @ 100%: Repooling after being recloned T345509', diff saved to https://phabricator.wikimedia.org/P52398 and previous config saved to /var/cache/conftool/dbconfig/20230911-084647-root.json
08:46 urbanecm@deploy1002: urbanecm: Continuing with sync
08:45 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mwmaint2002.codfw.wmnet
08:44 urbanecm@deploy1002: urbanecm: Backport for Revert "Growth: Disable Add an image on all wikis" (T345188) synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
08:42 urbanecm@deploy1002: Started scap: Backport for Revert "Growth: Disable Add an image on all wikis" (T345188)
08:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 25%: Repooling after cloning another host', diff saved to https://phabricator.wikimedia.org/P52397 and previous config saved to /var/cache/conftool/dbconfig/20230911-084135-root.json
08:37 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-single for host mwmaint2002.codfw.wmnet
08:37 claime: rebooting mwmaint2002.codfw.wmnet
08:33 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mwdebug1001.eqiad.wmnet
08:33 marostegui@cumin1001: dbctl commit (dc=all): 'Pool db1119 with Debian Bookworm in s1 with just 1% T339185', diff saved to https://phabricator.wikimedia.org/P52396 and previous config saved to /var/cache/conftool/dbconfig/20230911-083346-marostegui.json
08:33 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127 (T343198)', diff saved to https://phabricator.wikimedia.org/P52395 and previous config saved to /var/cache/conftool/dbconfig/20230911-083258-arnaudb.json
08:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1128 (re)pooling @ 75%: Repooling after being recloned T345509', diff saved to https://phabricator.wikimedia.org/P52394 and previous config saved to /var/cache/conftool/dbconfig/20230911-083143-root.json
08:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 10%: Repooling after cloning another host', diff saved to https://phabricator.wikimedia.org/P52393 and previous config saved to /var/cache/conftool/dbconfig/20230911-082631-root.json
08:26 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-single for host mwdebug1001.eqiad.wmnet
08:26 claime: rebooting mwdebug1001.eqiad.wmnet
08:25 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mwdebug1002.eqiad.wmnet
08:20 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-single for host mwdebug1002.eqiad.wmnet
08:20 claime: rebooting mwdebug1002.eqiad.wmnet
08:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1128 (re)pooling @ 50%: Repooling after being recloned T345509', diff saved to https://phabricator.wikimedia.org/P52392 and previous config saved to /var/cache/conftool/dbconfig/20230911-081638-root.json
08:13 kostajh: UTC morning deploys done
08:13 kharlan@deploy1002: Finished scap: Backport for [beta] ReportIncident: Enable on kowiki beta (T339275), [beta] Enable ReportIncident for configured beta wikis (T339275), ReportIncident: Set default help page (T343382) (duration: 09m 44s)
08:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 5%: Repooling after cloning another host', diff saved to https://phabricator.wikimedia.org/P52391 and previous config saved to /var/cache/conftool/dbconfig/20230911-081126-root.json
08:08 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti3007.esams.wmnet
08:07 kharlan@deploy1002: kharlan: Continuing with sync
08:05 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti3007.esams.wmnet
08:05 kharlan@deploy1002: kharlan: Backport for [beta] ReportIncident: Enable on kowiki beta (T339275), [beta] Enable ReportIncident for configured beta wikis (T339275), ReportIncident: Set default help page (T343382) synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deplo
08:03 filippo@deploy1002: helmfile [codfw] DONE helmfile.d/services/citoid: apply
08:03 kharlan@deploy1002: Started scap: Backport for [beta] ReportIncident: Enable on kowiki beta (T339275), [beta] Enable ReportIncident for configured beta wikis (T339275), ReportIncident: Set default help page (T343382)
08:03 filippo@deploy1002: helmfile [codfw] START helmfile.d/services/citoid: apply
08:02 filippo@deploy1002: helmfile [eqiad] DONE helmfile.d/services/citoid: apply
08:02 filippo@deploy1002: helmfile [eqiad] START helmfile.d/services/citoid: apply
08:02 filippo@deploy1002: helmfile [staging] DONE helmfile.d/services/citoid: apply
08:01 filippo@deploy1002: helmfile [staging] START helmfile.d/services/citoid: apply
08:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1128 (re)pooling @ 25%: Repooling after being recloned T345509', diff saved to https://phabricator.wikimedia.org/P52390 and previous config saved to /var/cache/conftool/dbconfig/20230911-080133-root.json
08:00 filippo@deploy1002: helmfile [eqiad] DONE helmfile.d/services/citoid: apply
08:00 filippo@deploy1002: helmfile [eqiad] START helmfile.d/services/citoid: apply
08:00 kharlan@deploy1002: Finished scap: Backport for ReportIncident: Default deployment to false (T339275) (duration: 11m 15s)
08:00 filippo@deploy1002: helmfile [staging] DONE helmfile.d/services/citoid: apply
08:00 filippo@deploy1002: helmfile [staging] START helmfile.d/services/citoid: apply
07:59 filippo@deploy1002: helmfile [staging] DONE helmfile.d/services/citoid: apply
07:59 filippo@deploy1002: helmfile [staging] START helmfile.d/services/citoid: apply
07:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 3%: Repooling after cloning another host', diff saved to https://phabricator.wikimedia.org/P52389 and previous config saved to /var/cache/conftool/dbconfig/20230911-075621-root.json
07:53 kharlan@deploy1002: kharlan: Continuing with sync
07:50 kharlan@deploy1002: kharlan: Backport for ReportIncident: Default deployment to false (T339275) synced to the testservers mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
07:49 kharlan@deploy1002: Started scap: Backport for ReportIncident: Default deployment to false (T339275)
07:46 kharlan@deploy1002: Finished scap: Backport for Add ReportIncident extension (T339275) (duration: 22m 44s)
07:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1128 (re)pooling @ 10%: Repooling after being recloned T345509', diff saved to https://phabricator.wikimedia.org/P52388 and previous config saved to /var/cache/conftool/dbconfig/20230911-074629-root.json
07:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1134 (re)pooling @ 1%: Repooling after cloning another host', diff saved to https://phabricator.wikimedia.org/P52387 and previous config saved to /var/cache/conftool/dbconfig/20230911-074116-root.json
07:36 kharlan@deploy1002: kharlan: Continuing with sync
07:35 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti3005.esams.wmnet
07:35 elukey@deploy1002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
07:33 kharlan@deploy1002: kharlan: Backport for Add ReportIncident extension (T339275) synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
07:32 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti3005.esams.wmnet
07:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1128 (re)pooling @ 5%: Repooling after being recloned T345509', diff saved to https://phabricator.wikimedia.org/P52386 and previous config saved to /var/cache/conftool/dbconfig/20230911-073124-root.json
07:23 kharlan@deploy1002: Started scap: Backport for Add ReportIncident extension (T339275)
07:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1128 (re)pooling @ 3%: Repooling after being recloned T345509', diff saved to https://phabricator.wikimedia.org/P52385 and previous config saved to /var/cache/conftool/dbconfig/20230911-071619-root.json
07:11 kharlan@deploy1002: Started scap: Backport for Add ReportIncident extension (T339275)
07:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1128 (re)pooling @ 1%: Repooling after being recloned T345509', diff saved to https://phabricator.wikimedia.org/P52384 and previous config saved to /var/cache/conftool/dbconfig/20230911-070114-root.json
06:57 marostegui@cumin1001: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db1134.eqiad.wmnet onto db1128.eqiad.wmnet
06:12 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 136065
06:11 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 136065
05:40 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1119 back to s1 depooled T339185', diff saved to https://phabricator.wikimedia.org/P52383 and previous config saved to /var/cache/conftool/dbconfig/20230911-054057-marostegui.json
05:00 marostegui@cumin1001: START - Cookbook sre.mysql.clone of db1134.eqiad.wmnet onto db1128.eqiad.wmnet
04:59 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1134', diff saved to https://phabricator.wikimedia.org/P52382 and previous config saved to /var/cache/conftool/dbconfig/20230911-045907-root.json
01:29 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2127 (T343198)', diff saved to https://phabricator.wikimedia.org/P52381 and previous config saved to /var/cache/conftool/dbconfig/20230911-012911-arnaudb.json
01:29 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2127.codfw.wmnet with reason: Maintenance
01:28 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2127.codfw.wmnet with reason: Maintenance
01:28 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109 (T343198)', diff saved to https://phabricator.wikimedia.org/P52380 and previous config saved to /var/cache/conftool/dbconfig/20230911-012850-arnaudb.json
01:13 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109', diff saved to https://phabricator.wikimedia.org/P52379 and previous config saved to /var/cache/conftool/dbconfig/20230911-011343-arnaudb.json
00:58 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109', diff saved to https://phabricator.wikimedia.org/P52378 and previous config saved to /var/cache/conftool/dbconfig/20230911-005837-arnaudb.json
00:43 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109 (T343198)', diff saved to https://phabricator.wikimedia.org/P52377 and previous config saved to /var/cache/conftool/dbconfig/20230911-004331-arnaudb.json

2023-09-10

17:35 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2109 (T343198)', diff saved to https://phabricator.wikimedia.org/P52375 and previous config saved to /var/cache/conftool/dbconfig/20230910-173502-arnaudb.json
17:34 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2109.codfw.wmnet with reason: Maintenance
17:34 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2109.codfw.wmnet with reason: Maintenance
11:19 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
11:19 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
11:19 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1223 (T343198)', diff saved to https://phabricator.wikimedia.org/P52374 and previous config saved to /var/cache/conftool/dbconfig/20230910-111941-arnaudb.json
11:04 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1223', diff saved to https://phabricator.wikimedia.org/P52373 and previous config saved to /var/cache/conftool/dbconfig/20230910-110435-arnaudb.json
10:49 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1223', diff saved to https://phabricator.wikimedia.org/P52372 and previous config saved to /var/cache/conftool/dbconfig/20230910-104929-arnaudb.json
10:34 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1223 (T343198)', diff saved to https://phabricator.wikimedia.org/P52371 and previous config saved to /var/cache/conftool/dbconfig/20230910-103422-arnaudb.json
04:23 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1223 (T343198)', diff saved to https://phabricator.wikimedia.org/P52370 and previous config saved to /var/cache/conftool/dbconfig/20230910-042338-arnaudb.json
04:23 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1223.eqiad.wmnet with reason: Maintenance
04:23 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1223.eqiad.wmnet with reason: Maintenance
04:23 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1212 (T343198)', diff saved to https://phabricator.wikimedia.org/P52369 and previous config saved to /var/cache/conftool/dbconfig/20230910-042317-arnaudb.json
04:08 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1212', diff saved to https://phabricator.wikimedia.org/P52368 and previous config saved to /var/cache/conftool/dbconfig/20230910-040811-arnaudb.json
03:53 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1212', diff saved to https://phabricator.wikimedia.org/P52367 and previous config saved to /var/cache/conftool/dbconfig/20230910-035304-arnaudb.json
03:38 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1212 (T343198)', diff saved to https://phabricator.wikimedia.org/P52366 and previous config saved to /var/cache/conftool/dbconfig/20230910-033758-arnaudb.json
01:38 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1212 (T343198)', diff saved to https://phabricator.wikimedia.org/P52365 and previous config saved to /var/cache/conftool/dbconfig/20230910-013823-arnaudb.json
01:38 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
01:38 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
01:38 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1212.eqiad.wmnet with reason: Maintenance
01:37 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1212.eqiad.wmnet with reason: Maintenance
01:37 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1198 (T343198)', diff saved to https://phabricator.wikimedia.org/P52364 and previous config saved to /var/cache/conftool/dbconfig/20230910-013745-arnaudb.json
01:22 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1198', diff saved to https://phabricator.wikimedia.org/P52363 and previous config saved to /var/cache/conftool/dbconfig/20230910-012239-arnaudb.json
01:07 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1198', diff saved to https://phabricator.wikimedia.org/P52362 and previous config saved to /var/cache/conftool/dbconfig/20230910-010733-arnaudb.json
00:52 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1198 (T343198)', diff saved to https://phabricator.wikimedia.org/P52361 and previous config saved to /var/cache/conftool/dbconfig/20230910-005226-arnaudb.json

2023-09-09

20:16 root@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcontrol2005-dev.codfw.wmnet with OS bookworm
19:38 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcontrol2005-dev.codfw.wmnet with reason: host reimage
19:35 root@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcontrol2005-dev.codfw.wmnet with reason: host reimage
19:14 root@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcontrol2005-dev.codfw.wmnet with OS bookworm
18:28 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1198 (T343198)', diff saved to https://phabricator.wikimedia.org/P52360 and previous config saved to /var/cache/conftool/dbconfig/20230909-182802-arnaudb.json
18:27 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1198.eqiad.wmnet with reason: Maintenance
18:27 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1198.eqiad.wmnet with reason: Maintenance
18:27 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1189 (T343198)', diff saved to https://phabricator.wikimedia.org/P52359 and previous config saved to /var/cache/conftool/dbconfig/20230909-182741-arnaudb.json
18:12 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P52358 and previous config saved to /var/cache/conftool/dbconfig/20230909-181234-arnaudb.json
17:57 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P52357 and previous config saved to /var/cache/conftool/dbconfig/20230909-175728-arnaudb.json
17:42 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1189 (T343198)', diff saved to https://phabricator.wikimedia.org/P52356 and previous config saved to /var/cache/conftool/dbconfig/20230909-174222-arnaudb.json
17:35 root@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcontrol2004-dev.codfw.wmnet with OS bookworm
16:54 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcontrol2004-dev.codfw.wmnet with reason: host reimage
16:51 root@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcontrol2004-dev.codfw.wmnet with reason: host reimage
16:33 root@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcontrol2004-dev.codfw.wmnet with OS bookworm
16:27 root@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcontrol2001-dev.codfw.wmnet with OS bookworm
15:44 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcontrol2001-dev.codfw.wmnet with reason: host reimage
15:41 root@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcontrol2001-dev.codfw.wmnet with reason: host reimage
15:22 root@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcontrol2001-dev.codfw.wmnet with OS bookworm
11:15 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1189 (T343198)', diff saved to https://phabricator.wikimedia.org/P52355 and previous config saved to /var/cache/conftool/dbconfig/20230909-111508-arnaudb.json
11:15 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1189.eqiad.wmnet with reason: Maintenance
11:14 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1189.eqiad.wmnet with reason: Maintenance
11:14 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T343198)', diff saved to https://phabricator.wikimedia.org/P52354 and previous config saved to /var/cache/conftool/dbconfig/20230909-111447-arnaudb.json
10:59 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P52353 and previous config saved to /var/cache/conftool/dbconfig/20230909-105941-arnaudb.json
10:44 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P52352 and previous config saved to /var/cache/conftool/dbconfig/20230909-104434-arnaudb.json
10:29 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T343198)', diff saved to https://phabricator.wikimedia.org/P52351 and previous config saved to /var/cache/conftool/dbconfig/20230909-102928-arnaudb.json
04:09 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1175 (T343198)', diff saved to https://phabricator.wikimedia.org/P52350 and previous config saved to /var/cache/conftool/dbconfig/20230909-040947-arnaudb.json
04:09 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1175.eqiad.wmnet with reason: Maintenance
04:09 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1175.eqiad.wmnet with reason: Maintenance
04:09 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T343198)', diff saved to https://phabricator.wikimedia.org/P52349 and previous config saved to /var/cache/conftool/dbconfig/20230909-040925-arnaudb.json
03:54 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P52348 and previous config saved to /var/cache/conftool/dbconfig/20230909-035419-arnaudb.json
03:39 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P52347 and previous config saved to /var/cache/conftool/dbconfig/20230909-033913-arnaudb.json
03:24 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T343198)', diff saved to https://phabricator.wikimedia.org/P52346 and previous config saved to /var/cache/conftool/dbconfig/20230909-032407-arnaudb.json
02:19 root@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcontrol2001-dev.codfw.wmnet with OS bullseye
01:38 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcontrol2001-dev.codfw.wmnet with reason: host reimage
01:35 root@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcontrol2001-dev.codfw.wmnet with reason: host reimage
01:18 root@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcontrol2001-dev.codfw.wmnet with OS bullseye

2023-09-08

21:44 jclark@cumin1001: START - Cookbook sre.hosts.provision for host kubernetes1029.mgmt.eqiad.wmnet with reboot policy FORCED
21:40 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kubernetes1029.mgmt.eqiad.wmnet with reboot policy FORCED
21:35 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes1035.mgmt.eqiad.wmnet with reboot policy FORCED
21:34 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes1036.mgmt.eqiad.wmnet with reboot policy FORCED
21:34 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes1034.mgmt.eqiad.wmnet with reboot policy FORCED
21:34 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes1033.mgmt.eqiad.wmnet with reboot policy FORCED
21:34 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes1032.mgmt.eqiad.wmnet with reboot policy FORCED
21:34 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes1030.mgmt.eqiad.wmnet with reboot policy FORCED
21:34 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes1028.mgmt.eqiad.wmnet with reboot policy FORCED
21:15 jclark@cumin1001: START - Cookbook sre.hosts.provision for host kubernetes1036.mgmt.eqiad.wmnet with reboot policy FORCED
21:15 jclark@cumin1001: START - Cookbook sre.hosts.provision for host kubernetes1035.mgmt.eqiad.wmnet with reboot policy FORCED
21:15 jclark@cumin1001: START - Cookbook sre.hosts.provision for host kubernetes1034.mgmt.eqiad.wmnet with reboot policy FORCED
21:15 jclark@cumin1001: START - Cookbook sre.hosts.provision for host kubernetes1033.mgmt.eqiad.wmnet with reboot policy FORCED
21:14 jclark@cumin1001: START - Cookbook sre.hosts.provision for host kubernetes1032.mgmt.eqiad.wmnet with reboot policy FORCED
21:14 jclark@cumin1001: START - Cookbook sre.hosts.provision for host kubernetes1031.mgmt.eqiad.wmnet with reboot policy FORCED
21:14 jclark@cumin1001: START - Cookbook sre.hosts.provision for host kubernetes1030.mgmt.eqiad.wmnet with reboot policy FORCED
21:14 jclark@cumin1001: START - Cookbook sre.hosts.provision for host kubernetes1029.mgmt.eqiad.wmnet with reboot policy FORCED
21:14 jclark@cumin1001: START - Cookbook sre.hosts.provision for host kubernetes1028.mgmt.eqiad.wmnet with reboot policy FORCED
21:10 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kubernetes1056
21:10 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kubernetes1055
21:10 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kubernetes1054
21:10 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host kubernetes1056
21:10 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host kubernetes1055
21:10 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host kubernetes1054
21:10 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kubernetes1053
21:10 ejegg: civicrm upgraded from de883cd5 to 771fcde3
21:10 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kubernetes1052
21:10 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host kubernetes1053
21:10 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host kubernetes1052
21:10 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kubernetes1046
21:10 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kubernetes1051
21:10 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kubernetes1050
21:10 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host kubernetes1050
21:09 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kubernetes1050
21:09 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host kubernetes1051
21:09 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host kubernetes1050
21:09 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kubernetes1049
21:09 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host kubernetes1049
21:09 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kubernetes1048
21:09 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host kubernetes1048
21:09 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kubernetes1047
21:09 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host kubernetes1047
21:09 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kubernetes1048
21:09 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host kubernetes1048
21:09 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kubernetes1047
21:09 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host kubernetes1047
21:09 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host kubernetes1046
21:08 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1166 (T343198)', diff saved to https://phabricator.wikimedia.org/P52345 and previous config saved to /var/cache/conftool/dbconfig/20230908-210844-arnaudb.json
21:08 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1166.eqiad.wmnet with reason: Maintenance
21:08 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1166.eqiad.wmnet with reason: Maintenance
21:08 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kubernetes1044
21:08 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kubernetes1045
21:07 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kubernetes1041
21:07 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host kubernetes1045
21:07 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host kubernetes1044
21:06 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kubernetes1043
21:06 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kubernetes1042
21:06 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host kubernetes1041
21:05 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host kubernetes1043
21:04 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host kubernetes1042
21:04 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kubernetes1038
21:04 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kubernetes1040
21:03 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host kubernetes1040
21:03 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kubernetes1039
21:03 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host kubernetes1038
21:02 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host kubernetes1039
21:02 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kubernetes1037
21:02 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host kubernetes1037
21:02 jclark@cumin1001: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host kubernetes1039
21:01 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kubernetes1038
21:00 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kubernetes1037
21:00 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host kubernetes1039
21:00 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kubernetes1035
21:00 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host kubernetes1038
21:00 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kubernetes1036
20:59 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host kubernetes1037
20:59 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host kubernetes1036
20:59 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host kubernetes1035
20:59 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kubernetes1032
20:59 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kubernetes1033
20:59 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kubernetes1034
20:58 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host kubernetes1034
20:58 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host kubernetes1033
20:58 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kubernetes1031
20:58 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kubernetes1030
20:58 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host kubernetes1032
20:58 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host kubernetes1031
20:58 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kubernetes1028
20:58 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kubernetes1029
20:57 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host kubernetes1028
20:57 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host kubernetes1030
20:57 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host kubernetes1029
20:53 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
20:53 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt kubernetes102 - jclark@cumin1001"
20:52 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt kubernetes102 - jclark@cumin1001"
20:49 jclark@cumin1001: START - Cookbook sre.dns.netbox
20:28 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
20:28 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt kubernetes102 - jclark@cumin1001"
20:27 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt kubernetes102 - jclark@cumin1001"
20:26 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubernetes1027.mgmt.eqiad.wmnet with reboot policy FORCED
20:24 jclark@cumin1001: START - Cookbook sre.dns.netbox
17:24 fnegri@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcontrol2001-dev.codfw.wmnet with OS bookworm
17:20 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
17:19 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
17:13 taavi: reprepro copy bookworm-wikimedia bullseye-wikimedia prometheus-memcached-exporter # T345810
16:18 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
16:16 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
15:53 jclark@cumin1001: START - Cookbook sre.hosts.provision for host kubernetes1027.mgmt.eqiad.wmnet with reboot policy FORCED
15:52 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kubernetes1027
15:51 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host kubernetes1027
15:45 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:45 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt kubernetes1027 - jclark@cumin1001"
15:44 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt kubernetes1027 - jclark@cumin1001"
15:44 eevans@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host restbase1030.eqiad.wmnet with OS bullseye
15:42 jclark@cumin1001: START - Cookbook sre.dns.netbox
15:37 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host restbase1030.eqiad.wmnet with OS bullseye
15:27 sukhe: running authdns-update for CR 955943
15:22 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['stat1011.eqiad.wmne']
15:16 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['moss-be1003.eqiad.wmnet']
15:16 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['moss-be1003.eqiad.wmnet']
15:15 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host moss-be1003.mgmt.eqiad.wmnet with reboot policy FORCED
15:13 aikochou@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
15:13 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['stat1011.eqiad.wmne']
15:08 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host stat1011.mgmt.eqiad.wmnet with reboot policy FORCED
14:58 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance
14:58 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance
14:44 jclark@cumin1001: START - Cookbook sre.hosts.provision for host stat1011.mgmt.eqiad.wmnet with reboot policy FORCED
14:43 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host stat1011
14:43 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178 (T343198)', diff saved to https://phabricator.wikimedia.org/P52343 and previous config saved to /var/cache/conftool/dbconfig/20230908-144321-arnaudb.json
14:42 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host stat1011
14:42 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:42 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt stat1011 - jclark@cumin1001"
14:41 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt stat1011 - jclark@cumin1001"
14:39 jclark@cumin1001: START - Cookbook sre.dns.netbox
14:28 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P52342 and previous config saved to /var/cache/conftool/dbconfig/20230908-142815-arnaudb.json
14:28 jclark@cumin1001: START - Cookbook sre.hosts.provision for host moss-be1003.mgmt.eqiad.wmnet with reboot policy FORCED
14:28 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host moss-be1003
14:27 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host moss-be1003
14:27 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:27 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt moss-be1003 - jclark@cumin1001"
14:26 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt moss-be1003 - jclark@cumin1001"
14:24 jclark@cumin1001: START - Cookbook sre.dns.netbox
14:13 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P52341 and previous config saved to /var/cache/conftool/dbconfig/20230908-141309-arnaudb.json
13:58 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178 (T343198)', diff saved to https://phabricator.wikimedia.org/P52340 and previous config saved to /var/cache/conftool/dbconfig/20230908-135803-arnaudb.json
13:39 isaranto@deploy1002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: sync
13:39 isaranto@deploy1002: helmfile [eqiad] START helmfile.d/services/api-gateway: sync
13:39 isaranto@deploy1002: helmfile [codfw] DONE helmfile.d/services/api-gateway: sync
13:38 isaranto@deploy1002: helmfile [codfw] START helmfile.d/services/api-gateway: sync
13:37 isaranto@deploy1002: helmfile [staging] DONE helmfile.d/services/api-gateway: sync
13:37 isaranto@deploy1002: helmfile [staging] START helmfile.d/services/api-gateway: sync
13:34 aikochou@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
13:34 kevinbazira@deploy1002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
13:24 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
13:24 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
13:05 isaranto@deploy1002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: sync
13:05 isaranto@deploy1002: helmfile [eqiad] START helmfile.d/services/api-gateway: sync
13:01 isaranto@deploy1002: helmfile [codfw] DONE helmfile.d/services/api-gateway: sync
13:01 isaranto@deploy1002: helmfile [codfw] START helmfile.d/services/api-gateway: sync
13:00 isaranto@deploy1002: helmfile [staging] DONE helmfile.d/services/api-gateway: sync
12:59 isaranto@deploy1002: helmfile [staging] START helmfile.d/services/api-gateway: sync
12:53 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host ldap-replica1006.wikimedia.org
12:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ldap-replica1006.wikimedia.org with OS bookworm
12:52 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
12:51 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
12:51 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
12:51 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
12:51 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
12:50 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
12:49 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
12:49 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
12:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ldap-replica1006.wikimedia.org with reason: host reimage
12:36 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ldap-replica1006.wikimedia.org with reason: host reimage
12:35 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ldap-replica1005.wikimedia.org
12:31 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ldap-replica1005.wikimedia.org
12:29 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ldap-replica1006.wikimedia.org with OS bookworm
12:24 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM ldap-replica1006.wikimedia.org - jmm@cumin2002"
12:23 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM ldap-replica1006.wikimedia.org - jmm@cumin2002"
12:23 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ldap-replica1006.wikimedia.org on all recursors
12:23 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache ldap-replica1006.wikimedia.org on all recursors
12:23 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
12:23 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM ldap-replica1006.wikimedia.org - jmm@cumin2002"
12:22 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM ldap-replica1006.wikimedia.org - jmm@cumin2002"
12:18 jmm@cumin2002: START - Cookbook sre.dns.netbox
12:18 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host ldap-replica1006.wikimedia.org
12:18 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host ldap-replica1006.wikimedia.org
12:17 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
12:16 jmm@cumin2002: START - Cookbook sre.dns.netbox
12:16 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host ldap-replica1006.wikimedia.org
12:05 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host ldap-replica1006.wikimedia.org
12:05 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ldap-replica1006.wikimedia.org on all recursors
12:05 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache ldap-replica1006.wikimedia.org on all recursors
12:05 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
12:05 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM ldap-replica1006.wikimedia.org - jmm@cumin2002"
12:04 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM ldap-replica1006.wikimedia.org - jmm@cumin2002"
11:59 jmm@cumin2002: START - Cookbook sre.dns.netbox
11:59 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ldap-replica1006.wikimedia.org on all recursors
11:59 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache ldap-replica1006.wikimedia.org on all recursors
11:59 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
11:59 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM ldap-replica1006.wikimedia.org - jmm@cumin2002"
11:58 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM ldap-replica1006.wikimedia.org - jmm@cumin2002"
11:55 jmm@cumin2002: START - Cookbook sre.dns.netbox
11:55 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host ldap-replica1006.wikimedia.org
11:54 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host ldap-replica1005.wikimedia.org
11:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ldap-replica1005.wikimedia.org with OS bookworm
11:49 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2178 (T343198)', diff saved to https://phabricator.wikimedia.org/P52337 and previous config saved to /var/cache/conftool/dbconfig/20230908-114911-arnaudb.json
11:49 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2178.codfw.wmnet with reason: Maintenance
11:48 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2178.codfw.wmnet with reason: Maintenance
11:48 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315 (T343198)', diff saved to https://phabricator.wikimedia.org/P52336 and previous config saved to /var/cache/conftool/dbconfig/20230908-114850-arnaudb.json
11:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ldap-replica1005.wikimedia.org with reason: host reimage
11:36 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ldap-replica1005.wikimedia.org with reason: host reimage
11:33 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315', diff saved to https://phabricator.wikimedia.org/P52335 and previous config saved to /var/cache/conftool/dbconfig/20230908-113344-arnaudb.json
11:23 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ldap-replica1005.wikimedia.org with OS bookworm
11:22 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM ldap-replica1005.wikimedia.org - jmm@cumin2002"
11:21 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM ldap-replica1005.wikimedia.org - jmm@cumin2002"
11:21 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ldap-replica1005.wikimedia.org on all recursors
11:21 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache ldap-replica1005.wikimedia.org on all recursors
11:21 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
11:21 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM ldap-replica1005.wikimedia.org - jmm@cumin2002"
11:20 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM ldap-replica1005.wikimedia.org - jmm@cumin2002"
11:18 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315', diff saved to https://phabricator.wikimedia.org/P52334 and previous config saved to /var/cache/conftool/dbconfig/20230908-111838-arnaudb.json
11:17 jmm@cumin2002: START - Cookbook sre.dns.netbox
11:17 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host ldap-replica1005.wikimedia.org
11:14 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.resource-report (exit_code=0)
11:14 jmm@cumin2002: START - Cookbook sre.ganeti.resource-report
11:07 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
11:07 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
11:06 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ldap-rw2001.wikimedia.org with OS bookworm
11:05 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
11:04 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
11:03 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315 (T343198)', diff saved to https://phabricator.wikimedia.org/P52333 and previous config saved to /var/cache/conftool/dbconfig/20230908-110331-arnaudb.json
10:49 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ldap-rw2001.wikimedia.org with reason: host reimage
10:46 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ldap-rw2001.wikimedia.org with reason: host reimage
10:33 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
10:32 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
10:27 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ldap-rw2001.wikimedia.org with OS bookworm
10:24 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ldap-rw1001.wikimedia.org with OS bookworm
10:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ldap-rw1001.wikimedia.org with reason: host reimage
10:07 jbond@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
10:06 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ldap-rw1001.wikimedia.org with reason: host reimage
10:05 jbond@cumin1001: START - Cookbook sre.dns.netbox
09:55 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ldap-rw1001.wikimedia.org with OS bookworm
09:46 vgutierrez: restart fifo-log-demux@notpurge.service in cp4052
09:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host serpens.wikimedia.org
09:34 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host serpens.wikimedia.org
09:31 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts furud.codfw.wmnet
09:31 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
09:31 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: furud.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
09:29 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: furud.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
09:26 jmm@cumin2002: START - Cookbook sre.dns.netbox
09:22 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts furud.codfw.wmnet
09:17 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
09:16 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
09:15 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
09:13 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
09:11 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
09:11 jmm@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cumin2002.codfw.wmnet
09:10 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
09:00 jmm@cumin1001: START - Cookbook sre.hosts.reboot-single for host cumin2002.codfw.wmnet
08:23 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1140.eqiad.wmnet with reason: Maintenance
08:23 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1140.eqiad.wmnet with reason: Maintenance
08:09 elukey@deploy1002: helmfile [ml-serve-eqiad] 'sync' command on namespace 'ores-legacy' for release 'main' .
08:06 elukey@deploy1002: helmfile [ml-serve-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
08:01 elukey@deploy1002: helmfile [ml-staging-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
07:59 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2171:3315 (T343198)', diff saved to https://phabricator.wikimedia.org/P52328 and previous config saved to /var/cache/conftool/dbconfig/20230908-075901-arnaudb.json
07:58 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2171.codfw.wmnet with reason: Maintenance
07:58 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2171.codfw.wmnet with reason: Maintenance
07:58 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157 (T343198)', diff saved to https://phabricator.wikimedia.org/P52327 and previous config saved to /var/cache/conftool/dbconfig/20230908-075840-arnaudb.json
07:43 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157', diff saved to https://phabricator.wikimedia.org/P52326 and previous config saved to /var/cache/conftool/dbconfig/20230908-074334-arnaudb.json
07:28 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157', diff saved to https://phabricator.wikimedia.org/P52325 and previous config saved to /var/cache/conftool/dbconfig/20230908-072828-arnaudb.json
07:27 jayme@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
07:26 jayme@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
07:26 jayme@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
07:25 jayme@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
07:25 jayme@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
07:25 jayme@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
07:24 jayme@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
07:24 jayme@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
07:24 jayme@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
07:23 jayme@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
07:23 jayme@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
07:23 jayme@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
07:22 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
07:22 jayme@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
07:22 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
07:21 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
07:21 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
07:21 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
07:21 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
07:20 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
07:13 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157 (T343198)', diff saved to https://phabricator.wikimedia.org/P52324 and previous config saved to /var/cache/conftool/dbconfig/20230908-071322-arnaudb.json
07:02 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
05:24 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
05:23 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs1016.eqiad.wmnet with OS bullseye
04:57 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1016.eqiad.wmnet with reason: host reimage
04:54 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1016.eqiad.wmnet with reason: host reimage
04:29 ryankemper@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs1016.eqiad.wmnet with OS bullseye
04:28 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2157 (T343198)', diff saved to https://phabricator.wikimedia.org/P52323 and previous config saved to /var/cache/conftool/dbconfig/20230908-042821-arnaudb.json
04:28 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2157.codfw.wmnet with reason: Maintenance
04:28 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2157.codfw.wmnet with reason: Maintenance
04:28 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315 (T343198)', diff saved to https://phabricator.wikimedia.org/P52322 and previous config saved to /var/cache/conftool/dbconfig/20230908-042800-arnaudb.json
04:12 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315', diff saved to https://phabricator.wikimedia.org/P52321 and previous config saved to /var/cache/conftool/dbconfig/20230908-041254-arnaudb.json
03:57 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315', diff saved to https://phabricator.wikimedia.org/P52320 and previous config saved to /var/cache/conftool/dbconfig/20230908-035747-arnaudb.json
03:42 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315 (T343198)', diff saved to https://phabricator.wikimedia.org/P52319 and previous config saved to /var/cache/conftool/dbconfig/20230908-034241-arnaudb.json
00:53 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2137:3315 (T343198)', diff saved to https://phabricator.wikimedia.org/P52318 and previous config saved to /var/cache/conftool/dbconfig/20230908-005323-arnaudb.json
00:53 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2137.codfw.wmnet with reason: Maintenance
00:53 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2137.codfw.wmnet with reason: Maintenance
00:53 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128 (T343198)', diff saved to https://phabricator.wikimedia.org/P52317 and previous config saved to /var/cache/conftool/dbconfig/20230908-005301-arnaudb.json
00:37 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128', diff saved to https://phabricator.wikimedia.org/P52316 and previous config saved to /var/cache/conftool/dbconfig/20230908-003755-arnaudb.json
00:23 eileen: civicrm upgraded from e81ed4e9 to de883cd5
00:22 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128', diff saved to https://phabricator.wikimedia.org/P52315 and previous config saved to /var/cache/conftool/dbconfig/20230908-002248-arnaudb.json
00:07 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128 (T343198)', diff saved to https://phabricator.wikimedia.org/P52314 and previous config saved to /var/cache/conftool/dbconfig/20230908-000742-arnaudb.json
00:03 eileen: civicrm upgraded from 5a432b1e to e81ed4e9

2023-09-07

23:12 ejegg: payments-wiki upgraded from 639a8d6a to c524f53f
22:45 jhuneidi@deploy1002: Installation of scap version "4.59.0" completed for 594 hosts
22:44 jhuneidi@deploy1002: Installing scap version "4.59.0" for 594 hosts
22:30 jhuneidi@deploy1002: Installing scap version "4.59.0" for 595 hosts
22:29 jeena: installing scap v4.59.0
22:24 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
21:47 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2128 (T343198)', diff saved to https://phabricator.wikimedia.org/P52313 and previous config saved to /var/cache/conftool/dbconfig/20230907-214717-arnaudb.json
21:47 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
21:46 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
21:46 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2128.codfw.wmnet with reason: Maintenance
21:46 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2128.codfw.wmnet with reason: Maintenance
21:46 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2123 (T343198)', diff saved to https://phabricator.wikimedia.org/P52312 and previous config saved to /var/cache/conftool/dbconfig/20230907-214640-arnaudb.json
21:31 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2123', diff saved to https://phabricator.wikimedia.org/P52311 and previous config saved to /var/cache/conftool/dbconfig/20230907-213134-arnaudb.json
21:16 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2123', diff saved to https://phabricator.wikimedia.org/P52310 and previous config saved to /var/cache/conftool/dbconfig/20230907-211628-arnaudb.json
21:01 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2123 (T343198)', diff saved to https://phabricator.wikimedia.org/P52309 and previous config saved to /var/cache/conftool/dbconfig/20230907-210122-arnaudb.json
20:56 thcipriani@deploy1002: Finished scap: Backport for Preserve Gadget prefs when they can't be enabled (T341421), Fix settings button not working on reference previews (T345829) (duration: 11m 12s)
20:50 thcipriani@deploy1002: jdlrobson and thcipriani: Continuing with sync
20:49 taavi@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2444.codfw.wmnet
20:46 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
20:46 thcipriani@deploy1002: jdlrobson and thcipriani: Backport for Preserve Gadget prefs when they can't be enabled (T341421), Fix settings button not working on reference previews (T345829) synced to the testservers mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD o
20:46 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
20:46 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
20:45 thcipriani@deploy1002: Started scap: Backport for Preserve Gadget prefs when they can't be enabled (T341421), Fix settings button not working on reference previews (T345829)
20:41 thcipriani@deploy1002: Finished scap: Backport for Pre-deploy Reader Demographics 2 pilot survey (T344393) (duration: 10m 59s)
20:33 thcipriani@deploy1002: dani and thcipriani: Continuing with sync
20:31 thcipriani@deploy1002: dani and thcipriani: Backport for Pre-deploy Reader Demographics 2 pilot survey (T344393) synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
20:30 thcipriani@deploy1002: Started scap: Backport for Pre-deploy Reader Demographics 2 pilot survey (T344393)
20:23 thcipriani@deploy1002: Finished scap: Backport for Undeploy Campaigns Event Discovery survey (T345158) (duration: 17m 58s)
20:23 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs1016.eqiad.wmnet with OS bullseye
20:11 thcipriani@deploy1002: thcipriani and dani: Continuing with sync
20:07 thcipriani@deploy1002: thcipriani and dani: Backport for Undeploy Campaigns Event Discovery survey (T345158) synced to the testservers mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
20:05 thcipriani@deploy1002: Started scap: Backport for Undeploy Campaigns Event Discovery survey (T345158)
19:41 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1016.eqiad.wmnet with reason: host reimage
19:38 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1016.eqiad.wmnet with reason: host reimage
19:37 eevans@cumin1001: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies (exit_code=0) rolling restart_daemons on A:thanos-fe
19:33 eevans@cumin1001: START - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies rolling restart_daemons on A:thanos-fe
19:13 bking@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs1016.eqiad.wmnet with OS bullseye
18:50 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1010.eqiad.wmnet with reason: T342361
18:49 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1010.eqiad.wmnet with reason: T342361
18:31 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2123 (T343198)', diff saved to https://phabricator.wikimedia.org/P52308 and previous config saved to /var/cache/conftool/dbconfig/20230907-183153-arnaudb.json
18:31 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2123.codfw.wmnet with reason: Maintenance
18:31 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2123.codfw.wmnet with reason: Maintenance
18:31 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111 (T343198)', diff saved to https://phabricator.wikimedia.org/P52307 and previous config saved to /var/cache/conftool/dbconfig/20230907-183132-arnaudb.json
18:16 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111', diff saved to https://phabricator.wikimedia.org/P52306 and previous config saved to /var/cache/conftool/dbconfig/20230907-181626-arnaudb.json
18:01 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111', diff saved to https://phabricator.wikimedia.org/P52305 and previous config saved to /var/cache/conftool/dbconfig/20230907-180120-arnaudb.json
17:46 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111 (T343198)', diff saved to https://phabricator.wikimedia.org/P52304 and previous config saved to /var/cache/conftool/dbconfig/20230907-174613-arnaudb.json
17:43 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2111 (T343198)', diff saved to https://phabricator.wikimedia.org/P52303 and previous config saved to /var/cache/conftool/dbconfig/20230907-174351-arnaudb.json
17:43 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2111.codfw.wmnet with reason: Maintenance
17:43 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2111.codfw.wmnet with reason: Maintenance
16:45 Amir1: running moveToExternal on all wikis
15:58 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['lists1004.eqiad.wmnet']
15:49 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['lists1004.eqiad.wmnet']
15:47 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lists1004.mgmt.eqiad.wmnet with reboot policy FORCED
15:47 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2053.codfw.wmnet
15:47 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1053.eqiad.wmnet
15:41 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1053.eqiad.wmnet
15:41 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2053.codfw.wmnet
15:37 jclark@cumin1001: START - Cookbook sre.hosts.provision for host lists1004.mgmt.eqiad.wmnet with reboot policy FORCED
15:33 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host lists1004
15:32 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host lists1004
15:28 jclark@cumin1001: START - Cookbook sre.dns.netbox
15:20 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
15:19 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
15:18 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
15:17 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
15:13 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
15:13 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
15:11 filippo@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
15:11 filippo@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
14:54 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2101.codfw.wmnet with reason: Maintenance
14:53 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2101.codfw.wmnet with reason: Maintenance
14:48 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2052.codfw.wmnet
14:46 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1052.eqiad.wmnet
14:42 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1052.eqiad.wmnet
14:41 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2052.codfw.wmnet
14:37 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1136.eqiad.wmnet with OS bullseye
14:30 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['pki1002.eqiad.wmnet']
14:30 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['pki1002.eqiad.wmnet']
14:30 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['pki1002.eqiad.wmnet']
14:30 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['pki1002.eqiad.wmnet']
14:30 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['pki1002.eqiad.wmnet']
14:30 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['pki1002.eqiad.wmnet']
14:28 bking@deploy1002: helmfile [eqiad] DONE helmfile.d/services/rdf-streaming-updater: apply
14:27 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-cluster (exit_code=0)
14:27 bking@deploy1002: helmfile [eqiad] START helmfile.d/services/rdf-streaming-updater: apply
14:26 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dbstore1009.mgmt.eqiad.wmnet with reboot policy FORCED
14:26 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dbstore1008.mgmt.eqiad.wmnet with reboot policy FORCED
14:24 bking@deploy1002: helmfile [codfw] DONE helmfile.d/services/rdf-streaming-updater: apply
14:24 bking@deploy1002: helmfile [codfw] START helmfile.d/services/rdf-streaming-updater: apply
14:23 bking@deploy1002: helmfile [codfw] START helmfile.d/services/rdf-streaming-updater: apply
14:23 bking@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
14:22 bking@deploy1002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
14:20 bking@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
14:19 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1135.eqiad.wmnet with OS bullseye
14:16 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1051.eqiad.wmnet
14:16 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2051.codfw.wmnet
14:15 elukey@deploy1002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: sync
14:14 elukey@deploy1002: helmfile [eqiad] START helmfile.d/services/api-gateway: sync
14:13 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1136.eqiad.wmnet with reason: host reimage
14:12 elukey@deploy1002: helmfile [codfw] DONE helmfile.d/services/api-gateway: sync
14:11 elukey@deploy1002: helmfile [codfw] START helmfile.d/services/api-gateway: sync
14:10 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/api-gateway: sync
14:10 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2051.codfw.wmnet
14:10 elukey@deploy1002: helmfile [staging] START helmfile.d/services/api-gateway: sync
14:10 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1051.eqiad.wmnet
14:10 stevemunene@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1136.eqiad.wmnet with reason: host reimage
14:03 bking@deploy1002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
14:02 bking@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
13:58 bking@deploy1002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
13:58 bking@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
13:58 bking@deploy1002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
13:58 bking@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
13:58 bking@deploy1002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
13:57 bking@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
13:56 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1135.eqiad.wmnet with reason: host reimage
13:56 stevemunene@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1136.eqiad.wmnet with OS bullseye
13:53 stevemunene@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1135.eqiad.wmnet with reason: host reimage
13:52 jclark@cumin1001: START - Cookbook sre.hosts.provision for host dbstore1009.mgmt.eqiad.wmnet with reboot policy FORCED
13:52 jclark@cumin1001: START - Cookbook sre.hosts.provision for host dbstore1008.mgmt.eqiad.wmnet with reboot policy FORCED
13:51 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dbstore1009
13:51 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dbstore1008
13:51 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host dbstore1009
13:51 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host dbstore1008
13:50 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
13:50 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt dbstore100{8..9} - jclark@cumin1001"
13:50 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt dbstore100{8..9} - jclark@cumin1001"
13:47 jclark@cumin1001: START - Cookbook sre.dns.netbox
13:43 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['pki1002.eqiad.wmnet']
13:40 XioNoX: trunk sandbox vlan to ganeti nodes in esams BY27 - T307021
13:40 stevemunene@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1135.eqiad.wmnet with OS bullseye
13:38 taavi: taavi@mwmaint1002 ~ $ mwscript extensions/OATHAuth/maintenance/UpdateForMultipleDevicesSupport.php --wiki=labswiki | tee oathauth-multiple-labswiki.log # T242031
13:38 taavi@deploy1002: Finished scap: Backport for Set OATHAuth multiple devices READ_NEW for all fishbows, privates (T242031), Set OATHAuth multiple devices WRITE_BOTH for wikitech (T242031) (duration: 08m 52s)
13:35 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['pki1002.eqiad.wmnet']
13:33 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pki1002.mgmt.eqiad.wmnet with reboot policy FORCED
13:31 taavi@deploy1002: taavi: Continuing with sync
13:30 taavi@deploy1002: taavi: Backport for Set OATHAuth multiple devices READ_NEW for all fishbows, privates (T242031), Set OATHAuth multiple devices WRITE_BOTH for wikitech (T242031) synced to the testservers mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
13:29 taavi@deploy1002: Started scap: Backport for Set OATHAuth multiple devices READ_NEW for all fishbows, privates (T242031), Set OATHAuth multiple devices WRITE_BOTH for wikitech (T242031)
13:27 taavi@deploy1002: Finished scap: Backport for Edit check: Turn on when ecenable=1 is set (T345297) (duration: 09m 46s)
13:23 jclark@cumin1001: START - Cookbook sre.hosts.provision for host pki1002.mgmt.eqiad.wmnet with reboot policy FORCED
13:22 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host pki1002
13:21 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host pki1002
13:21 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
13:21 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt pki1002 - jclark@cumin1001"
13:20 taavi@deploy1002: taavi and kemayo: Continuing with sync
13:20 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt pki1002 - jclark@cumin1001"
13:18 taavi@deploy1002: taavi and kemayo: Backport for Edit check: Turn on when ecenable=1 is set (T345297) synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
13:18 jclark@cumin1001: START - Cookbook sre.dns.netbox
13:17 ayounsi@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts atlas2001.wikimedia.org
13:17 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
13:17 ayounsi@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: atlas2001.wikimedia.org decommissioned, removing all IPs except the asset tag one - ayounsi@cumin1001"
13:17 taavi@deploy1002: Started scap: Backport for Edit check: Turn on when ecenable=1 is set (T345297)
13:14 ayounsi@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: atlas2001.wikimedia.org decommissioned, removing all IPs except the asset tag one - ayounsi@cumin1001"
13:12 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
13:08 ayounsi@cumin1001: START - Cookbook sre.hosts.decommission for hosts atlas2001.wikimedia.org
12:35 filippo@deploy1002: helmfile [staging] DONE helmfile.d/services/opentelemetry-collector: apply
12:34 filippo@deploy1002: helmfile [staging] START helmfile.d/services/opentelemetry-collector: apply
12:23 filippo@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
12:23 filippo@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
12:04 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-cluster
12:04 claime: Starting eqiad jobrunner reboots
12:02 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
12:02 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
11:59 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-cluster (exit_code=0)
11:29 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2050.codfw.wmnet
11:29 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1050.eqiad.wmnet
11:23 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1050.eqiad.wmnet
11:22 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2050.codfw.wmnet
11:13 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-cluster
11:11 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2065.codfw.wmnet
11:10 cgoubert@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
11:09 cgoubert@deploy1002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
11:05 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2065.codfw.wmnet
11:04 cgoubert@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: sync
11:03 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2065.codfw.wmnet
10:56 urbanecm: mwmaint1002: `/usr/local/bin/mw-cli-wrapper /usr/local/bin/foreachwikiindblist /srv/mediawiki/dblists/growthexperiments.dblist extensions/GrowthExperiments/maintenance/refreshUserImpactData.php --registeredWithin=1year --editedWithin=2week --hasEditsAtLeast=3 --ignoreIfUpdatedWithin=1second --verbose --use-job-queue` (T344428, testing with r955319 deployed)
10:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2065.codfw.wmnet
10:54 cgoubert@deploy1002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: sync
10:51 ladsgroup@deploy1002: Finished scap: Backport for Pin pagelinks normalization stage to old in production (T345732) (duration: 09m 05s)
10:46 urbanecm@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
10:45 ladsgroup@deploy1002: ladsgroup: Continuing with sync
10:44 ladsgroup@deploy1002: ladsgroup: Backport for Pin pagelinks normalization stage to old in production (T345732) synced to the testservers mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
10:42 ladsgroup@deploy1002: Started scap: Backport for Pin pagelinks normalization stage to old in production (T345732)
10:36 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mw[1441-1442,1451].eqiad.wmnet
10:35 urbanecm@deploy1002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
10:35 cgoubert@cumin1001: START - Cookbook sre.hosts.remove-downtime for mw[1441-1442,1451].eqiad.wmnet
10:35 urbanecm@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
10:33 cgoubert@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-cluster (exit_code=1)
10:32 fnegri@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcontrol2001-dev.codfw.wmnet with reason: host reimage
10:29 fnegri@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcontrol2001-dev.codfw.wmnet with reason: host reimage
10:24 urbanecm@deploy1002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
10:24 urbanecm@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
10:23 urbanecm@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
10:23 urbanecm@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
10:21 urbanecm@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
10:10 fnegri@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcontrol2001-dev.codfw.wmnet with OS bookworm
10:08 jbond@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host puppetserver1002.eqiad.wmnet with OS bookworm
10:05 hashar@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.41.0-wmf.25 refs T343727
10:03 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1134.eqiad.wmnet with OS bullseye
09:58 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on puppetserver1002.eqiad.wmnet with reason: host reimage
09:55 jbond@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on puppetserver1002.eqiad.wmnet with reason: host reimage
09:54 hashar@deploy1002: Finished scap: Backport for RevisionReviewForm: allow setting `null` tag (T345804) (duration: 07m 54s)
09:48 hashar@deploy1002: ladsgroup and hashar: Continuing with sync
09:47 hashar@deploy1002: ladsgroup and hashar: Backport for RevisionReviewForm: allow setting `null` tag (T345804) synced to the testservers mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
09:46 hashar@deploy1002: Started scap: Backport for RevisionReviewForm: allow setting `null` tag (T345804)
09:41 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1134.eqiad.wmnet with reason: host reimage
09:39 jbond@cumin1001: START - Cookbook sre.hosts.reimage for host puppetserver1002.eqiad.wmnet with OS bookworm
09:38 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1133.eqiad.wmnet with OS bullseye
09:38 stevemunene@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1134.eqiad.wmnet with reason: host reimage
09:37 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance
09:37 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance
09:37 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1213:3315 (T343198)', diff saved to https://phabricator.wikimedia.org/P52300 and previous config saved to /var/cache/conftool/dbconfig/20230907-093718-arnaudb.json
09:24 stevemunene@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1134.eqiad.wmnet with OS bullseye
09:22 moritzm: installing grub2 updates from Bullseye point release
09:22 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1213:3315', diff saved to https://phabricator.wikimedia.org/P52299 and previous config saved to /var/cache/conftool/dbconfig/20230907-092212-arnaudb.json
09:15 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1133.eqiad.wmnet with reason: host reimage
09:14 taavi: foreachwikiindblist private extensions/OATHAuth/maintenance/UpdateForMultipleDevicesSupport.php | tee oathauth-multiple-private.log # T242031
09:12 stevemunene@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1133.eqiad.wmnet with reason: host reimage
09:07 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1213:3315', diff saved to https://phabricator.wikimedia.org/P52298 and previous config saved to /var/cache/conftool/dbconfig/20230907-090706-arnaudb.json
08:59 stevemunene@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1133.eqiad.wmnet with OS bullseye
08:52 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1213:3315 (T343198)', diff saved to https://phabricator.wikimedia.org/P52297 and previous config saved to /var/cache/conftool/dbconfig/20230907-085159-arnaudb.json
08:51 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-cluster
08:46 hashar@deploy1002: rebuilt and synchronized wikiversions files: Revert group2 wikis to 1.41.0-wmf.24 - T343727
08:38 moritzm: installing librsvg security updates
08:23 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on mc2040.codfw.wmnet with reason: T345802 - hw troubleshooting
08:23 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on mc2040.codfw.wmnet with reason: T345802 - hw troubleshooting
08:22 hashar@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.41.0-wmf.25 refs T343727
07:57 moritzm: installing grub2 updates from Bullseye point release
07:40 moritzm: installing file/libmagic security updates
07:16 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netboxdb1002.eqiad.wmnet
07:12 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netboxdb1002.eqiad.wmnet
07:08 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netboxdb2002.codfw.wmnet
07:04 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netboxdb2002.codfw.wmnet
06:59 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host krb2002.codfw.wmnet
06:52 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host krb2002.codfw.wmnet
06:29 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1213:3315 (T343198)', diff saved to https://phabricator.wikimedia.org/P52296 and previous config saved to /var/cache/conftool/dbconfig/20230907-062900-arnaudb.json
06:28 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1213.eqiad.wmnet with reason: Maintenance
06:28 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on wdqs[1003-1004].eqiad.wmnet with reason: reboot
06:28 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on wdqs[1003-1004].eqiad.wmnet with reason: reboot
06:28 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1213.eqiad.wmnet with reason: Maintenance
06:28 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1210 (T343198)', diff saved to https://phabricator.wikimedia.org/P52295 and previous config saved to /var/cache/conftool/dbconfig/20230907-062838-arnaudb.json
06:13 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1210', diff saved to https://phabricator.wikimedia.org/P52294 and previous config saved to /var/cache/conftool/dbconfig/20230907-061332-arnaudb.json
05:58 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1210', diff saved to https://phabricator.wikimedia.org/P52293 and previous config saved to /var/cache/conftool/dbconfig/20230907-055826-arnaudb.json
05:43 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1210 (T343198)', diff saved to https://phabricator.wikimedia.org/P52292 and previous config saved to /var/cache/conftool/dbconfig/20230907-054320-arnaudb.json
05:26 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
05:24 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
03:40 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
03:39 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
03:26 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
03:24 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
03:23 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1210 (T343198)', diff saved to https://phabricator.wikimedia.org/P52291 and previous config saved to /var/cache/conftool/dbconfig/20230907-032306-arnaudb.json
03:23 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1210.eqiad.wmnet with reason: Maintenance
03:22 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1210.eqiad.wmnet with reason: Maintenance
03:22 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1200 (T343198)', diff saved to https://phabricator.wikimedia.org/P52290 and previous config saved to /var/cache/conftool/dbconfig/20230907-032245-arnaudb.json
03:07 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1200', diff saved to https://phabricator.wikimedia.org/P52289 and previous config saved to /var/cache/conftool/dbconfig/20230907-030739-arnaudb.json
02:52 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1200', diff saved to https://phabricator.wikimedia.org/P52288 and previous config saved to /var/cache/conftool/dbconfig/20230907-025233-arnaudb.json
02:37 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1200 (T343198)', diff saved to https://phabricator.wikimedia.org/P52287 and previous config saved to /var/cache/conftool/dbconfig/20230907-023727-arnaudb.json
01:10 tstarling@deploy1002: Synchronized php-1.41.0-wmf.25/extensions/Phonos/extension.json: fix breakage of Phonos on parser-cached pages T345414 (duration: 06m 59s)
00:30 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1200 (T343198)', diff saved to https://phabricator.wikimedia.org/P52286 and previous config saved to /var/cache/conftool/dbconfig/20230907-003038-arnaudb.json
00:30 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1200.eqiad.wmnet with reason: Maintenance
00:30 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1200.eqiad.wmnet with reason: Maintenance
00:30 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1185 (T343198)', diff saved to https://phabricator.wikimedia.org/P52285 and previous config saved to /var/cache/conftool/dbconfig/20230907-003017-arnaudb.json
00:15 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1185', diff saved to https://phabricator.wikimedia.org/P52284 and previous config saved to /var/cache/conftool/dbconfig/20230907-001510-arnaudb.json
00:00 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1185', diff saved to https://phabricator.wikimedia.org/P52283 and previous config saved to /var/cache/conftool/dbconfig/20230907-000004-arnaudb.json

2023-09-06

23:45 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1185 (T343198)', diff saved to https://phabricator.wikimedia.org/P52282 and previous config saved to /var/cache/conftool/dbconfig/20230906-234458-arnaudb.json
22:10 bking@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host flink-zk2003.codfw.wmnet
22:10 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host flink-zk2003.codfw.wmnet with OS bookworm
21:56 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on flink-zk2003.codfw.wmnet with reason: host reimage
21:53 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on flink-zk2003.codfw.wmnet with reason: host reimage
21:44 eevans@cumin1001: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies (exit_code=0) rolling restart_daemons on A:thanos-fe
21:42 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1185 (T343198)', diff saved to https://phabricator.wikimedia.org/P52281 and previous config saved to /var/cache/conftool/dbconfig/20230906-214205-arnaudb.json
21:42 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1185.eqiad.wmnet with reason: Maintenance
21:41 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1185.eqiad.wmnet with reason: Maintenance
21:41 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T343198)', diff saved to https://phabricator.wikimedia.org/P52280 and previous config saved to /var/cache/conftool/dbconfig/20230906-214145-arnaudb.json
21:40 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs1007.eqiad.wmnet with OS bullseye
21:39 eevans@cumin1001: START - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies rolling restart_daemons on A:thanos-fe
21:38 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs1006.eqiad.wmnet with OS bullseye
21:32 bking@cumin1001: START - Cookbook sre.hosts.reimage for host flink-zk2003.codfw.wmnet with OS bookworm
21:26 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P52279 and previous config saved to /var/cache/conftool/dbconfig/20230906-212638-arnaudb.json
21:23 bking@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM flink-zk2003.codfw.wmnet - bking@cumin1001"
21:22 bking@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM flink-zk2003.codfw.wmnet - bking@cumin1001"
21:22 bking@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) flink-zk2003.codfw.wmnet on all recursors
21:22 bking@cumin1001: START - Cookbook sre.dns.wipe-cache flink-zk2003.codfw.wmnet on all recursors
21:22 bking@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
21:22 bking@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM flink-zk2003.codfw.wmnet - bking@cumin1001"
21:21 bking@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM flink-zk2003.codfw.wmnet - bking@cumin1001"
21:20 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.reboot (exit_code=0)
21:18 bking@cumin1001: START - Cookbook sre.dns.netbox
21:18 bking@cumin1001: START - Cookbook sre.ganeti.makevm for new host flink-zk2003.codfw.wmnet
21:15 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1007.eqiad.wmnet with reason: host reimage
21:12 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1006.eqiad.wmnet with reason: host reimage
21:11 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P52278 and previous config saved to /var/cache/conftool/dbconfig/20230906-211132-arnaudb.json
21:10 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1007.eqiad.wmnet with reason: host reimage
21:10 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1006.eqiad.wmnet with reason: host reimage
20:58 bking@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs1007.eqiad.wmnet with OS bullseye
20:58 bking@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs1006.eqiad.wmnet with OS bullseye
20:56 bking@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host flink-zk2002.codfw.wmnet
20:56 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host flink-zk2002.codfw.wmnet with OS bookworm
20:56 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T343198)', diff saved to https://phabricator.wikimedia.org/P52277 and previous config saved to /var/cache/conftool/dbconfig/20230906-205626-arnaudb.json
20:45 ryankemper@cumin1001: START - Cookbook sre.wdqs.reboot
20:42 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on flink-zk2002.codfw.wmnet with reason: host reimage
20:40 taavi@deploy1002: Finished scap: Backport for Article: Check permissions before showing link to view deleted revision (T264765), Article: Check permissions before showing link to view deleted revision (T264765), TopicSubscriptionsPager: Handle invalid titles (T345648), TopicSubscriptionsPager: Handle invalid titles (T345648) (duration: 09m 42s)
20:39 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on flink-zk2002.codfw.wmnet with reason: host reimage
20:34 taavi@deploy1002: matmarex and taavi: Continuing with sync
20:32 taavi@deploy1002: matmarex and taavi: Backport for Article: Check permissions before showing link to view deleted revision (T264765), Article: Check permissions before showing link to view deleted revision (T264765), TopicSubscriptionsPager: Handle invalid titles (T345648), TopicSubscriptionsPager: Handle invalid titles (T345648) synced to the tes
20:30 taavi@deploy1002: Started scap: Backport for Article: Check permissions before showing link to view deleted revision (T264765), Article: Check permissions before showing link to view deleted revision (T264765), TopicSubscriptionsPager: Handle invalid titles (T345648), TopicSubscriptionsPager: Handle invalid titles (T345648)
20:30 taavi@deploy1002: Finished scap: Backport for Add wikispecies logo (T341252), Disable wordmark on Gothic Wikipedia (T341253), Wikimania logos and taglines (T341254) (duration: 14m 25s)
20:24 taavi@deploy1002: jdlrobson and taavi: Continuing with sync
20:17 taavi@deploy1002: jdlrobson and taavi: Backport for Add wikispecies logo (T341252), Disable wordmark on Gothic Wikipedia (T341253), Wikimania logos and taglines (T341254) synced to the testservers mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XW
20:15 taavi@deploy1002: Started scap: Backport for Add wikispecies logo (T341252), Disable wordmark on Gothic Wikipedia (T341253), Wikimania logos and taglines (T341254)
20:14 bking@cumin1001: START - Cookbook sre.hosts.reimage for host flink-zk2002.codfw.wmnet with OS bookworm
20:14 bking@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM flink-zk2002.codfw.wmnet - bking@cumin1001"
20:14 bking@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM flink-zk2002.codfw.wmnet - bking@cumin1001"
20:13 bking@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) flink-zk2002.codfw.wmnet on all recursors
20:13 bking@cumin1001: START - Cookbook sre.dns.wipe-cache flink-zk2002.codfw.wmnet on all recursors
20:13 bking@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
20:13 bking@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM flink-zk2002.codfw.wmnet - bking@cumin1001"
20:13 taavi@deploy1002: Finished scap: Backport for GrowthExperiments: enable add a link in 12 and 13th round of wikis (T308137 T308138) (duration: 10m 16s)
20:12 bking@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM flink-zk2002.codfw.wmnet - bking@cumin1001"
20:10 bking@cumin1001: START - Cookbook sre.dns.netbox
20:10 bking@cumin1001: START - Cookbook sre.ganeti.makevm for new host flink-zk2002.codfw.wmnet
20:07 taavi@deploy1002: taavi and sgimeno: Continuing with sync
20:04 taavi@deploy1002: taavi and sgimeno: Backport for GrowthExperiments: enable add a link in 12 and 13th round of wikis (T308137 T308138) synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
20:03 taavi@deploy1002: Started scap: Backport for GrowthExperiments: enable add a link in 12 and 13th round of wikis (T308137 T308138)
19:18 hmonroy@deploy1002: Finished scap: Backport for Delay loading ext.phonos module until user clicks (T345414) (duration: 07m 58s)
19:12 hmonroy@deploy1002: hmonroy and musikanimal: Continuing with sync
19:12 hmonroy@deploy1002: hmonroy and musikanimal: Backport for Delay loading ext.phonos module until user clicks (T345414) synced to the testservers mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
19:10 hmonroy@deploy1002: Started scap: Backport for Delay loading ext.phonos module until user clicks (T345414)
18:16 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1161 (T343198)', diff saved to https://phabricator.wikimedia.org/P52276 and previous config saved to /var/cache/conftool/dbconfig/20230906-181602-arnaudb.json
18:15 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
18:15 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
18:15 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1161.eqiad.wmnet with reason: Maintenance
18:15 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1161.eqiad.wmnet with reason: Maintenance
18:00 cmooney@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['restbase1030']
18:00 cmooney@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['restbase1030']
18:00 cmooney@cumin1001: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts ['restbase1030']
18:00 cmooney@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['restbase1030']
17:58 cmooney@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['restbase1030']
17:58 cmooney@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['restbase1030']
17:55 cmooney@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host restbase1030.eqiad.wmnet with OS bullseye
17:48 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1132.eqiad.wmnet with OS bullseye
17:25 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1132.eqiad.wmnet with reason: host reimage
17:22 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1132.eqiad.wmnet with reason: host reimage
17:05 brett: Upload libvmod-re2_1.5.3-5_amd64 to bookworm-wikimedia
16:51 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host restbase1030.eqiad.wmnet with OS bullseye
16:43 eevans@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host restbase1030.eqiad.wmnet with OS bullseye
16:42 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:42 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove entries for cloudweb2002-dev - cmooney@cumin1001"
16:40 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1132.eqiad.wmnet with OS bullseye
16:25 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove entries for cloudweb2002-dev - cmooney@cumin1001"
16:14 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host restbase1030.eqiad.wmnet with OS bullseye
15:50 cmooney@cumin1001: START - Cookbook sre.dns.netbox
15:42 cgoubert@cumin1001: conftool action : set/pooled=yes; selector: dc=eqiad,cluster=appserver,service=nginx,name=mw1387.eqiad.wmnet
15:42 cgoubert@cumin1001: conftool action : set/pooled=yes; selector: dc=eqiad,cluster=appserver,service=nginx,name=mw1385.eqiad.wmnet
15:42 cgoubert@cumin1001: conftool action : set/pooled=yes; selector: dc=eqiad,cluster=appserver,service=nginx,name=mw1373.eqiad.wmnet
15:42 cgoubert@cumin1001: conftool action : set/pooled=yes; selector: dc=eqiad,cluster=appserver,service=nginx,name=mw1364.eqiad.wmnet
15:42 cgoubert@cumin1001: conftool action : set/pooled=yes; selector: dc=eqiad,cluster=appserver,service=nginx,name=mw1384.eqiad.wmnet
15:41 cgoubert@cumin1001: END (ERROR) - Cookbook sre.hosts.reboot-cluster (exit_code=97)
15:40 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1145.eqiad.wmnet with reason: Maintenance
15:40 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1145.eqiad.wmnet with reason: Maintenance
15:39 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 (T343198)', diff saved to https://phabricator.wikimedia.org/P52275 and previous config saved to /var/cache/conftool/dbconfig/20230906-153957-arnaudb.json
15:39 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2049.codfw.wmnet
15:38 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1049.eqiad.wmnet
15:38 akosiaris: sudo ethtool -G eno1 rx 1000 on conf2005, conf2006 to test out the theory. T345738
15:33 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2049.codfw.wmnet
15:32 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1049.eqiad.wmnet
15:24 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P52274 and previous config saved to /var/cache/conftool/dbconfig/20230906-152451-arnaudb.json
15:09 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P52273 and previous config saved to /var/cache/conftool/dbconfig/20230906-150945-arnaudb.json
15:07 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['moss-be2003']
15:06 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['moss-be2003']
15:05 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1048.eqiad.wmnet
15:05 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2048.codfw.wmnet
14:59 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2048.codfw.wmnet
14:59 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1048.eqiad.wmnet
14:54 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 (T343198)', diff saved to https://phabricator.wikimedia.org/P52272 and previous config saved to /var/cache/conftool/dbconfig/20230906-145439-arnaudb.json
14:52 bking@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host flink-zk2001.codfw.wmnet
14:52 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host flink-zk2001.codfw.wmnet with OS bookworm
14:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host moss-be2003.mgmt.codfw.wmnet with reboot policy GRACEFUL
14:43 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host moss-be2003.mgmt.codfw.wmnet with reboot policy GRACEFUL
14:31 claime: Repooling mw1349.eqiad.wmnet - T345741
14:22 claime: Leaving mw1349.eqiad.wmnet pooled=invalid until management interface investigation - T345741
14:18 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-cluster
14:18 claime: Restarting appserver reboots
13:59 claime: repooling mw1351.eqiad.wmnet
13:57 claime: powercycling mw1349.eqiad.wmnet
13:54 claime: powercycling mw1351.eqiad.wmnet
13:53 cgoubert@cumin1001: conftool action : set/pooled=inactive; selector: dc=eqiad,name=mw1351.eqiad.wmnet
13:53 cgoubert@cumin1001: conftool action : set/pooled=inactive; selector: dc=eqiad,name=mw1349.eqiad.wmnet
13:42 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1047.eqiad.wmnet
13:40 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2047.codfw.wmnet
13:38 akosiaris: sudo ethtool -G eno1 rx 1000 on conf2004 T345738
13:38 cgoubert@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-cluster (exit_code=1)
13:36 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2047.codfw.wmnet
13:36 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1047.eqiad.wmnet
13:33 bking@cumin1001: START - Cookbook sre.hosts.reimage for host flink-zk2001.codfw.wmnet with OS bookworm
13:21 sukhe: homer "asw1-b*27-esams*" commit "add durum300[34]"
13:21 taavi: taavi@mwmaint1002 ~ $ cat logos-to-purge.txt | mwscript purgeList.php --wiki enwiki # T345666
13:21 taavi@deploy1002: Finished scap: Backport for bnwikisource: update legacy vector logo (T345666) (duration: 17m 35s)
13:20 bking@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM flink-zk2001.codfw.wmnet - bking@cumin1001"
13:20 bking@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM flink-zk2001.codfw.wmnet - bking@cumin1001"
13:19 bking@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) flink-zk2001.codfw.wmnet on all recursors
13:19 bking@cumin1001: START - Cookbook sre.dns.wipe-cache flink-zk2001.codfw.wmnet on all recursors
13:19 bking@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
13:19 bking@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM flink-zk2001.codfw.wmnet - bking@cumin1001"
13:18 bking@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM flink-zk2001.codfw.wmnet - bking@cumin1001"
13:16 bking@cumin1001: START - Cookbook sre.dns.netbox
13:16 bking@cumin1001: START - Cookbook sre.ganeti.makevm for new host flink-zk2001.codfw.wmnet
13:14 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host moss-be2003.codfw.wmnet with OS bullseye
13:14 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host moss-be2003.codfw.wmnet with OS bullseye
13:07 taavi@deploy1002: taavi and anzx: Continuing with sync
13:05 taavi@deploy1002: taavi and anzx: Backport for bnwikisource: update legacy vector logo (T345666) synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
13:03 taavi@deploy1002: Started scap: Backport for bnwikisource: update legacy vector logo (T345666)
12:14 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-cluster
12:12 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
12:11 cmooney@cumin1001: START - Cookbook sre.dns.netbox
12:11 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-cluster (exit_code=0)
12:10 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2046.codfw.wmnet
12:08 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1046.eqiad.wmnet
12:04 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1144:3315 (T343198)', diff saved to https://phabricator.wikimedia.org/P52270 and previous config saved to /var/cache/conftool/dbconfig/20230906-120448-arnaudb.json
12:04 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1144.eqiad.wmnet with reason: Maintenance
12:04 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1144.eqiad.wmnet with reason: Maintenance
12:04 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1130 (T343198)', diff saved to https://phabricator.wikimedia.org/P52269 and previous config saved to /var/cache/conftool/dbconfig/20230906-120427-arnaudb.json
12:03 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1046.eqiad.wmnet
12:03 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2046.codfw.wmnet
11:49 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1130', diff saved to https://phabricator.wikimedia.org/P52268 and previous config saved to /var/cache/conftool/dbconfig/20230906-114921-arnaudb.json
11:34 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1130', diff saved to https://phabricator.wikimedia.org/P52267 and previous config saved to /var/cache/conftool/dbconfig/20230906-113414-arnaudb.json
11:33 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1045.eqiad.wmnet
11:32 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2044.codfw.wmnet
11:27 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2044.codfw.wmnet
11:27 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1045.eqiad.wmnet
11:19 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1130 (T343198)', diff saved to https://phabricator.wikimedia.org/P52266 and previous config saved to /var/cache/conftool/dbconfig/20230906-111908-arnaudb.json
11:05 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2044.codfw.wmnet
11:05 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1044.eqiad.wmnet
10:58 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2044.codfw.wmnet
10:58 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1044.eqiad.wmnet
10:35 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2043.codfw.wmnet
10:34 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1043.eqiad.wmnet
10:28 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2043.codfw.wmnet
10:28 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1043.eqiad.wmnet
10:27 topranks: Resetting PIC 1/1 on cr2-codfw to enable et-1/1/5 at 100G (T345583)
10:15 topranks: shut cr2-codfw xe-1/1/1:3 interface to cr1-codfw ahead of card 1/1 reset (T345583)
10:08 topranks: Draining cr2-codfw transport cct's to eqdfw and eqiad prior to card 1/1 reset (T345583)
09:59 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
09:59 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
09:57 topranks: de-activating peering sessions at DE-CIX Dallas on cr2-codfw prior to card 1/1 reset (T345583)
09:57 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
09:57 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
09:51 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ganeti-test01.svc.eqiad.wmnet on all recursors
09:51 ayounsi@cumin1001: START - Cookbook sre.dns.wipe-cache ganeti-test01.svc.eqiad.wmnet on all recursors
09:49 topranks: Making cr1-codfw VRRP primary for connections to row C and D prior to card 1/1 reset (T345583)
09:49 jbond: enable puppet post switch puppetdbs gerrit:954622
09:28 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
09:27 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
09:26 jbond: disable puppet to switch puppetdbs gerrit:954622
09:23 filippo@deploy1002: helmfile [codfw] DONE helmfile.d/services/opentelemetry-collector: apply
09:23 filippo@deploy1002: helmfile [codfw] START helmfile.d/services/opentelemetry-collector: apply
09:23 filippo@deploy1002: helmfile [eqiad] DONE helmfile.d/services/opentelemetry-collector: apply
09:23 topranks: Resetting PIC 1/1 on cr1-codfw to enable port et-1/1/5 at 100G (T345583)
09:23 filippo@deploy1002: helmfile [eqiad] START helmfile.d/services/opentelemetry-collector: apply
09:15 topranks: Shutting cr1-codfw port xe-1/1/1:1 to cr2-codfw before card 1/1 reset (T345583)
09:05 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db1130 (T343198)', diff saved to https://phabricator.wikimedia.org/P52265 and previous config saved to /var/cache/conftool/dbconfig/20230906-090541-arnaudb.json
09:05 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1130.eqiad.wmnet with reason: Maintenance
09:05 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1130.eqiad.wmnet with reason: Maintenance
09:05 topranks: Draining transport circuits landing on cr1-codfw card 1/1 prior to reset (T345583)
08:58 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-cluster
08:52 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host apt2001.wikimedia.org
08:48 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host apt2001.wikimedia.org
08:46 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host irc2001.wikimedia.org
08:42 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host irc2001.wikimedia.org
08:25 hashar@deploy1002: Synchronized php: group1 wikis to 1.41.0-wmf.25 refs T343727 (duration: 06m 31s)
08:18 hashar@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.41.0-wmf.25 refs T343727
07:51 filippo@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/aus-k8s-eqiad-services/jaeger: apply
07:51 filippo@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/aus-k8s-eqiad-services/jaeger: apply
07:21 kartik@deploy1002: Finished scap: Backport for Enable MinT translation service in more wikis - rollout #2 (T341445) (duration: 11m 05s)
07:15 kartik@deploy1002: abi and kartik: Continuing with sync
07:11 kartik@deploy1002: abi and kartik: Backport for Enable MinT translation service in more wikis - rollout #2 (T341445) synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
07:10 kartik@deploy1002: Started scap: Backport for Enable MinT translation service in more wikis - rollout #2 (T341445)
05:28 tstarling@deploy1002: Synchronized php-1.41.0-wmf.25/extensions/Phonos: Fix UBN client-side error from malformed Phonos tags T345672 (duration: 06m 51s)
04:07 eileen: civicrm upgraded from a6fd7d6b to 5a432b1e

2023-09-05

23:44 bking@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts flink-zk2001.codfw.wmnet
23:44 bking@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
23:44 bking@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: flink-zk2001.codfw.wmnet decommissioned, removing all IPs except the asset tag one - bking@cumin1001"
23:37 bking@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: flink-zk2001.codfw.wmnet decommissioned, removing all IPs except the asset tag one - bking@cumin1001"
23:34 bking@cumin1001: START - Cookbook sre.dns.netbox
23:30 bking@cumin1001: START - Cookbook sre.hosts.decommission for hosts flink-zk2001.codfw.wmnet
22:59 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
22:59 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update DNS entries for kubernetes2029 and 2030 - pt1979@cumin2002"
22:57 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update DNS entries for kubernetes2029 and 2030 - pt1979@cumin2002"
22:55 pt1979@cumin2002: START - Cookbook sre.dns.netbox
22:22 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host flink-zk2001.codfw.wmnet with OS bookworm
22:11 urbanecm: mwmaint1002: `/usr/local/bin/foreachwikiindblist /srv/mediawiki/dblists/growthexperiments.dblist extensions/GrowthExperiments/maintenance/refreshUserImpactData.php --batch-size=20 --registeredWithin=1year --editedWithin=2week --hasEditsAtLeast=3 --ignoreIfUpdatedWithin=6hour --verbose --use-job-queue ` (debugging T344428, lowered batch size [100 -> 20])
21:38 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.41.0-wmf.25 refs T343727
21:38 urbanecm: mwmaint1002: `/usr/local/bin/foreachwikiindblist /srv/mediawiki/dblists/growthexperiments.dblist extensions/GrowthExperiments/maintenance/refreshUserImpactData.php --registeredWithin=1year --editedWithin=2week --hasEditsAtLeast=3 --ignoreIfUpdatedWithin=6hour --verbose --use-job-queue` (trying to reproduce T344428)
21:34 eevans@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host restbase1030.eqiad.wmnet with OS bullseye
21:28 sbassett: Deployed updated security mitigation for T336027
21:28 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host restbase1030.eqiad.wmnet with OS bullseye
21:21 cjming@deploy1002: Finished scap: Backport for Fix unseen notifications icon (T345483) (duration: 13m 46s)
21:16 eevans@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host restbase1030.eqiad.wmnet with OS bullseye
21:16 cjming: end of UTC late backport window
21:15 cjming@deploy1002: jdlrobson and cjming: Continuing with sync
21:12 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host restbase1030.eqiad.wmnet with OS bullseye
21:09 cjming@deploy1002: jdlrobson and cjming: Backport for Fix unseen notifications icon (T345483) synced to the testservers mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
21:09 eileen: config revision changed from c2f91f49 to e1c3b7fd
21:08 bking@cumin1001: START - Cookbook sre.hosts.reimage for host flink-zk2001.codfw.wmnet with OS bookworm
21:07 cjming@deploy1002: Started scap: Backport for Fix unseen notifications icon (T345483)
20:49 cjming@deploy1002: Finished scap: Backport for Fix unseen notifications icon (T345483) (duration: 16m 45s)
20:43 cjming@deploy1002: cjming and jdlrobson: Continuing with sync
20:34 cjming@deploy1002: cjming and jdlrobson: Backport for Fix unseen notifications icon (T345483) synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
20:33 cjming@deploy1002: Started scap: Backport for Fix unseen notifications icon (T345483)
20:32 cjming@deploy1002: Finished scap: Backport for Fix temp user popup appearing on every new page creation (T345569) (duration: 11m 37s)
20:26 cjming@deploy1002: cjming and matmarex: Continuing with sync
20:22 cjming@deploy1002: cjming and matmarex: Backport for Fix temp user popup appearing on every new page creation (T345569) synced to the testservers mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
20:20 cjming@deploy1002: Started scap: Backport for Fix temp user popup appearing on every new page creation (T345569)
20:17 cjming@deploy1002: Finished scap: Backport for Deploy Campaigns Event Discovery survey (T345158) (duration: 10m 27s)
20:11 cjming@deploy1002: cjming and dani: Continuing with sync
20:09 fab@deploy1002: Finished deploy [airflow-dags/research@90f280e]: (no justification provided) (duration: 00m 17s)
20:09 fab@deploy1002: Started deploy [airflow-dags/research@90f280e]: (no justification provided)
20:08 cjming@deploy1002: cjming and dani: Backport for Deploy Campaigns Event Discovery survey (T345158) synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
20:07 cjming@deploy1002: Started scap: Backport for Deploy Campaigns Event Discovery survey (T345158)
19:32 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host doh1001.wikimedia.org with OS bookworm
19:29 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
19:27 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
19:18 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
19:18 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
19:05 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on doh1001.wikimedia.org with reason: host reimage
19:01 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on doh1001.wikimedia.org with reason: host reimage
18:59 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
18:59 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
18:52 brett@cumin2002: START - Cookbook sre.hosts.reimage for host doh1001.wikimedia.org with OS bookworm
18:39 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
18:39 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
18:36 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
18:36 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
18:23 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kubernetes2029.mgmt.codfw.wmnet with reboot policy GRACEFUL
18:18 topranks: Running authdns-update to add includes for newly assigned codfw subnets
18:03 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host kubernetes2029.mgmt.codfw.wmnet with reboot policy GRACEFUL
17:57 dcausse: T345545: triggered a manual dag run to import analytics_platform_eng.image_suggestions_search_index_full/snapshot=2023-08-21
17:56 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes2039.codfw.wmnet with OS bullseye
17:56 pt1979@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
17:55 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes2038.codfw.wmnet with OS bullseye
17:55 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
17:52 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
17:50 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
17:47 dcausse@deploy1002: Finished deploy [airflow-dags/search@b3d43bb]: T345545: search: generalize image_suggestions_manual (duration: 00m 26s)
17:47 dcausse@deploy1002: Started deploy [airflow-dags/search@b3d43bb]: T345545: search: generalize image_suggestions_manual
17:44 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host doh1002.wikimedia.org with OS bookworm
17:36 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2039.codfw.wmnet with reason: host reimage
17:34 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2038.codfw.wmnet with reason: host reimage
17:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P52263 and previous config saved to /var/cache/conftool/dbconfig/20230905-173132-ladsgroup.json
17:31 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2039.codfw.wmnet with reason: host reimage
17:31 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2038.codfw.wmnet with reason: host reimage
17:21 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
17:21 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes2037.codfw.wmnet with OS bullseye
17:21 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
17:18 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_eqiad: apply security updates - bking@cumin1001 - T344587
17:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P52262 and previous config saved to /var/cache/conftool/dbconfig/20230905-171627-ladsgroup.json
17:12 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
17:12 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
17:12 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
17:11 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
17:11 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
17:10 pt1979@cumin2002: START - Cookbook sre.dns.netbox
17:08 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2039.codfw.wmnet with OS bullseye
17:08 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes2036.codfw.wmnet with OS bullseye
17:08 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
17:08 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2038.codfw.wmnet with OS bullseye
17:07 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on doh1002.wikimedia.org with reason: host reimage
17:07 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
17:05 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2037.codfw.wmnet with reason: host reimage
17:04 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes2038.codfw.wmnet with OS bullseye
17:04 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2038.codfw.wmnet with OS bullseye
17:04 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes2038.codfw.wmnet with OS bullseye
17:04 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2038.codfw.wmnet with OS bullseye
17:03 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes2035.codfw.wmnet with OS bullseye
17:03 pt1979@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
17:02 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes2034.codfw.wmnet with OS bullseye
17:02 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
17:02 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on doh1002.wikimedia.org with reason: host reimage
17:02 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2037.codfw.wmnet with reason: host reimage
17:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P52260 and previous config saved to /var/cache/conftool/dbconfig/20230905-170122-ladsgroup.json
16:54 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host mc2042.codfw.wmnet
16:52 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
16:50 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
16:49 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2036.codfw.wmnet with reason: host reimage
16:49 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1042.eqiad.wmnet
16:47 brett@cumin2002: START - Cookbook sre.hosts.reimage for host doh1002.wikimedia.org with OS bookworm
16:46 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2036.codfw.wmnet with reason: host reimage
16:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1137 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P52259 and previous config saved to /var/cache/conftool/dbconfig/20230905-164618-ladsgroup.json
16:43 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1042.eqiad.wmnet
16:43 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2042.codfw.wmnet
16:39 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2037.codfw.wmnet with OS bullseye
16:38 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes2033.codfw.wmnet with OS bullseye
16:38 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
16:37 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
16:35 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2035.codfw.wmnet with reason: host reimage
16:33 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2034.codfw.wmnet with reason: host reimage
16:31 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1041.eqiad.wmnet
16:31 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2041.codfw.wmnet
16:30 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2035.codfw.wmnet with reason: host reimage
16:30 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2034.codfw.wmnet with reason: host reimage
16:29 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2036.codfw.wmnet with OS bullseye
16:28 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes2030.codfw.wmnet with OS bullseye
16:28 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
16:27 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
16:25 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2041.codfw.wmnet
16:25 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1041.eqiad.wmnet
16:21 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2033.codfw.wmnet with reason: host reimage
16:18 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2033.codfw.wmnet with reason: host reimage
16:11 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2030.codfw.wmnet with reason: host reimage
16:08 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2035.codfw.wmnet with OS bullseye
16:08 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2034.codfw.wmnet with OS bullseye
16:07 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2030.codfw.wmnet with reason: host reimage
16:07 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes2031.codfw.wmnet with OS bullseye
16:07 pt1979@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
16:06 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes2032.codfw.wmnet with OS bullseye
16:06 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
16:03 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
16:00 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
16:00 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
16:00 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
15:57 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2033.codfw.wmnet with OS bullseye
15:49 claime: Repooled mw2448.eqiad.wmnet - T345597
15:47 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2032.codfw.wmnet with reason: host reimage
15:45 claime: Repooling mw2448.eqiad.wmnet
15:44 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2031.codfw.wmnet with reason: host reimage
15:42 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2032.codfw.wmnet with reason: host reimage
15:41 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2031.codfw.wmnet with reason: host reimage
15:36 kamila_: Datacenter switchover live test completed (T345588)
15:35 kamila@deploy1002: Unlocked for deployment [ALL REPOSITORIES]: Datacenter Switchover Live Test - T345588 (duration: 30m 45s)
15:34 kamila@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.09-run-puppet-on-db-masters (exit_code=0)
15:28 kamila@cumin1001: START - Cookbook sre.switchdc.mediawiki.09-run-puppet-on-db-masters
15:28 kamila@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.09-restore-ttl (exit_code=0)
15:27 kamila@cumin1001: START - Cookbook sre.switchdc.mediawiki.09-restore-ttl
15:27 kamila@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-start-maintenance (exit_code=0)
15:25 kamila@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-start-maintenance
15:25 kamila@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-restart-envoy-on-jobrunners (exit_code=0)
15:25 kamila@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-restart-envoy-on-jobrunners
15:25 kamila@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.07-set-readwrite (exit_code=0)
15:25 kamila@cumin1001: [DRY-RUN] MediaWiki read-only period ends at: 2023-09-05 15:25:15.979250
15:25 kamila@cumin1001: START - Cookbook sre.switchdc.mediawiki.07-set-readwrite
15:24 kamila@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.06-set-db-readwrite (exit_code=0)
15:24 kamila@cumin1001: START - Cookbook sre.switchdc.mediawiki.06-set-db-readwrite
15:24 kamila@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.04-switch-mediawiki (exit_code=0)
15:24 kamila@cumin1001: START - Cookbook sre.switchdc.mediawiki.04-switch-mediawiki
15:21 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2032.codfw.wmnet with OS bullseye
15:21 kamila@cumin1001: END (FAIL) - Cookbook sre.switchdc.mediawiki.03-set-db-readonly (exit_code=99)
15:20 kamila@cumin1001: START - Cookbook sre.switchdc.mediawiki.03-set-db-readonly
15:20 kamila@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.02-set-readonly (exit_code=0)
15:19 kamila@cumin1001: [DRY-RUN] MediaWiki read-only period starts at: 2023-09-05 15:19:50.101327
15:19 kamila@cumin1001: START - Cookbook sre.switchdc.mediawiki.02-set-readonly
15:19 kamila@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.01-stop-maintenance (exit_code=0)
15:19 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2031.codfw.wmnet with OS bullseye
15:19 kamila@cumin1001: START - Cookbook sre.switchdc.mediawiki.01-stop-maintenance
15:19 kamila@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-reduce-ttl (exit_code=0)
15:18 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2030.codfw.wmnet with OS bullseye
15:13 kamila@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-reduce-ttl
15:13 kamila@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-downtime-db-readonly-checks (exit_code=0)
15:13 kamila@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-downtime-db-readonly-checks
15:13 kamila@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-disable-puppet (exit_code=0)
15:13 kamila@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-disable-puppet
15:10 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2040.codfw.wmnet
15:04 kamila@deploy1002: Locking from deployment [ALL REPOSITORIES]: Datacenter Switchover Live Test - T345588
14:58 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host durum3004.esams.wmnet with OS bookworm
14:50 kamila@cumin1001: END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0) pool all active/active services in codfw: Datacenter Switchover Live test - T345588
14:33 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes2032.codfw.wmnet with OS bullseye
14:32 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_eqiad: apply security updates - bking@cumin1001 - T344587
14:31 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host testreduce1002.eqiad.wmnet with OS bookworm
14:29 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:29 ayounsi@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: eqiad ganeti-test vip - ayounsi@cumin1001"
14:28 ayounsi@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: eqiad ganeti-test vip - ayounsi@cumin1001"
14:26 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
14:26 kamila@cumin1001: START - Cookbook sre.discovery.datacenter pool all active/active services in codfw: Datacenter Switchover Live test - T345588
14:26 btullis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventstreams: apply
14:25 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes2031.codfw.wmnet with OS bullseye
14:25 kamila@cumin1001: END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0) depool all services in codfw: Datacenter Switchover Live test - T345588
14:25 btullis@deploy1002: helmfile [eqiad] START helmfile.d/services/eventstreams: apply
14:24 btullis@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventstreams: apply
14:24 btullis@deploy1002: helmfile [codfw] START helmfile.d/services/eventstreams: apply
14:23 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/eventstreams: apply
14:23 btullis@deploy1002: helmfile [staging] START helmfile.d/services/eventstreams: apply
14:21 btullis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventstreams-internal: apply
14:21 btullis@deploy1002: helmfile [eqiad] START helmfile.d/services/eventstreams-internal: apply
14:17 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes2030.codfw.wmnet with OS bullseye
14:16 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1040.eqiad.wmnet
14:16 btullis@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventstreams-internal: apply
14:16 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on durum3004.esams.wmnet with reason: host reimage
14:15 btullis@deploy1002: helmfile [codfw] START helmfile.d/services/eventstreams-internal: apply
14:15 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/eventstreams-internal: apply
14:13 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes2029.codfw.wmnet with OS bullseye
14:13 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
14:13 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on durum3004.esams.wmnet with reason: host reimage
14:13 btullis@deploy1002: helmfile [staging] START helmfile.d/services/eventstreams-internal: apply
14:10 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2040.codfw.wmnet
14:10 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1040.eqiad.wmnet
14:09 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on testreduce1002.eqiad.wmnet with reason: host reimage
14:06 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on testreduce1002.eqiad.wmnet with reason: host reimage
14:06 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on durum3003.esams.wmnet with reason: host reimage
14:02 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on durum3003.esams.wmnet with reason: host reimage
14:01 kamila@cumin1001: START - Cookbook sre.discovery.datacenter depool all services in codfw: Datacenter Switchover Live test - T345588
13:57 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
13:54 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host testreduce1002.eqiad.wmnet with OS bookworm
13:52 ladsgroup@deploy1002: Finished scap: Backport for ores-extension: enable lift wing for most wikis (T342115) (duration: 18m 33s)
13:46 ladsgroup@deploy1002: ladsgroup and isaranto: Continuing with sync
13:45 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2032.codfw.wmnet with OS bullseye
13:43 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2029.codfw.wmnet with reason: host reimage
13:43 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host durum3004.esams.wmnet with OS bookworm
13:42 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host durum3003.esams.wmnet with OS bookworm
13:40 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2029.codfw.wmnet with reason: host reimage
13:37 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2031.codfw.wmnet with OS bullseye
13:36 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes2026.codfw.wmnet with OS bullseye
13:36 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
13:35 ladsgroup@deploy1002: ladsgroup and isaranto: Backport for ores-extension: enable lift wing for most wikis (T342115) synced to the testservers mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
13:34 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
13:33 ladsgroup@deploy1002: Started scap: Backport for ores-extension: enable lift wing for most wikis (T342115)
13:30 stevemunene@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1132.eqiad.wmnet with OS bullseye
13:30 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111 (T343198)', diff saved to https://phabricator.wikimedia.org/P52258 and previous config saved to /var/cache/conftool/dbconfig/20230905-133046-arnaudb.json
13:25 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes2028.codfw.wmnet with OS bullseye
13:25 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
13:24 taavi@deploy1002: Finished scap: Backport for tlywiki: add metanamespace, timezone, sitename (T345316), tlywiki: Add logos (T345316) (duration: 10m 18s)
13:24 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2030.codfw.wmnet with OS bullseye
13:23 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
13:22 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes2027.codfw.wmnet with OS bullseye
13:22 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
13:21 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
13:21 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2029.codfw.wmnet with OS bullseye
13:20 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes2025.codfw.wmnet with OS bullseye
13:20 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
13:18 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2026.codfw.wmnet with reason: host reimage
13:18 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
13:18 taavi@deploy1002: taavi and anzx: Continuing with sync
13:16 ayounsi@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti-test1002.eqiad.wmnet with OS bullseye
13:16 ayounsi@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - ayounsi@cumin1001"
13:16 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2026.codfw.wmnet with reason: host reimage
13:15 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2028.codfw.wmnet with reason: host reimage
13:15 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111', diff saved to https://phabricator.wikimedia.org/P52257 and previous config saved to /var/cache/conftool/dbconfig/20230905-131540-arnaudb.json
13:15 taavi@deploy1002: taavi and anzx: Backport for tlywiki: add metanamespace, timezone, sitename (T345316), tlywiki: Add logos (T345316) synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
13:13 taavi@deploy1002: Started scap: Backport for tlywiki: add metanamespace, timezone, sitename (T345316), tlywiki: Add logos (T345316)
13:13 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2028.codfw.wmnet with reason: host reimage
13:12 taavi@deploy1002: Finished scap: Backport for Disable EchoMail and EchoInteraction instruments (T344167) (duration: 10m 14s)
13:12 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2028.codfw.wmnet with OS bullseye
13:11 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2027.codfw.wmnet with reason: host reimage
13:09 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes2025.codfw.wmnet with reason: host reimage
13:09 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2027.codfw.wmnet with reason: host reimage
13:08 ayounsi@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - ayounsi@cumin1001"
13:07 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2027.codfw.wmnet with OS bullseye
13:07 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1039.eqiad.wmnet
13:07 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2039.codfw.wmnet
13:07 taavi@deploy1002: taavi and phuedx: Continuing with sync
13:06 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2025.codfw.wmnet with reason: host reimage
13:06 filippo@deploy1002: helmfile [codfw] DONE helmfile.d/services/opentelemetry-collector: apply
13:06 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2026.codfw.wmnet with OS bullseye
13:06 filippo@deploy1002: helmfile [codfw] START helmfile.d/services/opentelemetry-collector: apply
13:05 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2025.codfw.wmnet with OS bullseye
13:04 taavi@deploy1002: taavi and phuedx: Backport for Disable EchoMail and EchoInteraction instruments (T344167) synced to the testservers mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
13:03 ayounsi@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti-test1001.eqiad.wmnet with OS bullseye
13:03 ayounsi@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - ayounsi@cumin1001"
13:02 taavi@deploy1002: Started scap: Backport for Disable EchoMail and EchoInteraction instruments (T344167)
13:01 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1039.eqiad.wmnet
13:01 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2039.codfw.wmnet
13:00 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111', diff saved to https://phabricator.wikimedia.org/P52254 and previous config saved to /var/cache/conftool/dbconfig/20230905-130034-arnaudb.json
12:55 filippo@deploy1002: helmfile [eqiad] DONE helmfile.d/services/opentelemetry-collector: apply
12:55 filippo@deploy1002: helmfile [eqiad] START helmfile.d/services/opentelemetry-collector: apply
12:55 filippo@deploy1002: helmfile [codfw] DONE helmfile.d/services/opentelemetry-collector: apply
12:54 filippo@deploy1002: helmfile [codfw] START helmfile.d/services/opentelemetry-collector: apply
12:47 ayounsi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti-test1002.eqiad.wmnet with reason: host reimage
12:45 arnaudb@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111 (T343198)', diff saved to https://phabricator.wikimedia.org/P52252 and previous config saved to /var/cache/conftool/dbconfig/20230905-124528-arnaudb.json
12:44 stevemunene@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1132.eqiad.wmnet with OS bullseye
12:43 ayounsi@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti-test1002.eqiad.wmnet with reason: host reimage
12:43 elukey@deploy1002: Finished scap: Backport for Add new OAuth Rate Limiter tier for Wiki Education (T345394) (duration: 07m 49s)
12:37 elukey@deploy1002: elukey: Continuing with sync
12:37 elukey@deploy1002: elukey: Backport for Add new OAuth Rate Limiter tier for Wiki Education (T345394) synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
12:37 ayounsi@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - ayounsi@cumin1001"
12:35 elukey@deploy1002: Started scap: Backport for Add new OAuth Rate Limiter tier for Wiki Education (T345394)
12:20 ayounsi@cumin1001: START - Cookbook sre.hosts.reimage for host ganeti-test1002.eqiad.wmnet with OS bullseye
12:18 filippo@deploy1002: helmfile [codfw] DONE helmfile.d/services/opentelemetry-collector: apply
12:18 filippo@deploy1002: helmfile [codfw] START helmfile.d/services/opentelemetry-collector: apply
12:17 ayounsi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti-test1001.eqiad.wmnet with reason: host reimage
12:17 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc1038.eqiad.wmnet
12:17 filippo@deploy1002: helmfile [eqiad] DONE helmfile.d/services/opentelemetry-collector: apply
12:16 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2038.codfw.wmnet
12:16 filippo@deploy1002: helmfile [eqiad] START helmfile.d/services/opentelemetry-collector: apply
12:14 ayounsi@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti-test1001.eqiad.wmnet with reason: host reimage
12:10 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc1038.eqiad.wmnet
12:10 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc2038.codfw.wmnet
11:52 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp1003.eqiad.wmnet
11:52 ayounsi@cumin1001: START - Cookbook sre.hosts.reimage for host ganeti-test1001.eqiad.wmnet with OS bullseye
11:51 ayounsi@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ganeti-test1001.eqiad.wmnet with OS bullseye
11:47 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-cluster (exit_code=0)
11:45 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc-gp1003.eqiad.wmnet
11:45 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp1002.eqiad.wmnet
11:40 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc-gp1002.eqiad.wmnet
11:40 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp1001.eqiad.wmnet
11:36 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp2003.codfw.wmnet
11:33 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc-gp1001.eqiad.wmnet
11:30 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc-gp2003.codfw.wmnet
11:29 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp2002.codfw.wmnet
11:24 jbond@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=puppetboard-next
11:23 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc-gp2002.codfw.wmnet
11:22 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp2001.codfw.wmnet
11:18 ayounsi@cumin1001: START - Cookbook sre.hosts.reimage for host ganeti-test1001.eqiad.wmnet with OS bullseye
11:15 jiji@cumin1001: START - Cookbook sre.hosts.reboot-single for host mc-gp2001.codfw.wmnet
11:09 kamila@cumin1001: END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0) status all services in all: None - None
11:09 kamila@cumin1001: START - Cookbook sre.discovery.datacenter status all services in all: None - None
11:08 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti3005.esams.wmnet
11:08 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti3005.esams.wmnet
11:00 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti3005.esams.wmnet
10:56 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti3005.esams.wmnet
10:41 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/device-analytics: apply
10:41 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/device-analytics: apply
10:36 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/device-analytics: apply
10:36 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/device-analytics: apply
10:34 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/device-analytics: apply
10:33 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/device-analytics: apply
09:52 arnaudb@cumin1001: dbctl commit (dc=all): 'Depooling db2111 (T343198)', diff saved to https://phabricator.wikimedia.org/P52247 and previous config saved to /var/cache/conftool/dbconfig/20230905-095254-arnaudb.json
09:52 arnaudb@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2111.codfw.wmnet with reason: Maintenance
09:52 arnaudb@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2111.codfw.wmnet with reason: Maintenance
09:49 moritzm: failover ganeti master in esams/BY27 to ganeti3007
09:43 ayounsi@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['ganeti-test1001']
09:43 ayounsi@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti-test1001']
09:41 ayounsi@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti-test1001.mgmt.eqiad.wmnet with reboot policy FORCED
09:26 ayounsi@cumin1001: START - Cookbook sre.hosts.provision for host ganeti-test1001.mgmt.eqiad.wmnet with reboot policy FORCED
09:25 ayounsi@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host ganeti-test1002
09:21 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti-test1001
09:20 ayounsi@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host ganeti-test1001
09:17 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
09:17 ayounsi@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: eqiad ganeti-test - ayounsi@cumin1001"
09:16 ayounsi@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: eqiad ganeti-test - ayounsi@cumin1001"
09:14 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
09:05 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti3006.esams.wmnet
09:05 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti3006.esams.wmnet
09:04 claime: powercycle mw1356.eqiad.wmnet
08:57 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti3006.esams.wmnet
08:51 jnuche@deploy1002: sync-world aborted: testwikis wikis to 1.41.0-wmf.25 refs T343727 (duration: 20m 37s)
08:48 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti3006.esams.wmnet
08:40 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-cluster
08:31 jnuche@deploy1002: Started scap: testwikis wikis to 1.41.0-wmf.25 refs T343727
08:12 kartik@deploy1002: Finished scap: Backport for Add "editautopatrolprotected" and "editpatrolprotected" protection levels on shwiki (T344306) (duration: 10m 47s)
08:06 kartik@deploy1002: aleksandar and kartik: Continuing with sync
08:03 kartik@deploy1002: aleksandar and kartik: Backport for Add "editautopatrolprotected" and "editpatrolprotected" protection levels on shwiki (T344306) synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
08:01 kartik@deploy1002: Started scap: Backport for Add "editautopatrolprotected" and "editpatrolprotected" protection levels on shwiki (T344306)
07:56 kartik@deploy1002: Finished scap: Backport for Enable AbuseFilter blocks on shwiki (T345513) (duration: 19m 29s)
07:46 moritzm: depool mw2448 (unreachable)
07:45 stevemunene@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1132.eqiad.wmnet with OS bullseye
07:42 kartik@deploy1002: kartik and aleksandar: Continuing with sync
07:38 kartik@deploy1002: kartik and aleksandar: Backport for Enable AbuseFilter blocks on shwiki (T345513) synced to the testservers mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
07:36 kartik@deploy1002: Started scap: Backport for Enable AbuseFilter blocks on shwiki (T345513)
07:32 kartik@deploy1002: Finished scap: Backport for Enable Section and Content Translation in 7 Wikipedias (T343211) (duration: 15m 45s)
07:23 moritzm: failover ganeti masters in esams to ganeti3007/ganeti3008
07:22 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti3008.esams.wmnet
07:22 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti3008.esams.wmnet
07:20 kartik@deploy1002: kartik: Continuing with sync
07:18 kartik@deploy1002: kartik: Backport for Enable Section and Content Translation in 7 Wikipedias (T343211) synced to the testservers mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
07:16 kartik@deploy1002: Started scap: Backport for Enable Section and Content Translation in 7 Wikipedias (T343211)
07:14 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti3008.esams.wmnet
07:08 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1131.eqiad.wmnet with OS bullseye
07:07 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti3008.esams.wmnet
07:03 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti3007.esams.wmnet
07:03 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti3007.esams.wmnet
06:59 stevemunene@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1132.eqiad.wmnet with OS bullseye
06:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti3007.esams.wmnet
06:51 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1130.eqiad.wmnet with OS bullseye
06:49 tstarling@deploy1002: Synchronized wmf-config/CommonSettings-labs.php: Labs only change, just avoiding undeployed changes (duration: 09m 25s)
06:46 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1131.eqiad.wmnet with reason: host reimage
06:45 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti3007.esams.wmnet
06:43 stevemunene@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1131.eqiad.wmnet with reason: host reimage
06:29 stevemunene@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1131.eqiad.wmnet with OS bullseye
06:26 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1130.eqiad.wmnet with reason: host reimage
06:24 stevemunene@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1130.eqiad.wmnet with reason: host reimage
06:10 stevemunene@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1130.eqiad.wmnet with OS bullseye
06:06 kart_: Updated cxserver to 2023-08-29-191442-production (T345170, T343450)
06:04 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
06:04 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
06:01 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
06:00 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply
05:58 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
05:57 kartik@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
05:55 kart_: Updated MinT to 2023-09-04-051105-production (T336683)
05:46 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply
05:41 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply
05:36 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply
05:30 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply
05:25 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
05:22 kartik@deploy1002: helmfile [staging] START helmfile.d/services/machinetranslation: apply
03:59 mwpresync@deploy1002: Finished scap: testwikis wikis to 1.41.0-wmf.25 refs T343727 (duration: 56m 29s)
03:02 mwpresync@deploy1002: Started scap: testwikis wikis to 1.41.0-wmf.25 refs T343727

2023-09-04

16:14 aikochou@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
16:07 topranks: setting port 1/1/5 to speed 100G on cr2-codfw
16:06 topranks: setting port 1/1/5 to speed 100G on cr1-codfw
16:05 aikochou@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
15:46 jdrewniak@deploy1002: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 06m 14s)
15:40 jdrewniak@deploy1002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 07m 01s)
14:57 moritzm: installing json-c security updates
14:53 ayounsi@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox
14:47 ayounsi@cumin1001: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
14:45 ayounsi@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary
14:44 ayounsi@cumin1001: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary
14:31 godog: bounce prometheus@k8s-aux
14:29 aikochou@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
13:58 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
13:58 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
13:58 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
13:56 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
13:55 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
13:54 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
13:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host irc1001.wikimedia.org
13:52 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
13:50 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
13:50 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS entries for codfw cr<-> ssw links. - cmooney@cumin1001"
13:50 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
13:49 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
13:49 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS entries for codfw cr<-> ssw links. - cmooney@cumin1001"
13:49 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host irc1001.wikimedia.org
13:48 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
13:48 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
13:48 cmooney@cumin1001: START - Cookbook sre.dns.netbox
13:47 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
13:46 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
13:45 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
13:45 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS entries for codfw cr<-> ssw links. - cmooney@cumin1001"
13:41 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
13:24 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS entries for codfw cr<-> ssw links. - cmooney@cumin1001"
13:18 cmooney@cumin1001: START - Cookbook sre.dns.netbox
13:15 cmooney@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
13:12 cmooney@cumin1001: START - Cookbook sre.dns.netbox
12:52 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-presto1002.eqiad.wmnet with OS bullseye
12:46 hnowlan: staggered restarting restbase service on A:restbase
12:30 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 149665
12:30 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 149665
12:29 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 138884
12:28 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 138884
12:26 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 136065
12:23 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 136065
12:23 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 27381
12:18 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host netbox1002.eqiad.wmnet
12:08 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netbox1002.eqiad.wmnet
12:06 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netbox2002.codfw.wmnet
12:03 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 27381
12:02 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netbox2002.codfw.wmnet
11:59 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netbox-dev2002.codfw.wmnet
11:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netbox-dev2002.codfw.wmnet
11:53 hnowlan@deploy1002: Finished deploy [restbase/deploy@26bc1a5]: Add new wikis T343543 T343549 T345171 (duration: 14m 32s)
11:51 moritzm: installing grub2 updates from Bullseye point release
11:49 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1002.eqiad.wmnet with reason: host reimage
11:46 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1002.eqiad.wmnet with reason: host reimage
11:38 hnowlan@deploy1002: Started deploy [restbase/deploy@26bc1a5]: Add new wikis T343543 T343549 T345171
11:35 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host an-presto1002.eqiad.wmnet with OS bullseye
11:08 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: " - jbond@cumin1001 - T342534"
11:08 jbond@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: " - jbond@cumin1001 - T342534"
11:05 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti6004.drmrs.wmnet
11:05 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti6004.drmrs.wmnet
10:59 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti6004.drmrs.wmnet
10:51 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti6004.drmrs.wmnet
10:29 jbond: enable-puppet fleet wide post "deploy confd change gerrit:954007"
10:25 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti6003.drmrs.wmnet
10:24 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti6003.drmrs.wmnet
10:22 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti6003.drmrs.wmnet
10:13 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti6003.drmrs.wmnet
09:50 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/mathoid: apply
09:49 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/mathoid: apply
09:49 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mathoid: apply
09:49 akosiaris: T345290. Update mathoid to 2023-05-13-192519-production
09:48 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/mathoid: apply
09:48 aborrero@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
09:48 aborrero@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudservices1006 - aborrero@cumin1001"
09:48 akosiaris@deploy1002: helmfile [staging] DONE helmfile.d/services/mathoid: apply
09:48 akosiaris@deploy1002: helmfile [staging] START helmfile.d/services/mathoid: apply
09:47 jbond: disable-puppet fleet wide "deploy confd change gerrit:954007"
09:47 aborrero@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudservices1006 - aborrero@cumin1001"
09:45 ladsgroup@deploy1002: Synchronized private/PrivateSettings.php: Add CP secret (duration: 15m 47s)
09:44 aborrero@cumin1001: START - Cookbook sre.dns.netbox
09:43 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/mathoid: apply
09:43 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/mathoid: apply
09:42 stevemunene@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1129.eqiad.wmnet with OS bullseye
09:42 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mathoid: apply
09:42 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/mathoid: apply
09:39 ladsgroup@deploy1002: ladsgroup: Continuing with sync
09:38 ladsgroup@deploy1002: ladsgroup: Add CP secret synced to the testservers mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
09:34 akosiaris@deploy1002: helmfile [staging] DONE helmfile.d/services/mathoid: apply
09:34 jayme@cumin1001: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on A:wikikube-staging-worker-eqiad
09:30 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/media-analytics: apply
09:29 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/media-analytics: apply
09:29 stevemunene@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on an-worker1129.eqiad.wmnet with reason: host reimage
09:29 akosiaris@deploy1002: helmfile [staging] START helmfile.d/services/mathoid: apply
09:28 akosiaris: deploying mathoid to bump service mesh envoy version to 1.23.10-2-s2. No changes to the app.
09:27 stevemunene@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1129.eqiad.wmnet with reason: host reimage
09:27 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/geo-analytics: apply
09:27 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/geo-analytics: apply
09:14 stevemunene@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1129.eqiad.wmnet with OS bullseye
09:13 jayme@cumin1001: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:wikikube-staging-worker-eqiad
09:10 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM orespoolcounter1003.eqiad.wmnet
09:09 elukey: rename "ens5" to "ens13" on orespoolcounter1003's /etc/network/interfaces after a VM reboot
09:04 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM orespoolcounter1003.eqiad.wmnet
09:04 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM orespoolcounter1004.eqiad.wmnet
08:57 elukey: rename "ens5" to "ens13" on orespoolcounter1004's /etc/network/interfaces after a VM reboot
08:56 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host registry2004.codfw.wmnet
08:51 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host registry2004.codfw.wmnet
08:51 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host registry2003.codfw.wmnet
08:46 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM orespoolcounter1004.eqiad.wmnet
08:46 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host registry2003.codfw.wmnet
08:45 jayme@cumin1001: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on A:wikikube-staging-worker-codfw
08:45 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts bast6002.wikimedia.org
08:45 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
08:45 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: bast6002.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
08:44 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: bast6002.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
08:43 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host registry1004.eqiad.wmnet
08:41 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM orespoolcounter2003.codfw.wmnet
08:41 jmm@cumin2002: START - Cookbook sre.dns.netbox
08:39 jayme@cumin1001: END (FAIL) - Cookbook sre.k8s.reboot-nodes (exit_code=1) rolling reboot on A:wikikube-staging-worker-eqiad
08:38 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host registry1004.eqiad.wmnet
08:38 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host registry1003.eqiad.wmnet
08:37 jayme@cumin1001: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:wikikube-staging-worker-eqiad
08:36 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts bast6002.wikimedia.org
08:34 elukey: rename "ens5" to "ens13" on orespoolcounter2003's /etc/network/interfaces after a VM reboot
08:33 jayme@cumin1001: END (FAIL) - Cookbook sre.k8s.reboot-nodes (exit_code=1) rolling reboot on A:wikikube-staging-worker-eqiad
08:33 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host registry1003.eqiad.wmnet
08:31 jayme@cumin1001: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:wikikube-staging-worker-eqiad
08:31 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts bast5003.wikimedia.org
08:31 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
08:31 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: bast5003.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
08:28 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host kubestagemaster2002.codfw.wmnet
08:27 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: bast5003.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
08:25 jayme@cumin1001: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:wikikube-staging-worker-codfw
08:23 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dragonfly-supernode2001.codfw.wmnet
08:23 jayme@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host kubestagemaster1002.eqiad.wmnet
08:22 jmm@cumin2002: START - Cookbook sre.dns.netbox
08:19 jayme@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=helm-charts,name=eqiad
08:18 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host chartmuseum1001.eqiad.wmnet
08:18 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts bast5003.wikimedia.org
08:18 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host dragonfly-supernode2001.codfw.wmnet
08:17 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dragonfly-supernode1001.eqiad.wmnet
08:17 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts bast4004.wikimedia.org
08:17 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
08:17 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: bast4004.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
08:15 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubestagemaster2002.codfw.wmnet
08:15 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: bast4004.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
08:15 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubestagemaster2001.codfw.wmnet
08:14 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM orespoolcounter2003.codfw.wmnet
08:14 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host chartmuseum1001.eqiad.wmnet
08:14 jayme@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=helm-charts,name=eqiad
08:14 jayme@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=helm-charts,name=codfw
08:14 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host chartmuseum2001.codfw.wmnet
08:13 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host dragonfly-supernode1001.eqiad.wmnet
08:13 jmm@cumin2002: START - Cookbook sre.dns.netbox
08:11 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubestagemaster1002.eqiad.wmnet
08:10 jayme@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubestagemaster1001.eqiad.wmnet
08:10 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host chartmuseum2001.codfw.wmnet
08:09 jayme@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=helm-charts,name=codfw
08:08 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts bast4004.wikimedia.org
08:03 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubestagemaster1001.eqiad.wmnet
08:03 jayme@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubestagemaster2001.codfw.wmnet
08:00 elukey: restart kubelet on ml-serve1002 to check if stale prometheus metrics are the cause of the stop_container alert
08:00 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM orespoolcounter2004.codfw.wmnet
07:59 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM orespoolcounter2004.codfw.wmnet
07:35 Emperor: restart tcpircbot-logmsgbot on alert1001
07:22 moritzm: failover ganeti masters in drmrs to ganeti6001/ganeti6002
06:12 XioNoX: push new pfw policies - T345288

2023-09-02

15:51 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db1128.eqiad.wmnet with reason: depooled after replica lag page, two days
15:50 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db1128.eqiad.wmnet with reason: depooled after replica lag page, two days
15:49 sukhe@cumin2002: dbctl commit (dc=all): 'Depool db1128', diff saved to https://phabricator.wikimedia.org/P52244 and previous config saved to /var/cache/conftool/dbconfig/20230902-154903-sukhe.json
05:45 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dumpsdata1003.eqiad.wmnet
05:39 ariel@cumin1001: START - Cookbook sre.hosts.reboot-single for host dumpsdata1003.eqiad.wmnet
05:38 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host snapshot1008.eqiad.wmnet
05:32 ariel@cumin1001: START - Cookbook sre.hosts.reboot-single for host snapshot1008.eqiad.wmnet
00:06 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
00:06 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS entries for codfw spine switch overlay IRBs. - cmooney@cumin1001"
00:05 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS entries for codfw spine switch overlay IRBs. - cmooney@cumin1001"
00:02 cmooney@cumin1001: START - Cookbook sre.dns.netbox

2023-09-01

23:55 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
23:55 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS entries for codfw spine switch overlay loopbacks. - cmooney@cumin1001"
23:54 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS entries for codfw spine switch overlay loopbacks. - cmooney@cumin1001"
23:52 cmooney@cumin1001: START - Cookbook sre.dns.netbox
23:01 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.reboot (exit_code=0)
22:46 cmooney@cumin1001: END (FAIL) - Cookbook sre.network.provision (exit_code=93) for device lsw1-a4-codfw.mgmt.codfw.wmnet
22:45 cmooney@cumin1001: END (FAIL) - Cookbook sre.network.provision (exit_code=93) for device lsw1-a5-codfw.mgmt.codfw.wmnet
22:45 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes2027.codfw.wmnet with OS bullseye
22:45 cmooney@cumin1001: END (FAIL) - Cookbook sre.network.provision (exit_code=93) for device lsw1-a8-codfw.mgmt.codfw.wmnet
22:45 cmooney@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
22:45 cmooney@cumin1001: END (FAIL) - Cookbook sre.network.provision (exit_code=93) for device lsw1-b4-codfw.mgmt.codfw.wmnet
22:45 cmooney@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
22:45 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on kubernetes2027.codfw.wmnet with reason: host reimage
22:45 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2027.codfw.wmnet with reason: host reimage
22:44 cmooney@cumin1001: END (FAIL) - Cookbook sre.network.provision (exit_code=93) for device lsw1-b5-codfw.mgmt.codfw.wmnet
22:44 cmooney@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
22:44 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2027.codfw.wmnet with OS bullseye
22:22 brett@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for doh4002.wikimedia.org
22:22 brett@cumin2002: START - Cookbook sre.hosts.remove-downtime for doh4002.wikimedia.org
22:02 brett@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host doh4002.wikimedia.org with OS bookworm
21:57 cmooney@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b8-codfw
21:57 cmooney@cumin1001: START - Cookbook sre.network.tls for network device lsw1-b8-codfw
21:57 cmooney@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b7-codfw
21:56 cmooney@cumin1001: START - Cookbook sre.network.tls for network device lsw1-b7-codfw
21:56 cmooney@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b6-codfw
21:56 cmooney@cumin1001: START - Cookbook sre.network.tls for network device lsw1-b6-codfw
21:56 cmooney@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b5-codfw
21:56 cmooney@cumin1001: START - Cookbook sre.network.tls for network device lsw1-b5-codfw
21:56 cmooney@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b4-codfw
21:56 cmooney@cumin1001: START - Cookbook sre.network.tls for network device lsw1-b4-codfw
21:56 cmooney@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b3-codfw
21:56 cmooney@cumin1001: START - Cookbook sre.network.tls for network device lsw1-b3-codfw
21:56 cmooney@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b2-codfw
21:55 cmooney@cumin1001: START - Cookbook sre.network.tls for network device lsw1-b2-codfw
21:55 cmooney@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a8-codfw
21:55 cmooney@cumin1001: START - Cookbook sre.network.tls for network device lsw1-a8-codfw
21:55 cmooney@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a7-codfw
21:55 cmooney@cumin1001: START - Cookbook sre.network.tls for network device lsw1-a7-codfw
21:55 cmooney@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a6-codfw
21:55 cmooney@cumin1001: START - Cookbook sre.network.tls for network device lsw1-a6-codfw
21:55 cmooney@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a5-codfw
21:55 cmooney@cumin1001: START - Cookbook sre.network.tls for network device lsw1-a5-codfw
21:55 cmooney@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a4-codfw
21:55 cmooney@cumin1001: START - Cookbook sre.network.tls for network device lsw1-a4-codfw
21:54 cmooney@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a3-codfw
21:54 cmooney@cumin1001: START - Cookbook sre.network.tls for network device lsw1-a3-codfw
21:54 cmooney@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a2-codfw
21:54 cmooney@cumin1001: START - Cookbook sre.network.tls for network device lsw1-a2-codfw
21:54 cmooney@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a1-codfw
21:54 cmooney@cumin1001: START - Cookbook sre.network.tls for network device lsw1-a1-codfw
21:52 brett@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on doh4002.wikimedia.org with reason: host reimage
21:52 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on doh4002.wikimedia.org with reason: host reimage
21:40 cmooney@cumin1001: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device lsw1-b8-codfw.mgmt.codfw.wmnet
21:36 cmooney@cumin1001: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device lsw1-b7-codfw.mgmt.codfw.wmnet
21:32 cmooney@cumin1001: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device lsw1-b6-codfw.mgmt.codfw.wmnet
21:32 brett@cumin2002: START - Cookbook sre.hosts.reimage for host doh4002.wikimedia.org with OS bookworm
21:29 cmooney@cumin1001: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device lsw1-b5-codfw.mgmt.codfw.wmnet
21:21 ryankemper@cumin1001: START - Cookbook sre.wdqs.reboot
21:18 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.reboot (exit_code=0)
21:11 ryankemper@cumin1001: START - Cookbook sre.wdqs.reboot
21:10 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.reboot (exit_code=0)
21:08 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
21:08 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for lsw1-b8-codfw - cmooney@cumin1001"
21:08 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for lsw1-b8-codfw - cmooney@cumin1001"
21:05 cmooney@cumin1001: START - Cookbook sre.dns.netbox
21:05 cmooney@cumin1001: START - Cookbook sre.network.provision for device lsw1-b8-codfw.mgmt.codfw.wmnet
21:04 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
21:04 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for lsw1-b7-codfw - cmooney@cumin1001"
21:04 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for lsw1-b7-codfw - cmooney@cumin1001"
21:01 cmooney@cumin1001: START - Cookbook sre.dns.netbox
21:01 cmooney@cumin1001: START - Cookbook sre.network.provision for device lsw1-b7-codfw.mgmt.codfw.wmnet
21:01 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
21:01 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for lsw1-b6-codfw - cmooney@cumin1001"
21:00 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for lsw1-b6-codfw - cmooney@cumin1001"
20:58 cmooney@cumin1001: START - Cookbook sre.dns.netbox
20:58 cmooney@cumin1001: START - Cookbook sre.network.provision for device lsw1-b6-codfw.mgmt.codfw.wmnet
20:57 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
20:57 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for lsw1-b5-codfw - cmooney@cumin1001"
20:56 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for lsw1-b5-codfw - cmooney@cumin1001"
20:49 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes2025.codfw.wmnet with OS bullseye
20:26 robh@cumin1001: END (PASS) - Cookbook sre.network.debug (exit_code=0) for Netbox circuit ID 173
20:25 robh@cumin1001: START - Cookbook sre.network.debug for Netbox circuit ID 173
20:11 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for doh3003.wikimedia.org
20:11 sukhe@cumin2002: START - Cookbook sre.hosts.remove-downtime for doh3003.wikimedia.org
20:04 cmooney@cumin1001: START - Cookbook sre.dns.netbox
20:04 cmooney@cumin1001: START - Cookbook sre.network.provision for device lsw1-b5-codfw.mgmt.codfw.wmnet
20:03 cmooney@cumin1001: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device lsw1-b4-codfw.mgmt.codfw.wmnet
20:00 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
20:00 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add entries for new switch links codfw - cmooney@cumin1001"
19:59 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add entries for new switch links codfw - cmooney@cumin1001"
19:56 cmooney@cumin1001: START - Cookbook sre.dns.netbox
19:56 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
19:56 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add entries for new switch links codfw - cmooney@cumin1001"
19:51 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add entries for new switch links codfw - cmooney@cumin1001"
19:48 cmooney@cumin1001: START - Cookbook sre.dns.netbox
19:47 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2025.codfw.wmnet with OS bullseye
19:32 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
19:32 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for lsw1-b4-codfw - cmooney@cumin1001"
19:30 bblack@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on doh3003.wikimedia.org with reason: host reimage
19:30 bblack@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on doh3003.wikimedia.org with reason: host reimage
19:28 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for lsw1-b4-codfw - cmooney@cumin1001"
19:26 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_codfw: apply security updates - bking@cumin1001 - T344587
19:23 brett@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host doh3003.wikimedia.org with OS bookworm
19:23 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes2026.codfw.wmnet with OS bullseye
19:20 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes2025.codfw.wmnet with OS bullseye
19:12 brett@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on doh3003.wikimedia.org with reason: host reimage
19:12 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on doh3003.wikimedia.org with reason: host reimage
19:11 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes2027.codfw.wmnet with OS bullseye
19:10 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on kubernetes2027.codfw.wmnet with reason: host reimage
19:10 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2027.codfw.wmnet with reason: host reimage
19:09 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2027.codfw.wmnet with OS bullseye
18:57 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes2029.codfw.wmnet with OS bullseye
18:53 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes2028.codfw.wmnet with OS bullseye
18:51 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on kubernetes2029.codfw.wmnet with reason: host reimage
18:51 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2029.codfw.wmnet with reason: host reimage
18:48 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes2027.codfw.wmnet with OS bullseye
18:48 brett@cumin2002: START - Cookbook sre.hosts.reimage for host doh3003.wikimedia.org with OS bookworm
18:46 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on kubernetes2028.codfw.wmnet with reason: host reimage
18:46 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2028.codfw.wmnet with reason: host reimage
18:42 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on kubernetes2027.codfw.wmnet with reason: host reimage
18:42 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2027.codfw.wmnet with reason: host reimage
18:39 cmooney@cumin1001: START - Cookbook sre.dns.netbox
18:39 cmooney@cumin1001: START - Cookbook sre.network.provision for device lsw1-b4-codfw.mgmt.codfw.wmnet
18:35 aokoth@cumin1001: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1004.wikimedia.org with reason: Security Release
18:31 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2029.codfw.wmnet with OS bullseye
18:30 ryankemper@cumin1001: START - Cookbook sre.wdqs.reboot
18:23 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2028.codfw.wmnet with OS bullseye
18:22 pt1979@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudservices1006.eqiad.wmnet with OS bullseye
18:22 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cloudservices1006.eqiad.wmnet with OS bullseye
18:21 cmooney@cumin1001: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device lsw1-b3-codfw.mgmt.codfw.wmnet
18:19 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2027.codfw.wmnet with OS bullseye
18:16 brett@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for doh3004.wikimedia.org
18:16 brett@cumin2002: START - Cookbook sre.hosts.remove-downtime for doh3004.wikimedia.org
18:05 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2026.codfw.wmnet with OS bullseye
18:04 brett@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host doh3004.wikimedia.org with OS bookworm
17:58 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2025.codfw.wmnet with OS bullseye
17:53 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on doh3004.wikimedia.org with reason: host reimage
17:53 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on doh3004.wikimedia.org with reason: host reimage
17:53 brett@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on doh3004.wikimedia.org with reason: host reimage
17:53 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on doh3004.wikimedia.org with reason: host reimage
17:49 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
17:49 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for lsw1-b3-codfw - cmooney@cumin1001"
17:48 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for lsw1-b3-codfw - cmooney@cumin1001"
17:46 cmooney@cumin1001: START - Cookbook sre.dns.netbox
17:46 cmooney@cumin1001: START - Cookbook sre.network.provision for device lsw1-b3-codfw.mgmt.codfw.wmnet
17:31 brett@cumin2002: START - Cookbook sre.hosts.reimage for host doh3004.wikimedia.org with OS bookworm
17:30 cmooney@cumin1001: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device lsw1-b2-codfw.mgmt.codfw.wmnet
17:19 brett@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for doh5001.wikimedia.org
17:19 brett@cumin2002: START - Cookbook sre.hosts.remove-downtime for doh5001.wikimedia.org
17:18 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['kubernetes2036']
17:18 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes2036']
17:18 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['kubernetes2036']
17:18 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubernetes2036']
17:13 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
17:13 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add entries for new spine links. - cmooney@cumin1001"
17:11 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add entries for new spine links. - cmooney@cumin1001"
17:11 aokoth@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: Security Release
17:07 cmooney@cumin1001: START - Cookbook sre.dns.netbox
17:06 brett@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host doh5001.wikimedia.org with OS bookworm
16:59 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on doh5001.wikimedia.org with reason: host reimage
16:59 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on doh5001.wikimedia.org with reason: host reimage
16:58 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:58 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for lsw1-b2-codfw - cmooney@cumin1001"
16:58 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for lsw1-b2-codfw - cmooney@cumin1001"
16:55 cmooney@cumin1001: START - Cookbook sre.dns.netbox
16:55 cmooney@cumin1001: START - Cookbook sre.network.provision for device lsw1-b2-codfw.mgmt.codfw.wmnet
16:54 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_codfw: apply security updates - bking@cumin1001 - T344587
16:53 brett@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on doh5001.wikimedia.org with reason: host reimage
16:53 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on doh5001.wikimedia.org with reason: host reimage
16:50 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['restbase1030.eqiad.wmnet']
16:50 brett@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on doh5001.wikimedia.org with reason: host reimage
16:50 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on doh5001.wikimedia.org with reason: host reimage
16:49 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['restbase1030.eqiad.wmnet']
16:23 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes2036.codfw.wmnet with OS bullseye
16:22 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on kubernetes2036.codfw.wmnet with reason: host reimage
16:22 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2036.codfw.wmnet with reason: host reimage
16:21 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2036.codfw.wmnet with OS bullseye
16:21 aokoth@cumin1001: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab2002.wikimedia.org with reason: Security Release
16:19 pmiazga: T343983 Ran mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=amwiki --logwiki=metawiki Jean-Mahmood User92259453
16:18 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2037.codfw.wmnet with OS bullseye
15:57 bking@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host flink-zk2001.codfw.wmnet
15:57 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host flink-zk2001.codfw.wmnet with OS bookworm
15:55 brett@cumin2002: START - Cookbook sre.hosts.reimage for host doh5001.wikimedia.org with OS bookworm
15:43 cmooney@cumin1001: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device lsw1-a8-codfw.mgmt.codfw.wmnet
15:23 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be2004.codfw.wmnet
15:15 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host thanos-be2004.codfw.wmnet
15:15 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be2003.codfw.wmnet
15:11 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:11 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for lsw1-a8-codfw - cmooney@cumin1001"
15:08 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host thanos-be2003.codfw.wmnet
15:05 aokoth@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: Security Release
14:59 bking@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.REBOOT (1 nodes at a time) for ElasticSearch cluster cloudelastic: apply security updates - bking@cumin1001 - T344587
14:52 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for lsw1-a8-codfw - cmooney@cumin1001"
14:49 cmooney@cumin1001: START - Cookbook sre.dns.netbox
14:49 cmooney@cumin1001: START - Cookbook sre.network.provision for device lsw1-a8-codfw.mgmt.codfw.wmnet
14:42 bking@cumin1001: START - Cookbook sre.hosts.reimage for host flink-zk2001.codfw.wmnet with OS bookworm
14:40 bking@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM flink-zk2001.codfw.wmnet - bking@cumin1001"
14:39 bking@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM flink-zk2001.codfw.wmnet - bking@cumin1001"
14:39 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes2036.codfw.wmnet with OS bullseye
14:39 bking@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) flink-zk2001.codfw.wmnet on all recursors
14:38 bking@cumin1001: START - Cookbook sre.dns.wipe-cache flink-zk2001.codfw.wmnet on all recursors
14:38 bking@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:38 bking@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM flink-zk2001.codfw.wmnet - bking@cumin1001"
14:34 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
14:33 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
14:32 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on kubernetes2036.codfw.wmnet with reason: host reimage
14:32 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes2036.codfw.wmnet with reason: host reimage
14:31 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
14:30 bking@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM flink-zk2001.codfw.wmnet - bking@cumin1001"
14:30 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
14:29 jayme@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/aus-k8s-eqiad-services/jaeger: sync
14:29 jayme@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/aus-k8s-eqiad-services/jaeger: sync
14:28 bking@cumin1001: START - Cookbook sre.dns.netbox
14:28 bking@cumin1001: START - Cookbook sre.ganeti.makevm for new host flink-zk2001.codfw.wmnet
14:25 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be2002.codfw.wmnet
14:24 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
14:23 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
14:23 jayme@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/aus-k8s-eqiad-services/jaeger: sync
14:23 jayme@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/aus-k8s-eqiad-services/jaeger: sync
14:23 lsobanski@cumin1001: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1003.wikimedia.org with reason: Security release
14:21 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
14:21 jayme@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/aus-k8s-eqiad-services/jaeger: sync
14:21 jayme@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/aus-k8s-eqiad-services/jaeger: sync
14:21 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
14:20 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
14:18 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
14:17 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
14:17 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host thanos-be2002.codfw.wmnet
14:16 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be2001.codfw.wmnet
14:12 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubernetes2036.codfw.wmnet with OS bullseye
14:02 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host thanos-be2001.codfw.wmnet
14:01 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be1004.eqiad.wmnet
13:53 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host thanos-be1004.eqiad.wmnet
13:53 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be1003.eqiad.wmnet
13:50 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
13:49 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
13:47 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
13:46 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host thanos-be1003.eqiad.wmnet
13:46 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be1002.eqiad.wmnet
13:46 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1075.eqiad.wmnet
13:40 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1075.eqiad.wmnet
13:40 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host thanos-be1002.eqiad.wmnet
13:40 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1074.eqiad.wmnet
13:39 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be1001.eqiad.wmnet
13:33 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
13:33 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1074.eqiad.wmnet
13:33 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host thanos-be1001.eqiad.wmnet
13:33 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1073.eqiad.wmnet
13:32 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2073.codfw.wmnet
13:31 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
13:31 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
13:26 bking@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.REBOOT (1 nodes at a time) for ElasticSearch cluster cloudelastic: apply security updates - bking@cumin1001 - T344587
13:25 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1073.eqiad.wmnet
13:25 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2073.codfw.wmnet
13:25 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2072.codfw.wmnet
13:24 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1072.eqiad.wmnet
13:24 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
13:24 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
13:23 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
13:22 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
13:19 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
13:18 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2072.codfw.wmnet
13:18 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1072.eqiad.wmnet
13:16 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
13:16 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
13:13 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-cluster (exit_code=0)
13:10 cmooney@cumin1001: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device lsw1-a7-codfw.mgmt.codfw.wmnet
13:07 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1071.eqiad.wmnet
13:02 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2071.codfw.wmnet
13:00 lsobanski@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: Security release
12:58 akosiaris@deploy1002: helmfile [staging] DONE helmfile.d/services/ipoid: apply
12:58 akosiaris@deploy1002: helmfile [staging] START helmfile.d/services/ipoid: apply
12:55 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2071.codfw.wmnet
12:54 hashar: Built /releng/operations-puppet:0.9.0 image and updated the CI Job operations-puppet-tests-buster-docker to use tox 4.8.0 # T345152
12:54 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2070.codfw.wmnet
12:51 aborrero@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
12:51 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1071.eqiad.wmnet
12:50 aborrero@cumin1001: START - Cookbook sre.dns.netbox
12:48 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1070.eqiad.wmnet
12:45 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2070.codfw.wmnet
12:44 hashar: Updated CI Job operations-puppet-tests-buster-docker to use tox 4.8.0 # T345152
12:40 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2069.codfw.wmnet
12:40 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1070.eqiad.wmnet
12:39 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
12:39 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for lsw1-a7-codfw - cmooney@cumin1001"
12:38 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for lsw1-a7-codfw - cmooney@cumin1001"
12:38 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1069.eqiad.wmnet
12:32 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2069.codfw.wmnet
12:32 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1069.eqiad.wmnet
12:32 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/toolhub: apply
12:31 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/toolhub: apply
12:31 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/toolhub: apply
12:31 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/toolhub: apply
12:31 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1068.eqiad.wmnet
12:31 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2068.codfw.wmnet
12:25 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/toolhub: apply
12:25 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/toolhub: apply
12:24 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/toolhub: apply
12:24 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/toolhub: apply
12:24 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2068.codfw.wmnet
12:23 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2067.codfw.wmnet
12:23 akosiaris@deploy1002: helmfile [staging] DONE helmfile.d/services/toolhub: apply
12:23 akosiaris@deploy1002: helmfile [staging] START helmfile.d/services/toolhub: apply
12:18 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1068.eqiad.wmnet
12:17 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1067.eqiad.wmnet
12:07 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2067.codfw.wmnet
12:07 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2066.codfw.wmnet
12:03 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1067.eqiad.wmnet
12:03 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1066.eqiad.wmnet
12:00 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2066.codfw.wmnet
12:00 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2065.codfw.wmnet
11:55 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1066.eqiad.wmnet
11:55 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1065.eqiad.wmnet
11:53 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2065.codfw.wmnet
11:53 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2064.codfw.wmnet
11:47 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1065.eqiad.wmnet
11:47 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2064.codfw.wmnet
11:44 cmooney@cumin1001: START - Cookbook sre.dns.netbox
11:44 cmooney@cumin1001: START - Cookbook sre.network.provision for device lsw1-a7-codfw.mgmt.codfw.wmnet
11:08 mvernon@cumin1001: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling restart_daemons on A:swift-fe-eqiad
11:02 cmooney@cumin1001: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device lsw1-a6-codfw.mgmt.codfw.wmnet
11:02 mvernon@cumin1001: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on A:swift-fe-eqiad
11:01 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
11:01 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2063.codfw.wmnet
11:01 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
11:01 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
11:00 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
10:59 mvernon@cumin1001: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling restart_daemons on A:swift-fe-codfw
10:58 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
10:58 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
10:58 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
10:57 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
10:56 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
10:56 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
10:55 mvernon@cumin1001: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on A:swift-fe-codfw
10:54 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1064.eqiad.wmnet
10:53 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
10:53 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
10:51 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
10:51 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
10:46 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1064.eqiad.wmnet
10:44 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1063.eqiad.wmnet
10:43 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
10:42 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2063.codfw.wmnet
10:42 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
10:42 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2062.codfw.wmnet
10:35 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1063.eqiad.wmnet
10:35 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2062.codfw.wmnet
10:35 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
10:34 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
10:34 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
10:33 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1062.eqiad.wmnet
10:32 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2061.codfw.wmnet
10:32 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
10:31 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
10:31 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for lsw1-a6-codfw - cmooney@cumin1001"
10:30 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for lsw1-a6-codfw - cmooney@cumin1001"
10:28 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-cluster
10:25 cmooney@cumin1001: START - Cookbook sre.dns.netbox
10:25 cmooney@cumin1001: START - Cookbook sre.network.provision for device lsw1-a6-codfw.mgmt.codfw.wmnet
10:24 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1062.eqiad.wmnet
10:24 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2061.codfw.wmnet
10:24 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mwdebug2002.codfw.wmnet
10:21 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1061.eqiad.wmnet
10:21 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2060.codfw.wmnet
10:19 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-single for host mwdebug2002.codfw.wmnet
10:18 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mwdebug2001.codfw.wmnet
10:14 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-single for host mwdebug2001.codfw.wmnet
10:13 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2060.codfw.wmnet
10:12 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1061.eqiad.wmnet
10:12 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2059.codfw.wmnet
10:12 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
10:11 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
10:11 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1060.eqiad.wmnet
10:10 cmooney@cumin1001: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device lsw1-a5-codfw.mgmt.codfw.wmnet
10:07 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
10:06 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
10:05 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
10:05 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
10:04 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
10:03 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1060.eqiad.wmnet
10:03 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
10:03 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2059.codfw.wmnet
10:03 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
10:02 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
09:42 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rpki2002.codfw.wmnet
09:40 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
09:39 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
09:38 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
09:38 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for lsw1-a5-codfw - cmooney@cumin1001"
09:38 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host rpki2002.codfw.wmnet
09:38 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for lsw1-a5-codfw - cmooney@cumin1001"
09:37 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
09:36 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1059.eqiad.wmnet
09:36 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
09:35 cmooney@cumin1001: START - Cookbook sre.dns.netbox
09:35 cmooney@cumin1001: START - Cookbook sre.network.provision for device lsw1-a5-codfw.mgmt.codfw.wmnet
09:35 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
09:34 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rpki1001.eqiad.wmnet
09:34 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
09:31 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host rpki1001.eqiad.wmnet
09:29 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
09:29 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
09:26 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2058.codfw.wmnet
09:18 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ping1003.eqiad.wmnet
09:14 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ping1003.eqiad.wmnet
09:14 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1059.eqiad.wmnet
09:14 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2058.codfw.wmnet
09:08 cmooney@cumin1001: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device lsw1-a4-codfw.mgmt.codfw.wmnet
09:04 claime: Running puppet on 'A:cp-text and P{P:trafficserver::backend}' - T341780
09:02 claime: Push 4% of global traffic to mw-on-k8s - T341780
09:00 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
08:59 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
08:54 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ping2003.codfw.wmnet
08:51 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ping2003.codfw.wmnet
08:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1003.eqiad.wmnet
08:49 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on testreduce1002.eqiad.wmnet with reason: WIP
08:49 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on testreduce1002.eqiad.wmnet with reason: WIP
08:45 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1003.eqiad.wmnet
08:40 claime: Raised mw-web and mw-api-ext capacity by ~30% - T341780
08:39 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
08:39 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
08:38 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
08:38 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-web: apply
08:37 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
08:37 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for lsw1-a4-codfw - cmooney@cumin1001"
08:37 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
08:36 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for lsw1-a4-codfw - cmooney@cumin1001"
08:36 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
08:35 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
08:34 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
08:34 cmooney@cumin1001: START - Cookbook sre.dns.netbox
08:34 cmooney@cumin1001: START - Cookbook sre.network.provision for device lsw1-a4-codfw.mgmt.codfw.wmnet
08:33 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti5007.eqsin.wmnet
08:33 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti5007.eqsin.wmnet
08:30 cmooney@cumin1001: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device lsw1-a3-codfw.mgmt.codfw.wmnet
08:26 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti5007.eqsin.wmnet
08:21 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti5007.eqsin.wmnet
07:58 cmooney@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
07:58 cmooney@cumin1001: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for lsw1-a3-codfw - cmooney@cumin1001"
07:34 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.debug (exit_code=0) for Netbox circuit ID 173
07:34 ayounsi@cumin1001: START - Cookbook sre.network.debug for Netbox circuit ID 173
07:16 moritzm: failover Ganeti master in eqsin to ganeti5004
07:09 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host irc2002.wikimedia.org
07:05 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host irc2002.wikimedia.org
07:05 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti5006.eqsin.wmnet
07:05 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti5006.eqsin.wmnet
07:04 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host irc1002.wikimedia.org
07:00 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host irc1002.wikimedia.org
06:58 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti5006.eqsin.wmnet
06:46 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti5006.eqsin.wmnet
06:38 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti5005.eqsin.wmnet
06:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti5005.eqsin.wmnet
06:31 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti5005.eqsin.wmnet
06:21 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti5005.eqsin.wmnet
06:20 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti5004.eqsin.wmnet
06:20 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti5004.eqsin.wmnet
06:13 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti5004.eqsin.wmnet
05:38 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti5004.eqsin.wmnet
05:31 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti4008.ulsfo.wmnet
05:31 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4008.ulsfo.wmnet
05:24 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti4008.ulsfo.wmnet
05:17 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti4008.ulsfo.wmnet
00:43 tstarling@deploy1002: Synchronized php-1.41.0-wmf.24/extensions/LoginNotify/includes/Hooks.php: fix production error T345373 (duration: 06m 13s)
00:09 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.restart (exit_code=0)
00:03 tstarling@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Raise LoginNotify minimum log level to info T174200 (duration: 06m 51s)
