Server Admin Log/Archive 93
2025-05-30
- 22:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2005.codfw.wmnet with OS bookworm
- 21:50 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 21:08 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts cirrussearch[1055-1059].eqiad.wmnet
- 21:08 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 21:08 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cirrussearch[1055-1059].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - bking@cumin2002"
- 21:08 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host sretest2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 21:08 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cirrussearch[1055-1059].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - bking@cumin2002"
- 21:04 bking@cumin2002: START - Cookbook sre.dns.netbox
- 20:55 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcontrol2010-dev.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 20:54 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host cloudcontrol2010-dev.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 20:49 bking@cumin2002: START - Cookbook sre.hosts.decommission for hosts cirrussearch[1055-1059].eqiad.wmnet
- 20:48 bd808@deploy1003: Finished scap sync-world: Backport for ext.wikimediaEvents: Soft-depend on MetricsPlatform (T395684 T395494), Revert "JCCache: Use WANObjectCache::getWithSetCallback() instead of set/get" (T395368) (duration: 09m 59s)
- 20:41 bd808@deploy1003: bd808, bvibber: Continuing with sync
- 20:40 bd808@deploy1003: bd808, bvibber: Backport for ext.wikimediaEvents: Soft-depend on MetricsPlatform (T395684 T395494), Revert "JCCache: Use WANObjectCache::getWithSetCallback() instead of set/get" (T395368) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 20:38 bd808@deploy1003: Started scap sync-world: Backport for ext.wikimediaEvents: Soft-depend on MetricsPlatform (T395684 T395494), Revert "JCCache: Use WANObjectCache::getWithSetCallback() instead of set/get" (T395368)
- 20:19 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es2048.codfw.wmnet with OS bookworm
- 20:19 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 20:19 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 20:19 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es2047.codfw.wmnet with OS bookworm
- 20:19 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 20:14 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 20:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling ms3 (T395241)', diff saved to https://phabricator.wikimedia.org/P76777 and previous config saved to /var/cache/conftool/dbconfig/20250530-200835-root.json
- 20:04 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99) for hosts relforge[1003-1004].eqiad.wmnet
- 20:00 bking@deploy1003: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 20:00 bking@deploy1003: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
- 19:58 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on es2048.codfw.wmnet with reason: host reimage
- 19:55 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2143.codfw.wmnet with reason: Maintenance
- 19:55 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1153.eqiad.wmnet with reason: Maintenance
- 19:55 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on es2047.codfw.wmnet with reason: host reimage
- 19:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling ms3 (T395241)', diff saved to https://phabricator.wikimedia.org/P76776 and previous config saved to /var/cache/conftool/dbconfig/20250530-195436-root.json
- 19:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling ms2 (T395241)', diff saved to https://phabricator.wikimedia.org/P76775 and previous config saved to /var/cache/conftool/dbconfig/20250530-195427-root.json
- 19:52 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on es2048.codfw.wmnet with reason: host reimage
- 19:52 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on es2047.codfw.wmnet with reason: host reimage
- 19:51 bking@deploy1003: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 19:51 bking@deploy1003: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
- 19:45 bking@cumin2002: START - Cookbook sre.hosts.decommission for hosts relforge[1003-1004].eqiad.wmnet
- 19:41 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2144.codfw.wmnet with reason: Maintenance
- 19:41 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1151.eqiad.wmnet with reason: Maintenance
- 19:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling ms2 (T395241)', diff saved to https://phabricator.wikimedia.org/P76774 and previous config saved to /var/cache/conftool/dbconfig/20250530-193951-root.json
- 19:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling ms1 (T395241)', diff saved to https://phabricator.wikimedia.org/P76773 and previous config saved to /var/cache/conftool/dbconfig/20250530-193537-root.json
- 19:34 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host es2048.codfw.wmnet with OS bookworm
- 19:34 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host es2047.codfw.wmnet with OS bookworm
- 19:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host es2048.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 19:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host es2047.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 19:25 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2142.codfw.wmnet with reason: Maintenance
- 19:24 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1152.eqiad.wmnet with reason: Maintenance
- 19:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling ms1 (T395241)', diff saved to https://phabricator.wikimedia.org/P76772 and previous config saved to /var/cache/conftool/dbconfig/20250530-192329-root.json
- 19:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling pc8 (T395241)', diff saved to https://phabricator.wikimedia.org/P76771 and previous config saved to /var/cache/conftool/dbconfig/20250530-192320-root.json
- 19:12 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on pc2018.codfw.wmnet with reason: Maintenance
- 19:12 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on pc1018.eqiad.wmnet with reason: Maintenance
- 19:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling pc8 (T395241)', diff saved to https://phabricator.wikimedia.org/P76770 and previous config saved to /var/cache/conftool/dbconfig/20250530-191105-root.json
- 19:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling pc7 (T395241)', diff saved to https://phabricator.wikimedia.org/P76769 and previous config saved to /var/cache/conftool/dbconfig/20250530-191057-root.json
- 19:00 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on pc2017.codfw.wmnet with reason: Maintenance
- 18:59 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: Maintenance
- 18:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling pc7 (T395241)', diff saved to https://phabricator.wikimedia.org/P76768 and previous config saved to /var/cache/conftool/dbconfig/20250530-185832-root.json
- 18:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling pc6 (T395241)', diff saved to https://phabricator.wikimedia.org/P76767 and previous config saved to /var/cache/conftool/dbconfig/20250530-185823-root.json
- 18:57 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host es2048.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 18:57 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host es2047.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 18:47 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on pc2016.codfw.wmnet with reason: Maintenance
- 18:47 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on pc1016.eqiad.wmnet with reason: Maintenance
- 18:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling pc6 (T395241)', diff saved to https://phabricator.wikimedia.org/P76766 and previous config saved to /var/cache/conftool/dbconfig/20250530-184605-root.json
- 18:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling pc5 (T395241)', diff saved to https://phabricator.wikimedia.org/P76765 and previous config saved to /var/cache/conftool/dbconfig/20250530-184552-root.json
- 18:34 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on pc2015.codfw.wmnet with reason: Maintenance
- 18:34 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on pc1015.eqiad.wmnet with reason: Maintenance
- 18:33 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling pc5 (T395241)', diff saved to https://phabricator.wikimedia.org/P76764 and previous config saved to /var/cache/conftool/dbconfig/20250530-183319-root.json
- 18:33 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling pc4 (T395241)', diff saved to https://phabricator.wikimedia.org/P76763 and previous config saved to /var/cache/conftool/dbconfig/20250530-183310-root.json
- 18:20 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on pc2014.codfw.wmnet with reason: Maintenance
- 18:20 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on pc1014.eqiad.wmnet with reason: Maintenance
- 18:18 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling pc4 (T395241)', diff saved to https://phabricator.wikimedia.org/P76761 and previous config saved to /var/cache/conftool/dbconfig/20250530-181855-root.json
- 18:16 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling pc3 (T395241)', diff saved to https://phabricator.wikimedia.org/P76760 and previous config saved to /var/cache/conftool/dbconfig/20250530-181616-root.json
- 18:13 mfossati@deploy1003: Finished deploy [airflow-dags/platform_eng@cd72f3e]: bump section topics to v1.2.0 and SEAL to v0.6.0 (duration: 01m 33s)
- 18:12 mfossati@deploy1003: Started deploy [airflow-dags/platform_eng@cd72f3e]: bump section topics to v1.2.0 and SEAL to v0.6.0
- 18:08 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 18:07 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 18:03 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on pc2013.codfw.wmnet with reason: Maintenance
- 18:02 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on pc1013.eqiad.wmnet with reason: Maintenance
- 18:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling pc3 (T395241)', diff saved to https://phabricator.wikimedia.org/P76759 and previous config saved to /var/cache/conftool/dbconfig/20250530-180140-root.json
- 18:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling pc2 (T395241)', diff saved to https://phabricator.wikimedia.org/P76758 and previous config saved to /var/cache/conftool/dbconfig/20250530-180131-root.json
- 17:48 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on pc2012.codfw.wmnet with reason: Maintenance
- 17:48 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on pc1012.eqiad.wmnet with reason: Maintenance
- 17:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling pc2 (T395241)', diff saved to https://phabricator.wikimedia.org/P76757 and previous config saved to /var/cache/conftool/dbconfig/20250530-174652-root.json
- 17:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling pc1 (T395241)', diff saved to https://phabricator.wikimedia.org/P76756 and previous config saved to /var/cache/conftool/dbconfig/20250530-174510-root.json
- 17:39 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on pc2011.codfw.wmnet with reason: Maintenance
- 17:33 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 17:33 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 17:32 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on pc1011.eqiad.wmnet with reason: Maintenance
- 17:32 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 17:32 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dns entries for link from cr1-codfw to ssw1-e1-codfw - cmooney@cumin1002"
- 17:32 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dns entries for link from cr1-codfw to ssw1-e1-codfw - cmooney@cumin1002"
- 17:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling pc1 (T395241)', diff saved to https://phabricator.wikimedia.org/P76755 and previous config saved to /var/cache/conftool/dbconfig/20250530-173132-root.json
- 17:27 cmooney@cumin1002: START - Cookbook sre.dns.netbox
- 17:21 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1186.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 16:44 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2238 (T395241)', diff saved to https://phabricator.wikimedia.org/P76754 and previous config saved to /var/cache/conftool/dbconfig/20250530-164423-fceratto.json
- 16:35 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db2187 gradually with 4 steps - Pooling in after reimage
- 16:29 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2238', diff saved to https://phabricator.wikimedia.org/P76752 and previous config saved to /var/cache/conftool/dbconfig/20250530-162915-fceratto.json
- 16:28 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 16:28 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 16:18 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 16:18 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 16:14 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2238', diff saved to https://phabricator.wikimedia.org/P76750 and previous config saved to /var/cache/conftool/dbconfig/20250530-161408-fceratto.json
- 15:59 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2238 (T395241)', diff saved to https://phabricator.wikimedia.org/P76748 and previous config saved to /var/cache/conftool/dbconfig/20250530-155900-fceratto.json
- 15:51 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2238 (T395241)', diff saved to https://phabricator.wikimedia.org/P76747 and previous config saved to /var/cache/conftool/dbconfig/20250530-155152-fceratto.json
- 15:51 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2238.codfw.wmnet with reason: Maintenance
- 15:51 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2226 (T395241)', diff saved to https://phabricator.wikimedia.org/P76746 and previous config saved to /var/cache/conftool/dbconfig/20250530-155125-fceratto.json
- 15:49 fceratto@cumin1002: START - Cookbook sre.mysql.pool db2187 gradually with 4 steps - Pooling in after reimage
- 15:36 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2226', diff saved to https://phabricator.wikimedia.org/P76744 and previous config saved to /var/cache/conftool/dbconfig/20250530-153618-fceratto.json
- 15:21 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2226', diff saved to https://phabricator.wikimedia.org/P76743 and previous config saved to /var/cache/conftool/dbconfig/20250530-152111-fceratto.json
- 15:06 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2226 (T395241)', diff saved to https://phabricator.wikimedia.org/P76742 and previous config saved to /var/cache/conftool/dbconfig/20250530-150603-fceratto.json
- 14:59 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2226 (T395241)', diff saved to https://phabricator.wikimedia.org/P76741 and previous config saved to /var/cache/conftool/dbconfig/20250530-145901-fceratto.json
- 14:58 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2226.codfw.wmnet with reason: Maintenance
- 14:58 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2225 (T395241)', diff saved to https://phabricator.wikimedia.org/P76740 and previous config saved to /var/cache/conftool/dbconfig/20250530-145835-fceratto.json
- 14:57 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 14:57 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 14:44 stevemunene@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on an-worker[1163-1165].eqiad.wmnet with reason: hard drive replacement in progress
- 14:43 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2225', diff saved to https://phabricator.wikimedia.org/P76739 and previous config saved to /var/cache/conftool/dbconfig/20250530-144329-fceratto.json
- 14:28 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2225', diff saved to https://phabricator.wikimedia.org/P76738 and previous config saved to /var/cache/conftool/dbconfig/20250530-142821-fceratto.json
- 14:28 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest2004.codfw.wmnet
- 14:21 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host sretest2004.codfw.wmnet
- 14:20 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest2001.codfw.wmnet
- 14:15 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host sretest2001.codfw.wmnet
- 14:15 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1003.eqiad.wmnet
- 14:13 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2225 (T395241)', diff saved to https://phabricator.wikimedia.org/P76737 and previous config saved to /var/cache/conftool/dbconfig/20250530-141314-fceratto.json
- 14:09 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host sretest1003.eqiad.wmnet
- 14:07 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host testvm2008.wikimedia.org
- 14:05 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2225 (T395241)', diff saved to https://phabricator.wikimedia.org/P76735 and previous config saved to /var/cache/conftool/dbconfig/20250530-140554-fceratto.json
- 14:05 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2225.codfw.wmnet with reason: Maintenance
- 14:05 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2204 (T395241)', diff saved to https://phabricator.wikimedia.org/P76734 and previous config saved to /var/cache/conftool/dbconfig/20250530-140527-fceratto.json
- 14:04 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host testvm2008.wikimedia.org
- 13:58 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host testvm2007.codfw.wmnet
- 13:55 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host testvm2007.codfw.wmnet
- 13:50 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2204', diff saved to https://phabricator.wikimedia.org/P76733 and previous config saved to /var/cache/conftool/dbconfig/20250530-135020-fceratto.json
- 13:46 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host testvm2006.codfw.wmnet
- 13:43 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host testvm2006.codfw.wmnet
- 13:37 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-test2003.codfw.wmnet
- 13:35 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2204', diff saved to https://phabricator.wikimedia.org/P76732 and previous config saved to /var/cache/conftool/dbconfig/20250530-133514-fceratto.json
- 13:31 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ganeti-test2003.codfw.wmnet
- 13:28 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-test2001.codfw.wmnet
- 13:22 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ganeti-test2001.codfw.wmnet
- 13:20 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2204 (T395241)', diff saved to https://phabricator.wikimedia.org/P76731 and previous config saved to /var/cache/conftool/dbconfig/20250530-132006-fceratto.json
- 13:12 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2204 (T395241)', diff saved to https://phabricator.wikimedia.org/P76730 and previous config saved to /var/cache/conftool/dbconfig/20250530-131251-fceratto.json
- 13:12 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2204.codfw.wmnet with reason: Maintenance
- 13:12 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2189 (T395241)', diff saved to https://phabricator.wikimedia.org/P76729 and previous config saved to /var/cache/conftool/dbconfig/20250530-131223-fceratto.json
- 12:57 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2189', diff saved to https://phabricator.wikimedia.org/P76728 and previous config saved to /var/cache/conftool/dbconfig/20250530-125717-fceratto.json
- 12:42 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2189', diff saved to https://phabricator.wikimedia.org/P76727 and previous config saved to /var/cache/conftool/dbconfig/20250530-124209-fceratto.json
- 12:41 jmm@cumin2002: DONE (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Arturo Borrero Gonzalez out of all services on: 1437 hosts
- 12:37 jmm@cumin2002: DONE (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Arturo Borrero Gonzalez out of all services on: 927 hosts
- 12:27 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2189 (T395241)', diff saved to https://phabricator.wikimedia.org/P76725 and previous config saved to /var/cache/conftool/dbconfig/20250530-122701-fceratto.json
- 12:26 fceratto@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2187.codfw.wmnet with OS bookworm
- 12:21 topranks: removing superfluous 'mode auto' command on codfw dc switches T394530
- 12:20 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2189 (T395241)', diff saved to https://phabricator.wikimedia.org/P76724 and previous config saved to /var/cache/conftool/dbconfig/20250530-122001-fceratto.json
- 12:19 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2189.codfw.wmnet with reason: Maintenance
- 12:19 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2175 (T395241)', diff saved to https://phabricator.wikimedia.org/P76723 and previous config saved to /var/cache/conftool/dbconfig/20250530-121934-fceratto.json
- 12:04 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2175', diff saved to https://phabricator.wikimedia.org/P76721 and previous config saved to /var/cache/conftool/dbconfig/20250530-120427-fceratto.json
- 12:03 fceratto@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2187.codfw.wmnet with reason: host reimage
- 12:01 fceratto@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2187.codfw.wmnet with reason: host reimage
- 11:49 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2175', diff saved to https://phabricator.wikimedia.org/P76720 and previous config saved to /var/cache/conftool/dbconfig/20250530-114921-fceratto.json
- 11:41 fceratto@cumin1002: START - Cookbook sre.hosts.reimage for host db2187.codfw.wmnet with OS bookworm
- 11:34 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2175 (T395241)', diff saved to https://phabricator.wikimedia.org/P76718 and previous config saved to /var/cache/conftool/dbconfig/20250530-113414-fceratto.json
- 11:24 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2175 (T395241)', diff saved to https://phabricator.wikimedia.org/P76717 and previous config saved to /var/cache/conftool/dbconfig/20250530-112449-fceratto.json
- 11:24 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2175.codfw.wmnet with reason: Maintenance
- 11:24 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2148 (T395241)', diff saved to https://phabricator.wikimedia.org/P76716 and previous config saved to /var/cache/conftool/dbconfig/20250530-112423-fceratto.json
- 11:09 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2148', diff saved to https://phabricator.wikimedia.org/P76715 and previous config saved to /var/cache/conftool/dbconfig/20250530-110917-fceratto.json
- 10:57 JustHannah: T395592 Ran mwscript-k8s --comment="T395592" --follow -- extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=enwiki --logwiki=metawiki 'Yusuftahaluleci' 'Yusuf_Taha_Lüleci'
- 10:54 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2148', diff saved to https://phabricator.wikimedia.org/P76714 and previous config saved to /var/cache/conftool/dbconfig/20250530-105410-fceratto.json
- 10:39 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2148 (T395241)', diff saved to https://phabricator.wikimedia.org/P76713 and previous config saved to /var/cache/conftool/dbconfig/20250530-103903-fceratto.json
- 10:28 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2148 (T395241)', diff saved to https://phabricator.wikimedia.org/P76712 and previous config saved to /var/cache/conftool/dbconfig/20250530-102830-fceratto.json
- 10:28 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2148.codfw.wmnet with reason: Maintenance
- 10:24 marostegui@cumin1002: dbctl commit (dc=all): 'es2040 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P76711 and previous config saved to /var/cache/conftool/dbconfig/20250530-102416-root.json
- 10:09 marostegui@cumin1002: dbctl commit (dc=all): 'es2040 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P76710 and previous config saved to /var/cache/conftool/dbconfig/20250530-100911-root.json
- 09:54 marostegui@cumin1002: dbctl commit (dc=all): 'es2040 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P76709 and previous config saved to /var/cache/conftool/dbconfig/20250530-095405-root.json
- 09:39 marostegui@cumin1002: dbctl commit (dc=all): 'es2040 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P76708 and previous config saved to /var/cache/conftool/dbconfig/20250530-093859-root.json
- 09:23 marostegui@cumin1002: dbctl commit (dc=all): 'es2040 (re)pooling @ 20%: Repooling', diff saved to https://phabricator.wikimedia.org/P76707 and previous config saved to /var/cache/conftool/dbconfig/20250530-092353-root.json
- 09:08 marostegui@cumin1002: dbctl commit (dc=all): 'es2040 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P76706 and previous config saved to /var/cache/conftool/dbconfig/20250530-090847-root.json
- 08:53 marostegui@cumin1002: dbctl commit (dc=all): 'es2040 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P76705 and previous config saved to /var/cache/conftool/dbconfig/20250530-085341-root.json
- 08:38 marostegui@cumin1002: dbctl commit (dc=all): 'es2040 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P76704 and previous config saved to /var/cache/conftool/dbconfig/20250530-083836-root.json
- 08:23 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on es2040.codfw.wmnet with reason: Maintenance
- 08:21 marostegui@cumin1002: dbctl commit (dc=all): 'Depool es2040 T395647', diff saved to https://phabricator.wikimedia.org/P76703 and previous config saved to /var/cache/conftool/dbconfig/20250530-082144-marostegui.json
- 08:20 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db2187.codfw.wmnet with reason: Reimaging
- 08:18 fceratto@cumin1002: dbctl commit (dc=all): 'Depool db2187 for reimaging, see T394884', diff saved to https://phabricator.wikimedia.org/P76702 and previous config saved to /var/cache/conftool/dbconfig/20250530-081804-fceratto.json
- 07:09 elukey@deploy1003: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
- 07:08 elukey@deploy1003: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
- 07:08 bwojtowicz@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'readability' for release 'main' .
- 07:05 elukey@puppetserver1001: conftool action : set/pooled=false; selector: dnsdisc=inference,name=eqiad
- 06:29 moritzm: uninstalling systemd-coredump (only installed on one host due to an older test, but not needed and there are open security issues)
2025-05-29
- 23:34 bvibber@deploy1003: Finished scap sync-world: Backport for Validation fix for saving Data: .chart pages with transforms (T395631) (duration: 10m 00s)
- 23:27 bvibber@deploy1003: bvibber: Continuing with sync
- 23:25 bvibber@deploy1003: bvibber: Backport for Validation fix for saving Data: .chart pages with transforms (T395631) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 23:23 bvibber@deploy1003: Started scap sync-world: Backport for Validation fix for saving Data: .chart pages with transforms (T395631)
- 21:41 cjming: end of UTC late backport window
- 21:38 cjming@deploy1003: Finished scap sync-world: Backport for ext.wikimediaEvents: Add XLab PageVisit instrument (T393918 T392313) (duration: 09m 54s)
- 21:31 cjming@deploy1003: cjming: Continuing with sync
- 21:30 cjming@deploy1003: cjming: Backport for ext.wikimediaEvents: Add XLab PageVisit instrument (T393918 T392313) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 21:28 cjming@deploy1003: Started scap sync-world: Backport for ext.wikimediaEvents: Add XLab PageVisit instrument (T393918 T392313)
- 21:20 cjming@deploy1003: Finished scap sync-world: Backport for EventStreamConfig: Remove xLab development streams (T393918) (duration: 09m 34s)
- 21:12 cjming@deploy1003: cjming, phuedx: Continuing with sync
- 21:12 cjming@deploy1003: cjming, phuedx: Backport for EventStreamConfig: Remove xLab development streams (T393918) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 21:10 cjming@deploy1003: Started scap sync-world: Backport for EventStreamConfig: Remove xLab development streams (T393918)
- 21:07 cjming@deploy1003: Finished scap sync-world: Backport for noc: Fix invalid `max-age: 300` syntax to `max-age=300` in fileserve.php (T341859) (duration: 10m 22s)
- 21:00 cjming@deploy1003: cjming, krinkle: Continuing with sync
- 20:59 cjming@deploy1003: cjming, krinkle: Backport for noc: Fix invalid `max-age: 300` syntax to `max-age=300` in fileserve.php (T341859) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 20:57 cjming@deploy1003: Started scap sync-world: Backport for noc: Fix invalid `max-age: 300` syntax to `max-age=300` in fileserve.php (T341859)
- 20:55 cscott@deploy1003: Finished scap sync-world: Backport for Campaign: Ensure `` wrapper is removed (T395023) (duration: 15m 01s)
- 20:48 cscott@deploy1003: cscott: Continuing with sync
- 20:43 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1240.eqiad.wmnet with reason: Maintenance
- 20:42 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1212 (T395241)', diff saved to https://phabricator.wikimedia.org/P76700 and previous config saved to /var/cache/conftool/dbconfig/20250529-204251-fceratto.json
- 20:42 cscott@deploy1003: cscott: Backport for Campaign: Ensure `` wrapper is removed (T395023) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 20:40 cscott@deploy1003: Started scap sync-world: Backport for Campaign: Ensure `` wrapper is removed (T395023)
- 20:28 cjming@deploy1003: Finished scap sync-world: Backport for Turn on glent m1 AB test (T262612) (duration: 10m 19s)
- 20:27 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1212', diff saved to https://phabricator.wikimedia.org/P76699 and previous config saved to /var/cache/conftool/dbconfig/20250529-202743-fceratto.json
- 20:21 cjming@deploy1003: ebernhardson, cjming: Continuing with sync
- 20:20 cjming@deploy1003: ebernhardson, cjming: Backport for Turn on glent m1 AB test (T262612) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 20:18 cjming@deploy1003: Started scap sync-world: Backport for Turn on glent m1 AB test (T262612)
- 20:16 jdlrobson@deploy1003: Finished scap sync-world: Backport for Enable Minerva typeahead search on beta cluster (T380510), Enable ReadingList special page on test wiki (duration: 10m 13s)
- 20:12 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1212', diff saved to https://phabricator.wikimedia.org/P76698 and previous config saved to /var/cache/conftool/dbconfig/20250529-201236-fceratto.json
- 20:08 jdlrobson@deploy1003: jdlrobson: Continuing with sync
- 20:07 jdlrobson@deploy1003: jdlrobson: Backport for Enable Minerva typeahead search on beta cluster (T380510), Enable ReadingList special page on test wiki synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 20:05 jdlrobson@deploy1003: Started scap sync-world: Backport for Enable Minerva typeahead search on beta cluster (T380510), Enable ReadingList special page on test wiki
- 19:57 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1212 (T395241)', diff saved to https://phabricator.wikimedia.org/P76697 and previous config saved to /var/cache/conftool/dbconfig/20250529-195729-fceratto.json
- 19:05 bking@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on relforge[1003-1004,1008-1009].eqiad.wmnet with reason: noisy alerts
- 18:57 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1212 (T395241)', diff saved to https://phabricator.wikimedia.org/P76696 and previous config saved to /var/cache/conftool/dbconfig/20250529-185711-fceratto.json
- 18:57 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1013,1017].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
- 18:56 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1212.eqiad.wmnet with reason: Maintenance
- 18:56 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1198 (T395241)', diff saved to https://phabricator.wikimedia.org/P76695 and previous config saved to /var/cache/conftool/dbconfig/20250529-185639-fceratto.json
- 18:41 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1198', diff saved to https://phabricator.wikimedia.org/P76694 and previous config saved to /var/cache/conftool/dbconfig/20250529-184132-fceratto.json
- 18:27 dancy@deploy1003: rebuilt and synchronized wikiversions files: group2 to 1.45.0-wmf.3 refs T392173
- 18:27 dr0ptp4kt@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic: apply
- 18:26 dr0ptp4kt@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic: apply
- 18:26 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1198', diff saved to https://phabricator.wikimedia.org/P76693 and previous config saved to /var/cache/conftool/dbconfig/20250529-182624-fceratto.json
- 18:22 dr0ptp4kt@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply
- 18:21 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 18:21 dr0ptp4kt@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply
- 18:21 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 18:12 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 18:12 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 18:11 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1198 (T395241)', diff saved to https://phabricator.wikimedia.org/P76692 and previous config saved to /var/cache/conftool/dbconfig/20250529-181118-fceratto.json
- 18:01 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 18:01 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 17:53 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 17:53 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 17:48 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1198 (T395241)', diff saved to https://phabricator.wikimedia.org/P76691 and previous config saved to /var/cache/conftool/dbconfig/20250529-174847-fceratto.json
- 17:48 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1198.eqiad.wmnet with reason: Maintenance
- 17:48 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1189 (T395241)', diff saved to https://phabricator.wikimedia.org/P76690 and previous config saved to /var/cache/conftool/dbconfig/20250529-174821-fceratto.json
- 17:47 swfrench@deploy1003: Finished scap sync-world: Clear noop helmfile diffs from gerrit change r/1148491 - T378479 (duration: 02m 29s)
- 17:44 swfrench@deploy1003: Started scap sync-world: Clear noop helmfile diffs from gerrit change r/1148491 - T378479
- 17:33 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P76689 and previous config saved to /var/cache/conftool/dbconfig/20250529-173314-fceratto.json
- 17:32 bvibber@deploy1003: Finished scap sync-world: Backport for Fix type error in GlobalJsonLinks processing (T395593) (duration: 10m 04s)
- 17:25 bvibber@deploy1003: bvibber: Continuing with sync
- 17:24 bvibber@deploy1003: bvibber: Backport for Fix type error in GlobalJsonLinks processing (T395593) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 17:22 bvibber@deploy1003: Started scap sync-world: Backport for Fix type error in GlobalJsonLinks processing (T395593)
- 17:21 volans@cumin1003: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts cirrussearch2112.codfw.wmnet
- 17:21 volans@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cirrussearch2112.codfw.wmnet
- 17:18 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P76688 and previous config saved to /var/cache/conftool/dbconfig/20250529-171807-fceratto.json
- 17:10 volans@cumin1003: START - Cookbook sre.hosts.reboot-single for host cirrussearch2112.codfw.wmnet
- 17:10 volans@cumin1003: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cirrussearch2112.codfw.wmnet
- 17:07 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 17:07 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 17:06 bvibber@deploy1003: Finished scap sync-world: Backport for Enable Lua transform switch for Charts on test and beta (T395516) (duration: 14m 10s)
- 17:03 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1189 (T395241)', diff saved to https://phabricator.wikimedia.org/P76687 and previous config saved to /var/cache/conftool/dbconfig/20250529-170259-fceratto.json
- 16:58 bvibber@deploy1003: bvibber: Continuing with sync
- 16:56 bvibber@deploy1003: bvibber: Backport for Enable Lua transform switch for Charts on test and beta (T395516) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 16:55 topranks: removing "session-mode automatic" from IBGP config on lsw1-e8-eqiad
- 16:53 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1189 (T395241)', diff saved to https://phabricator.wikimedia.org/P76685 and previous config saved to /var/cache/conftool/dbconfig/20250529-165334-fceratto.json
- 16:53 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1189.eqiad.wmnet with reason: Maintenance
- 16:53 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T395241)', diff saved to https://phabricator.wikimedia.org/P76684 and previous config saved to /var/cache/conftool/dbconfig/20250529-165320-fceratto.json
- 16:52 bvibber@deploy1003: Started scap sync-world: Backport for Enable Lua transform switch for Charts on test and beta (T395516)
- 16:49 bvibber@deploy1003: Finished scap sync-world: Backport for Lua transform backend for JsonConfig Data: pages (T388434), Chart-side support for Lua transforms (T388616) (duration: 40m 11s)
- 16:42 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 16:41 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 16:38 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P76683 and previous config saved to /var/cache/conftool/dbconfig/20250529-163812-fceratto.json
- 16:37 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) sretest2007.codfw.wmnet on all recursors
- 16:37 cmooney@cumin1002: START - Cookbook sre.dns.wipe-cache sretest2007.codfw.wmnet on all recursors
- 16:35 bvibber@deploy1003: bvibber: Continuing with sync
- 16:33 bvibber@deploy1003: bvibber: Backport for Lua transform backend for JsonConfig Data: pages (T388434), Chart-side support for Lua transforms (T388616) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 16:25 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 16:25 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns entries for sretest2007 - cmooney@cumin1002"
- 16:25 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns entries for sretest2007 - cmooney@cumin1002"
- 16:25 bking@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on cirrussearch[2112-2113].codfw.wmnet with reason: firmware update
- 16:24 bking@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cirrussearch[2111-2112].codfw.wmnet
- 16:24 bking@cumin2002: START - Cookbook sre.hosts.remove-downtime for cirrussearch[2111-2112].codfw.wmnet
- 16:23 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P76682 and previous config saved to /var/cache/conftool/dbconfig/20250529-162305-fceratto.json
- 16:19 cmooney@cumin1002: START - Cookbook sre.dns.netbox
- 16:19 cmooney@cumin1002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
- 16:17 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Banning hosts: cirrussearch2112*,cirrussearch2113* for T394543 - bking@cumin2002
- 16:17 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2007.codfw.wmnet with OS bookworm
- 16:17 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - cmooney@cumin1002"
- 16:17 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Banning hosts: cirrussearch2112*,cirrussearch2113* for T394543 - bking@cumin2002
- 16:17 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - cmooney@cumin1002"
- 16:16 cmooney@cumin1002: START - Cookbook sre.dns.netbox
- 16:09 bvibber@deploy1003: Started scap sync-world: Backport for Lua transform backend for JsonConfig Data: pages (T388434), Chart-side support for Lua transforms (T388616)
- 16:07 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T395241)', diff saved to https://phabricator.wikimedia.org/P76681 and previous config saved to /var/cache/conftool/dbconfig/20250529-160758-fceratto.json
- 15:58 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1175 (T395241)', diff saved to https://phabricator.wikimedia.org/P76680 and previous config saved to /var/cache/conftool/dbconfig/20250529-155843-fceratto.json
- 15:58 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1175.eqiad.wmnet with reason: Maintenance
- 15:58 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T395241)', diff saved to https://phabricator.wikimedia.org/P76679 and previous config saved to /var/cache/conftool/dbconfig/20250529-155817-fceratto.json
- 15:56 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2007.codfw.wmnet with reason: host reimage
- 15:56 bvibber@deploy1003: Finished scap sync-world: Backport for Enable Chart for Phase 4 wikis (all remaining public wikis) (T393788) (duration: 10m 20s)
- 15:52 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2007.codfw.wmnet with reason: host reimage
- 15:48 bvibber@deploy1003: jforrester, bvibber: Continuing with sync
- 15:47 bvibber@deploy1003: jforrester, bvibber: Backport for Enable Chart for Phase 4 wikis (all remaining public wikis) (T393788) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 15:45 bvibber@deploy1003: Started scap sync-world: Backport for Enable Chart for Phase 4 wikis (all remaining public wikis) (T393788)
- 15:43 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P76678 and previous config saved to /var/cache/conftool/dbconfig/20250529-154309-fceratto.json
- 15:34 cmooney@cumin1002: START - Cookbook sre.hosts.reimage for host sretest2007.codfw.wmnet with OS bookworm
- 15:32 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2007.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 15:28 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P76677 and previous config saved to /var/cache/conftool/dbconfig/20250529-152802-fceratto.json
- 15:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host sretest2007.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 15:12 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T395241)', diff saved to https://phabricator.wikimedia.org/P76676 and previous config saved to /var/cache/conftool/dbconfig/20250529-151255-fceratto.json
- 15:03 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1166 (T395241)', diff saved to https://phabricator.wikimedia.org/P76675 and previous config saved to /var/cache/conftool/dbconfig/20250529-150332-fceratto.json
- 15:03 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1166.eqiad.wmnet with reason: Maintenance
- 15:03 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1157 (T395241)', diff saved to https://phabricator.wikimedia.org/P76674 and previous config saved to /var/cache/conftool/dbconfig/20250529-150306-fceratto.json
- 14:55 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sretest2007.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 14:54 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host sretest2007.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 14:53 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sretest2007.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 14:52 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host sretest2007.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 14:48 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sretest2007.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 14:48 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1157', diff saved to https://phabricator.wikimedia.org/P76673 and previous config saved to /var/cache/conftool/dbconfig/20250529-144759-fceratto.json
- 14:47 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host sretest2007.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 14:32 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1157', diff saved to https://phabricator.wikimedia.org/P76672 and previous config saved to /var/cache/conftool/dbconfig/20250529-143251-fceratto.json
- 14:31 dcausse: T395546: rebuilding eqiad completion indices for cswikiversity pnbwiktionary avkwiki mlwikiquote ladwiki ptwikiquote cswikiversity olowiki azwikibooks wikimania2018wiki rowikibooks rswikimedia liwiktionary oswiki nnwiktionary trwikisource vowikibooks ndswiktionary bmwikiquote ilowiki pawiktionary nowikibooks zhwikiversity tawikiquote hawiktionary akwikibooks udmwiki xhwikibooks
- 14:17 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1157 (T395241)', diff saved to https://phabricator.wikimedia.org/P76671 and previous config saved to /var/cache/conftool/dbconfig/20250529-141744-fceratto.json
- 14:08 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1157 (T395241)', diff saved to https://phabricator.wikimedia.org/P76670 and previous config saved to /var/cache/conftool/dbconfig/20250529-140811-fceratto.json
- 14:08 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1157.eqiad.wmnet with reason: Maintenance
- 13:39 marostegui: Reboot dbproxy[1022,1025,1028-1029].eqiad.wmnet T395241
- 13:39 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbproxy[1022,1025,1028-1029].eqiad.wmnet with reason: Maintenance
- 13:39 taavi@deploy1003: Finished scap sync-world: Backport for Allow itwiki bureaucrat to remove sysop permission (T394752) (duration: 10m 13s)
- 13:32 taavi@deploy1003: simmed, taavi: Continuing with sync
- 13:31 taavi@deploy1003: simmed, taavi: Backport for Allow itwiki bureaucrat to remove sysop permission (T394752) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 13:28 taavi@deploy1003: Started scap sync-world: Backport for Allow itwiki bureaucrat to remove sysop permission (T394752)
- 13:26 taavi@deploy1003: Finished scap sync-world: Backport for Fixes issues with recommendations config in production (T393943) (duration: 12m 09s)
- 13:19 taavi@deploy1003: jdlrobson, taavi: Continuing with sync
- 13:16 taavi@deploy1003: jdlrobson, taavi: Backport for Fixes issues with recommendations config in production (T393943) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 13:14 taavi@deploy1003: Started scap sync-world: Backport for Fixes issues with recommendations config in production (T393943)
- 12:56 jclark@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1186.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 12:52 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply
- 12:52 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply
- 12:51 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply
- 12:51 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply
- 12:51 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply
- 12:51 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply
- 12:45 dcausse: T395546: restoring pnbwiktionary avkwiki ladwiki ptwikiquote cswikiversity azwikibooks wikimania2018wiki liwiktionary nnwiktionary vowikibooks bmwikiquote nowikibooks zhwikiversity akwikibooks general indices in omega@eqiad from omega@codfw
- 12:38 dcausse: T395546 (errata from previous log: s/psi/omega/) restoring collabwiki iewikibooks cywikiquote fawikibooks svwikiquote biwiktionary afwiktionary kywiktionary trwikisource mnwiktionary wikimania2007wiki sgwiki content indices in omega@eqiad from omega@codfw
- 12:38 marostegui: Deploy schema change on s3 eqiad dbmaint with replication T395335
- 12:37 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 12:37 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dns entries for cagefive2001 test server - cmooney@cumin1002"
- 12:37 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dns entries for cagefive2001 test server - cmooney@cumin1002"
- 12:36 elukey@deploy1003: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
- 12:36 elukey@deploy1003: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
- 12:36 marostegui: Deploy schema change on s7 eqiad dbmaint with replication T395335
- 12:36 dcausse: T395546 restoring collabwiki iewikibooks cywikiquote fawikibooks svwikiquote biwiktionary afwiktionary kywiktionary trwikisource mnwiktionary wikimania2007wiki sgwiki content indices in psi@eqiad from psi@codfw
- 12:34 dcausse: T395546 restoring xhwikibooks_general in psi@eqiad from psi@codfw
- 12:33 dcausse: T395546 restoring bowikibooks_content in psi@eqiad from psi@codfw
- 12:32 elukey@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
- 12:30 elukey@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
- 12:29 elukey@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
- 12:29 elukey@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
- 12:29 elukey@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
- 12:28 cmooney@cumin1002: START - Cookbook sre.dns.netbox
- 12:28 elukey@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
- 12:28 elukey@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
- 12:27 elukey@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revision-models' for release 'main' .
- 12:27 elukey@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
- 12:27 elukey@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'readability' for release 'main' .
- 12:26 elukey@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'logo-detection' for release 'main' .
- 12:26 elukey@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'llm' for release 'main' .
- 12:26 elukey@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
- 12:25 elukey@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'edit-check' for release 'main' .
- 12:25 elukey@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
- 12:24 elukey@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'article-models' for release 'main' .
- 12:24 elukey@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'article-descriptions' for release 'main' .
- 12:23 marostegui: Deploy schema change on s1 eqiad dbmaint with replication T395335
- 12:22 marostegui: Deploy schema change on s4 eqiad dbmaint with replication T395335
- 12:22 marostegui: Deploy schema change on s8 eqiad dbmaint with replication T395335
- 12:18 marostegui: Deploy schema change on s5 eqiad dbmaint with replication T395335
- 12:17 elukey@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
- 12:16 elukey@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
- 12:16 marostegui: Deploy schema change on s2 eqiad dbmaint with replication T395335
- 12:15 elukey@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
- 12:15 elukey@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
- 12:15 marostegui: Deploy schema change on s6 eqiad dbmaint with replication T395335
- 12:15 elukey@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
- 12:14 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for an-redacteddb1001.eqiad.wmnet
- 12:14 btullis@cumin1002: START - Cookbook sre.hosts.remove-downtime for an-redacteddb1001.eqiad.wmnet
- 12:14 elukey@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
- 12:14 elukey@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
- 12:13 dr0ptp4kt@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic: apply
- 12:13 elukey@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revision-models' for release 'main' .
- 12:12 elukey@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
- 12:11 dr0ptp4kt@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic: apply
- 12:09 dcausse: T395546: populating archive indices in eqiad for bat_smgwiki bmwiktionary crwiktionary cswikinews gorwiktionary hewiktionary iewikibooks kgwiki kowikibooks lldwiki niawiki nowikimedia plwikibooks quwikibooks sahwikisource sswiki vecwikisource wikimania2014wiki
- 12:08 elukey@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'readability' for release 'main' .
- 12:06 elukey@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'logo-detection' for release 'main' .
- 12:05 elukey@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'llm' for release 'main' .
- 12:04 elukey@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' .
- 12:03 elukey@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'edit-check' for release 'main' .
- 11:56 dr0ptp4kt@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply
- 11:55 dr0ptp4kt@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply
- 11:52 dcausse: T395546: creating empty archive indices in eqiad for bat_smgwiki bmwiktionary crwiktionary cswikinews gorwiktionary hewiktionary iewikibooks kgwiki kowikibooks lldwiki niawiki nowikimedia plwikibooks quwikibooks sahwikisource sswiki vecwikisource wikimania2014wiki
- 11:50 marostegui@dns1006: END - running authdns-update
- 11:50 btullis@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on an-redacteddb1001.eqiad.wmnet with reason: Upgrading MariaDB to 10.11
- 11:50 marostegui: Failover m5-master T395241
- 11:50 marostegui@dns1006: START - running authdns-update
- 11:38 marostegui@dns1006: END - running authdns-update
- 11:38 marostegui: Failover m3-master T395241
- 11:37 marostegui@dns1006: START - running authdns-update
- 11:35 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) es2039 gradually with 4 steps - Ready
- 11:27 kcvelaga@deploy1003: Finished deploy [airflow-dags/analytics_product@6b26a59]: T393560 (duration: 01m 12s)
- 11:26 kcvelaga@deploy1003: Started deploy [airflow-dags/analytics_product@6b26a59]: T393560
- 11:02 marostegui@deploy1003: Finished scap sync-world: Backport for Revert "db-production.php: Disable writes on es7" (duration: 09m 58s)
- 11:01 fceratto@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for es2039.codfw.wmnet
- 11:01 fceratto@cumin1002: START - Cookbook sre.hosts.remove-downtime for es2039.codfw.wmnet
- 10:55 marostegui@deploy1003: marostegui: Continuing with sync
- 10:55 marostegui@deploy1003: marostegui: Backport for Revert "db-production.php: Disable writes on es7" synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 10:52 marostegui@deploy1003: Started scap sync-world: Backport for Revert "db-production.php: Disable writes on es7"
- 10:50 fceratto@cumin1002: START - Cookbook sre.mysql.pool es2039 gradually with 4 steps - Ready
- 10:49 fceratto@cumin1002: END (ERROR) - Cookbook sre.mysql.pool (exit_code=97) es2039 gradually with 4 steps - Ready
- 10:49 fceratto@cumin1002: START - Cookbook sre.mysql.pool es2039 gradually with 4 steps - Ready
- 10:41 elukey@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
- 10:40 volans: restarting ircecho on alert1002
- 10:40 marostegui@deploy1003: Finished scap sync-world: Backport for db-production.php: Disable writes on es7 (duration: 09m 57s)
- 10:39 elukey@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'article-models' for release 'main' .
- 10:38 fceratto@cumin1002: END (FAIL) - Cookbook sre.mysql.upgrade (exit_code=99) for es2039.codfw.wmnet
- 10:33 elukey@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
- 10:33 marostegui@deploy1003: marostegui: Continuing with sync
- 10:32 marostegui@deploy1003: marostegui: Backport for db-production.php: Disable writes on es7 synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 10:31 elukey@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'article-descriptions' for release 'main' .
- 10:30 marostegui@deploy1003: Started scap sync-world: Backport for db-production.php: Disable writes on es7
- 10:25 elukey@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
- 10:15 dcausse: T395546: creating empty general indices in eqiad for akwikibooks avkwiki azwikibooks bmwikiquote cswikiversity ladwiki liwiktionary nnwiktionary nowikibooks pnbwiktionary ptwikiquote vowikibooks wikimania2018wiki xhwikibooks zhwikiversity
- 10:13 elukey@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
- 10:13 elukey@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
- 10:13 elukey@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
- 10:13 elukey@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
- 10:13 elukey@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
- 10:13 elukey@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
- 10:13 elukey@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
- 10:12 isaranto@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'article-models' for release 'main' .
- 10:12 elukey@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
- 10:10 elukey@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revision-models' for release 'main' .
- 10:10 dcausse: T395546: creating empty content indices in eqiad for afwiktionary biwiktionary bowikibooks collabwiki cywikiquote fawikibooks iewikibooks kywiktionary mnwiktionary sgwiki svwikiquote trwikisource wikimania2007wiki
- 10:10 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) es2039 - Upgrading es2039.codfw.wmnet
- 10:10 fceratto@cumin1002: START - Cookbook sre.mysql.depool es2039 - Upgrading es2039.codfw.wmnet
- 10:09 elukey@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
- 10:09 fceratto@cumin1002: START - Cookbook sre.mysql.upgrade for es2039.codfw.wmnet
- 10:09 elukey@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'readability' for release 'main' .
- 10:09 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply
- 10:09 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply
- 10:08 elukey@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'logo-detection' for release 'main' .
- 10:07 elukey@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'llm' for release 'main' .
- 10:07 fceratto@cumin1002: dbctl commit (dc=all): 'Depool es2039 T395294', diff saved to https://phabricator.wikimedia.org/P76665 and previous config saved to /var/cache/conftool/dbconfig/20250529-100704-fceratto.json
- 10:06 elukey@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
- 10:06 elukey@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'edit-check' for release 'main' .
- 10:02 elukey@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'article-models' for release 'main' .
- 09:53 elukey@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'article-descriptions' for release 'main' .
- 09:43 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-druid1003.eqiad.wmnet with OS bullseye
- 09:34 dcausse: deleting 55 red indices in cirrussearch-omega@eqiad
- 09:32 dcausse: deleting 4 red indices in cirrussearch-psi@eqiad
- 09:31 btullis@cumin1002: conftool action : set/pooled=yes; selector: name=cirrussearch1110.eqiad.wmnet,service=elasticsearch-psi-ssl
- 09:31 btullis@cumin1002: conftool action : set/weight=10; selector: name=cirrussearch1110.eqiad.wmnet,service=elasticsearch-psi-ssl
- 09:31 btullis@cumin1002: conftool action : set/pooled=yes; selector: name=cirrussearch1110.eqiad.wmnet,service=elasticsearch-psi-ss
- 09:31 btullis@cumin1002: conftool action : set/weight=10; selector: name=cirrussearch1110.eqiad.wmnet,service=elasticsearch-psi-ss
- 09:01 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-druid1003.eqiad.wmnet with reason: host reimage
- 08:57 btullis@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-druid1003.eqiad.wmnet with reason: host reimage
- 08:20 dcausse: closing the UTC morning backport window
- 08:18 dcausse@deploy1003: Finished scap sync-world: Backport for Fix "disabled target language" message for Wikifunctions $wgDisabledTargetLanguages (T328838) (duration: 10m 08s)
- 08:11 dcausse@deploy1003: wsung, dcausse: Continuing with sync
- 08:10 dcausse@deploy1003: wsung, dcausse: Backport for Fix "disabled target language" message for Wikifunctions $wgDisabledTargetLanguages (T328838) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 08:08 dcausse@deploy1003: Started scap sync-world: Backport for Fix "disabled target language" message for Wikifunctions $wgDisabledTargetLanguages (T328838)
- 08:04 dcausse@deploy1003: Finished scap sync-world: Backport for Make Wikifunctions $wgTranslateDisabledTargetLanguages use the translatewiki-model translate target languages (T328838) (duration: 16m 47s)
- 07:57 dcausse@deploy1003: wsung, dcausse: Continuing with sync
- 07:50 dcausse@deploy1003: wsung, dcausse: Backport for Make Wikifunctions $wgTranslateDisabledTargetLanguages use the translatewiki-model translate target languages (T328838) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 07:47 dcausse@deploy1003: Started scap sync-world: Backport for Make Wikifunctions $wgTranslateDisabledTargetLanguages use the translatewiki-model translate target languages (T328838)
- 07:40 dcausse@deploy1003: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 07:40 dcausse@deploy1003: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
- 07:37 dcausse@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 07:37 dcausse@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
- 07:30 dcausse@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 07:29 dcausse@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
- 07:27 dcausse@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 07:27 dcausse@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
- 07:05 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cirrussearch[1060-1062].eqiad.wmnet
- 07:05 ryankemper@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 07:05 ryankemper@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cirrussearch[1060-1062].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - ryankemper@cumin2002"
- 06:55 ryankemper@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cirrussearch[1060-1062].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - ryankemper@cumin2002"
- 06:52 ryankemper@cumin2002: START - Cookbook sre.dns.netbox
- 06:16 ryankemper@cumin2002: START - Cookbook sre.hosts.decommission for hosts cirrussearch[1060-1062].eqiad.wmnet
- 05:25 marostegui@dns1006: END - running authdns-update
- 05:25 marostegui@dns1006: START - running authdns-update
- 05:25 marostegui: failover m2 master eqiad dbmaint T395241
- 04:35 oblivian@cumin2002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "T394511 - oblivian@cumin2002"
- 04:35 oblivian@cumin2002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: T394511 - oblivian@cumin2002
- 04:35 oblivian@cumin2002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: T394511 - oblivian@cumin2002
- 04:35 oblivian@cumin2002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "T394511 - oblivian@cumin2002"
- 01:48 sbassett: Re-deployed security fix for T394396 to 1.45.0-wmf.3
- 01:37 sbassett: Re-deployed security fix for T394396 to 1.45.0-wmf.2
- 01:25 sbassett: restructured core patches in /srv/patches/1.45.0-wmf.2 and /srv/patches/1.45.0-wmf.3 (T395528)
- 01:00 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host thanos-be2006.codfw.wmnet with OS bullseye
- 00:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-misc2002.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 00:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-misc2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 00:34 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-misc2002.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 00:33 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-misc2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 00:15 logmsgbot: dreamyjazz Deployed security patch for T394692
- 00:06 logmsgbot: dreamyjazz Deployed security patch for T394692
2025-05-28
- 23:53 logmsgbot: dreamyjazz Deployed security patch for T394700
- 23:49 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2003.codfw.wmnet with OS bookworm
- 23:44 logmsgbot: dreamyjazz Deployed security patch for T394700
- 23:22 logmsgbot: dreamyjazz Deployed security patch for T394693
- 23:10 logmsgbot: dreamyjazz Deployed security patch for T394693
- 23:08 brennen@deploy1003: Finished deploy [phabricator/deployment@99aa712]: test deploy to phab1005 for T377889 (duration: 04m 38s)
- 23:04 brennen@deploy1003: Started deploy [phabricator/deployment@99aa712]: test deploy to phab1005 for T377889
- 23:03 brennen@deploy1003: deploy aborted: test deploy to phab1005 for T377889 (duration: 04m 14s)
- 22:59 brennen@deploy1003: Started deploy [phabricator/deployment@99aa712]: test deploy to phab1005 for T377889
- 21:54 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts cirrussearch1053.eqiad.wmnet
- 21:53 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 21:51 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts elastic1054.eqiad.wmnet
- 21:51 ryankemper@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 21:51 ryankemper@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: elastic1054.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - ryankemper@cumin2002"
- 21:51 bking@cumin2002: START - Cookbook sre.dns.netbox
- 21:49 ryankemper@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: elastic1054.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - ryankemper@cumin2002"
- 21:43 bking@cumin2002: START - Cookbook sre.hosts.decommission for hosts cirrussearch1053.eqiad.wmnet
- 21:42 ryankemper@cumin2002: START - Cookbook sre.dns.netbox
- 21:37 ryankemper@cumin2002: START - Cookbook sre.hosts.decommission for hosts elastic1054.eqiad.wmnet
- 20:53 arlolra@deploy1003: Finished scap sync-world: Backport for SecurePoll: Adding files for U4C vote 2025 (T395386), SecurePoll: Adding files for U4C vote 2025 (T395386) (duration: 15m 18s)
- 20:46 arlolra@deploy1003: foks, arlolra: Continuing with sync
- 20:41 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cassandra-dev2003.codfw.wmnet with OS bullseye
- 20:40 arlolra@deploy1003: foks, arlolra: Backport for SecurePoll: Adding files for U4C vote 2025 (T395386), SecurePoll: Adding files for U4C vote 2025 (T395386) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 20:38 arlolra@deploy1003: Started scap sync-world: Backport for SecurePoll: Adding files for U4C vote 2025 (T395386), SecurePoll: Adding files for U4C vote 2025 (T395386)
- 20:31 arlolra@deploy1003: Finished scap sync-world: Backport for Remove $wgParserEnableLegacyMediaDOM option (T394054) (duration: 10m 28s)
- 20:24 arlolra@deploy1003: arlolra: Continuing with sync
- 20:23 arlolra@deploy1003: arlolra: Backport for Remove $wgParserEnableLegacyMediaDOM option (T394054) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 20:21 arlolra@deploy1003: Started scap sync-world: Backport for Remove $wgParserEnableLegacyMediaDOM option (T394054)
- 20:19 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cassandra-dev2003.codfw.wmnet with reason: host reimage
- 20:17 dbrant@deploy1003: Finished scap sync-world: Backport for App Interaction:: Add Tabs (duration: 11m 38s)
- 20:15 eevans@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cassandra-dev2003.codfw.wmnet with reason: host reimage
- 20:10 dbrant@deploy1003: dbrant, golson-wmf: Continuing with sync
- 20:07 dbrant@deploy1003: dbrant, golson-wmf: Backport for App Interaction:: Add Tabs synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 20:05 dbrant@deploy1003: Started scap sync-world: Backport for App Interaction:: Add Tabs
- 19:59 eevans@cumin1002: START - Cookbook sre.hosts.reimage for host cassandra-dev2003.codfw.wmnet with OS bullseye
- 19:42 aokoth@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10 days, 0:00:00 on doc1003.eqiad.wmnet with reason: Bookworm Migration
- 18:31 aokoth@dns1004: END - running authdns-update
- 18:30 aokoth@dns1004: START - running authdns-update
- 18:19 dancy@deploy1003: rebuilt and synchronized wikiversions files: group1 to 1.45.0-wmf.3 refs T392173
- 18:07 swfrench@deploy1003: Finished scap sync-world: Scap deployment to put production in a consistent state - T377121 (duration: 07m 48s)
- 18:00 swfrench@deploy1003: Started scap sync-world: Scap deployment to put production in a consistent state - T377121
- 17:52 btullis@cumin1002: START - Cookbook sre.hosts.reimage for host an-druid1003.eqiad.wmnet with OS bullseye
- 17:48 btullis@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-druid1003.eqiad.wmnet with OS bullseye
- 17:37 hmonroy@deploy1003: hmonroy: Continuing with sync
- 17:34 hmonroy@deploy1003: hmonroy: Backport for InitialiseSettings: enable multiblocks on group1 (T377121) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 17:31 hmonroy@deploy1003: Started scap sync-world: Backport for InitialiseSettings: enable multiblocks on group1 (T377121)
- 17:29 btullis@cumin1002: START - Cookbook sre.hosts.reimage for host an-druid1003.eqiad.wmnet with OS bullseye
- 17:27 jmm@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 17:24 jmm@cumin1003: START - Cookbook sre.dns.netbox
- 17:20 bking@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on elastic[1054,1067,1103].eqiad.wmnet with reason: downtime until decom
- 16:59 marostegui@cumin1002: dbctl commit (dc=all): 'es1038 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P76658 and previous config saved to /var/cache/conftool/dbconfig/20250528-165939-root.json
- 16:44 marostegui@cumin1002: dbctl commit (dc=all): 'es1038 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P76657 and previous config saved to /var/cache/conftool/dbconfig/20250528-164433-root.json
- 16:43 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cassandra-dev2003.codfw.wmnet with OS bullseye
- 16:41 xcollazo@deploy1003: Finished deploy [airflow-dags/analytics@afad011]: Deploy latest DAGs for main Airflow instance. T385112. (duration: 00m 39s)
- 16:40 xcollazo@deploy1003: Started deploy [airflow-dags/analytics@afad011]: Deploy latest DAGs for main Airflow instance. T385112.
- 16:36 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 16:29 marostegui@cumin1002: dbctl commit (dc=all): 'es1038 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P76656 and previous config saved to /var/cache/conftool/dbconfig/20250528-162928-root.json
- 16:26 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cassandra-dev2003.codfw.wmnet with reason: host reimage
- 16:23 eevans@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cassandra-dev2003.codfw.wmnet with reason: host reimage
- 16:21 marostegui@cumin1002: dbctl commit (dc=all): 'db2187 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P76655 and previous config saved to /var/cache/conftool/dbconfig/20250528-162142-root.json
- 16:19 jmm@cumin1003: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host ncredir7003.magru.wmnet
- 16:19 jmm@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ncredir7003.magru.wmnet with OS bookworm
- 16:18 stevemunene@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on an-worker1119.eqiad.wmnet with reason: Repair data node volume failure
- 16:18 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2239.codfw.wmnet with reason: Maintenance
- 16:17 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227 (T395241)', diff saved to https://phabricator.wikimedia.org/P76654 and previous config saved to /var/cache/conftool/dbconfig/20250528-161740-fceratto.json
- 16:14 marostegui@cumin1002: dbctl commit (dc=all): 'es1038 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P76653 and previous config saved to /var/cache/conftool/dbconfig/20250528-161423-root.json
- 16:08 eevans@cumin1002: START - Cookbook sre.hosts.reimage for host cassandra-dev2003.codfw.wmnet with OS bullseye
- 16:06 marostegui@cumin1002: dbctl commit (dc=all): 'db2187 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P76652 and previous config saved to /var/cache/conftool/dbconfig/20250528-160636-root.json
- 16:02 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P76651 and previous config saved to /var/cache/conftool/dbconfig/20250528-160233-fceratto.json
- 15:59 marostegui@cumin1002: dbctl commit (dc=all): 'es1038 (re)pooling @ 20%: Repooling', diff saved to https://phabricator.wikimedia.org/P76650 and previous config saved to /var/cache/conftool/dbconfig/20250528-155918-root.json
- 15:55 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=93) for host an-worker1185.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 15:51 marostegui@cumin1002: dbctl commit (dc=all): 'db2187 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P76649 and previous config saved to /var/cache/conftool/dbconfig/20250528-155130-root.json
- 15:49 vriley@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1185.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 15:47 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P76648 and previous config saved to /var/cache/conftool/dbconfig/20250528-154726-fceratto.json
- 15:44 marostegui@cumin1002: dbctl commit (dc=all): 'es1038 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P76647 and previous config saved to /var/cache/conftool/dbconfig/20250528-154412-root.json
- 15:36 marostegui@cumin1002: dbctl commit (dc=all): 'db2187 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P76646 and previous config saved to /var/cache/conftool/dbconfig/20250528-153625-root.json
- 15:32 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227 (T395241)', diff saved to https://phabricator.wikimedia.org/P76645 and previous config saved to /var/cache/conftool/dbconfig/20250528-153220-fceratto.json
- 15:29 marostegui@cumin1002: dbctl commit (dc=all): 'es1038 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P76644 and previous config saved to /var/cache/conftool/dbconfig/20250528-152907-root.json
- 15:28 stevemunene@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1155.eqiad.wmnet
- 15:27 bwojtowicz@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'article-models' for release 'main' .
- 15:26 jmm@cumin1003: START - Cookbook sre.hosts.reimage for host ncredir7003.magru.wmnet with OS bookworm
- 15:25 jmm@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM ncredir7003.magru.wmnet - jmm@cumin1003"
- 15:25 jmm@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM ncredir7003.magru.wmnet - jmm@cumin1003"
- 15:25 jmm@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ncredir7003.magru.wmnet on all recursors
- 15:25 jmm@cumin1003: START - Cookbook sre.dns.wipe-cache ncredir7003.magru.wmnet on all recursors
- 15:25 jmm@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 15:25 jmm@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM ncredir7003.magru.wmnet - jmm@cumin1003"
- 15:25 jmm@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM ncredir7003.magru.wmnet - jmm@cumin1003"
- 15:25 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2227 (T395241)', diff saved to https://phabricator.wikimedia.org/P76643 and previous config saved to /var/cache/conftool/dbconfig/20250528-152459-fceratto.json
- 15:24 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2227.codfw.wmnet with reason: Maintenance
- 15:24 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2205 (T395241)', diff saved to https://phabricator.wikimedia.org/P76642 and previous config saved to /var/cache/conftool/dbconfig/20250528-152433-fceratto.json
- 15:22 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1231 (T395241)', diff saved to https://phabricator.wikimedia.org/P76641 and previous config saved to /var/cache/conftool/dbconfig/20250528-152223-fceratto.json
- 15:21 jmm@cumin1003: START - Cookbook sre.dns.netbox
- 15:21 jmm@cumin1003: START - Cookbook sre.ganeti.makevm for new host ncredir7003.magru.wmnet
- 15:21 marostegui@cumin1002: dbctl commit (dc=all): 'db2187 (re)pooling @ 20%: Repooling', diff saved to https://phabricator.wikimedia.org/P76640 and previous config saved to /var/cache/conftool/dbconfig/20250528-152120-root.json
- 15:20 jmm@cumin1003: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host ncredir7003.magru.wmnet
- 15:20 jmm@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ncredir7003.magru.wmnet on all recursors
- 15:20 cgoubert@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 15:20 jmm@cumin1003: START - Cookbook sre.dns.wipe-cache ncredir7003.magru.wmnet on all recursors
- 15:20 jmm@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 15:20 jmm@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM ncredir7003.magru.wmnet - jmm@cumin1003"
- 15:20 jmm@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM ncredir7003.magru.wmnet - jmm@cumin1003"
- 15:17 inflatador: bking@mwmaint1002 ban Elastic/CS hosts prior to decom T394350
- 15:17 jmm@cumin1003: START - Cookbook sre.dns.netbox
- 15:17 jmm@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ncredir7003.magru.wmnet on all recursors
- 15:16 jmm@cumin1003: START - Cookbook sre.dns.wipe-cache ncredir7003.magru.wmnet on all recursors
- 15:16 jmm@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 15:16 jmm@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM ncredir7003.magru.wmnet - jmm@cumin1003"
- 15:16 jmm@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM ncredir7003.magru.wmnet - jmm@cumin1003"
- 15:14 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudlb1002.eqiad.wmnet with OS bookworm
- 15:14 marostegui@cumin1002: dbctl commit (dc=all): 'es1038 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P76639 and previous config saved to /var/cache/conftool/dbconfig/20250528-151401-root.json
- 15:13 jmm@cumin1003: START - Cookbook sre.dns.netbox
- 15:13 jmm@cumin1003: START - Cookbook sre.ganeti.makevm for new host ncredir7003.magru.wmnet
- 15:10 stevemunene@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-worker1155.eqiad.wmnet
- 15:09 stevemunene@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1148.eqiad.wmnet
- 15:09 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2205', diff saved to https://phabricator.wikimedia.org/P76638 and previous config saved to /var/cache/conftool/dbconfig/20250528-150925-fceratto.json
- 15:07 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1231', diff saved to https://phabricator.wikimedia.org/P76637 and previous config saved to /var/cache/conftool/dbconfig/20250528-150716-fceratto.json
- 15:06 marostegui@cumin1002: dbctl commit (dc=all): 'db2187 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P76636 and previous config saved to /var/cache/conftool/dbconfig/20250528-150614-root.json
- 15:03 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
- 15:02 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
- 14:55 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
- 14:55 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply
- 14:54 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudlb1002.eqiad.wmnet with reason: host reimage
- 14:54 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2205', diff saved to https://phabricator.wikimedia.org/P76635 and previous config saved to /var/cache/conftool/dbconfig/20250528-145418-fceratto.json
- 14:52 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1231', diff saved to https://phabricator.wikimedia.org/P76634 and previous config saved to /var/cache/conftool/dbconfig/20250528-145209-fceratto.json
- 14:52 stevemunene@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-worker1148.eqiad.wmnet
- 14:51 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/api-gateway: apply
- 14:51 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/api-gateway: apply
- 14:51 marostegui@cumin1002: dbctl commit (dc=all): 'db2187 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P76633 and previous config saved to /var/cache/conftool/dbconfig/20250528-145108-root.json
- 14:51 taavi@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudlb1002.eqiad.wmnet with reason: host reimage
- 14:50 moritzm: installing twitter-bootstrap3 security updates
- 14:49 volans: uploaded spicerack_11.0.0 to apt.wikimedia.org bullseye-wikimedia,bookworm-wikimedia
- 14:48 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/api-gateway: apply
- 14:47 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/api-gateway: apply
- 14:47 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
- 14:47 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/api-gateway: apply
- 14:42 dancy@deploy1003: Installation of scap version "4.171.0" completed for 2 hosts
- 14:40 dancy@deploy1003: Installing scap version "4.171.0" for 2 host(s)
- 14:39 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2205 (T395241)', diff saved to https://phabricator.wikimedia.org/P76632 and previous config saved to /var/cache/conftool/dbconfig/20250528-143910-fceratto.json
- 14:37 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1231 (T395241)', diff saved to https://phabricator.wikimedia.org/P76631 and previous config saved to /var/cache/conftool/dbconfig/20250528-143702-fceratto.json
- 14:36 marostegui@cumin1002: dbctl commit (dc=all): 'db2187 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P76630 and previous config saved to /var/cache/conftool/dbconfig/20250528-143603-root.json
- 14:34 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db2242.codfw.wmnet onto db2187.codfw.wmnet
- 14:31 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2205 (T395241)', diff saved to https://phabricator.wikimedia.org/P76629 and previous config saved to /var/cache/conftool/dbconfig/20250528-143153-fceratto.json
- 14:31 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2205.codfw.wmnet with reason: Maintenance
- 14:31 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194 (T395241)', diff saved to https://phabricator.wikimedia.org/P76628 and previous config saved to /var/cache/conftool/dbconfig/20250528-143126-fceratto.json
- 14:29 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1231 (T395241)', diff saved to https://phabricator.wikimedia.org/P76627 and previous config saved to /var/cache/conftool/dbconfig/20250528-142926-fceratto.json
- 14:29 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1231.eqiad.wmnet with reason: Maintenance
- 14:27 marostegui@cumin1002: dbctl commit (dc=all): 'Set s8 RW T351820', diff saved to https://phabricator.wikimedia.org/P76626 and previous config saved to /var/cache/conftool/dbconfig/20250528-142745-marostegui.json
- 14:26 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 32934
- 14:25 marostegui@cumin1002: dbctl commit (dc=all): 'Change x3 masters weights', diff saved to https://phabricator.wikimedia.org/P76625 and previous config saved to /var/cache/conftool/dbconfig/20250528-142503-marostegui.json
- 14:24 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1225.eqiad.wmnet with reason: Maintenance
- 14:24 stevemunene@cumin1002: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker1148.eqiad.wmnet
- 14:24 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1201 (T395241)', diff saved to https://phabricator.wikimedia.org/P76624 and previous config saved to /var/cache/conftool/dbconfig/20250528-142359-fceratto.json
- 14:23 marostegui@cumin1002: dbctl commit (dc=all): 'Change x3 masters', diff saved to https://phabricator.wikimedia.org/P76623 and previous config saved to /var/cache/conftool/dbconfig/20250528-142349-marostegui.json
- 14:21 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
- 14:20 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cassandra-dev2003.codfw.wmnet with OS bullseye
- 14:20 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
- 14:16 stevemunene@cumin1002: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker1148.eqiad.wmnet
- 14:16 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P76621 and previous config saved to /var/cache/conftool/dbconfig/20250528-141619-fceratto.json
- 14:15 stevemunene@cumin1002: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker1155.eqiad.wmnet
- 14:13 stevemunene@cumin1002: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker1155.eqiad.wmnet
- 14:07 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 6 hosts with reason: Primary switchover s1 T389373
- 14:07 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 33 hosts with reason: Primary switchover s1 T389373
- 14:04 marostegui@cumin1002: dbctl commit (dc=all): 'Set s8 (wikidata) as RO T351820', diff saved to https://phabricator.wikimedia.org/P76616 and previous config saved to /var/cache/conftool/dbconfig/20250528-140441-marostegui.json
- 14:02 taavi@cumin1002: START - Cookbook sre.hosts.reimage for host cloudlb1002.eqiad.wmnet with OS bookworm
- 14:02 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cassandra-dev2003.codfw.wmnet with reason: host reimage
- 14:01 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1201', diff saved to https://phabricator.wikimedia.org/P76615 and previous config saved to /var/cache/conftool/dbconfig/20250528-140154-fceratto.json
- 14:01 marostegui: Set s8 (wikidata) as RO to split x3 from it T351820
- 14:01 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P76614 and previous config saved to /var/cache/conftool/dbconfig/20250528-140111-fceratto.json
- 13:58 eevans@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cassandra-dev2003.codfw.wmnet with reason: host reimage
- 13:54 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 13:54 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 13:51 cgoubert@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
- 13:51 cgoubert@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
- 13:46 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1201', diff saved to https://phabricator.wikimedia.org/P76613 and previous config saved to /var/cache/conftool/dbconfig/20250528-134647-fceratto.json
- 13:46 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194 (T395241)', diff saved to https://phabricator.wikimedia.org/P76612 and previous config saved to /var/cache/conftool/dbconfig/20250528-134604-fceratto.json
- 13:44 marostegui: Move db1211 and db2162 under x3 masters T390530 T351820
- 13:44 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'configure' for AS: 32934
- 13:43 Lucas_WMDE: UTC afternoon backport+config window done
- 13:43 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 7 hosts with reason: Maintenance
- 13:42 eevans@cumin1002: START - Cookbook sre.hosts.reimage for host cassandra-dev2003.codfw.wmnet with OS bullseye
- 13:42 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for Restore support for Dark Mode on Wikibase pages (T389330), Enabled ScopedTypeaheadSearch for test.wikidata.org (T394669) (duration: 12m 31s)
- 13:42 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db[2187,2200,2242-2243].codfw.wmnet with reason: Maintenance
- 13:40 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2241.codfw.wmnet with reason: Maintenance
- 13:39 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1255.eqiad.wmnet with reason: Maintenance
- 13:38 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2194 (T395241)', diff saved to https://phabricator.wikimedia.org/P76611 and previous config saved to /var/cache/conftool/dbconfig/20250528-133854-fceratto.json
- 13:38 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2194.codfw.wmnet with reason: Maintenance
- 13:38 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2190 (T395241)', diff saved to https://phabricator.wikimedia.org/P76610 and previous config saved to /var/cache/conftool/dbconfig/20250528-133827-fceratto.json
- 13:35 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, arthurtaylor: Continuing with sync
- 13:32 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, arthurtaylor: Backport for Restore support for Dark Mode on Wikibase pages (T389330), Enabled ScopedTypeaheadSearch for test.wikidata.org (T394669) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 13:31 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1201 (T395241)', diff saved to https://phabricator.wikimedia.org/P76609 and previous config saved to /var/cache/conftool/dbconfig/20250528-133139-fceratto.json
- 13:31 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 13:31 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: new ncredir node for magru03 - jmm@cumin2002"
- 13:31 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: new ncredir node for magru03 - jmm@cumin2002"
- 13:30 marostegui@cumin1002: END (FAIL) - Cookbook sre.mysql.pool (exit_code=99) db2242 gradually with 4 steps - Pool db2242.codfw.wmnet in after cloning
- 13:30 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for Restore support for Dark Mode on Wikibase pages (T389330), Enabled ScopedTypeaheadSearch for test.wikidata.org (T394669)
- 13:27 jmm@cumin2002: START - Cookbook sre.dns.netbox
- 13:25 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for huwikibooks: add importsources (T395397) (duration: 10m 06s)
- 13:25 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1201 (T395241)', diff saved to https://phabricator.wikimedia.org/P76607 and previous config saved to /var/cache/conftool/dbconfig/20250528-132508-fceratto.json
- 13:25 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1201.eqiad.wmnet with reason: Maintenance
- 13:24 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1187 (T395241)', diff saved to https://phabricator.wikimedia.org/P76606 and previous config saved to /var/cache/conftool/dbconfig/20250528-132443-fceratto.json
- 13:23 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2190', diff saved to https://phabricator.wikimedia.org/P76605 and previous config saved to /var/cache/conftool/dbconfig/20250528-132320-fceratto.json
- 13:20 bking@cumin2002: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.RESTART (3 nodes at a time) for ElasticSearch cluster search_eqiad: T383811 - bking@cumin2002
- 13:19 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (3 nodes at a time) for ElasticSearch cluster search_eqiad: T383811 - bking@cumin2002
- 13:18 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, anzx: Continuing with sync
- 13:17 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, anzx: Backport for huwikibooks: add importsources (T395397) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 13:15 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for huwikibooks: add importsources (T395397)
- 13:13 jforrester@deploy1003: Finished scap sync-world: Backport for [BETA CLUSTER] Close en_rtlwiki (duration: 09m 44s)
- 13:09 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P76603 and previous config saved to /var/cache/conftool/dbconfig/20250528-130935-fceratto.json
- 13:09 marostegui@dns1006: END - running authdns-update
- 13:08 marostegui@dns1006: START - running authdns-update
- 13:08 marostegui@dns1006: END - running authdns-update
- 13:08 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2190', diff saved to https://phabricator.wikimedia.org/P76602 and previous config saved to /var/cache/conftool/dbconfig/20250528-130812-fceratto.json
- 13:07 marostegui@dns1006: START - running authdns-update
- 13:07 marostegui: Failover m1 master eqiad dbmaint T395241
- 13:06 jforrester@deploy1003: jforrester: Continuing with sync
- 13:05 jforrester@deploy1003: jforrester: Backport for [BETA CLUSTER] Close en_rtlwiki synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 13:03 jforrester@deploy1003: Started scap sync-world: Backport for [BETA CLUSTER] Close en_rtlwiki
- 12:54 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P76600 and previous config saved to /var/cache/conftool/dbconfig/20250528-125427-fceratto.json
- 12:53 marostegui: dbmaint x3 eqiad make it SBR T383795 T390530
- 12:53 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2190 (T395241)', diff saved to https://phabricator.wikimedia.org/P76599 and previous config saved to /var/cache/conftool/dbconfig/20250528-125305-fceratto.json
- 12:45 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2190 (T395241)', diff saved to https://phabricator.wikimedia.org/P76598 and previous config saved to /var/cache/conftool/dbconfig/20250528-124547-fceratto.json
- 12:45 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2190.codfw.wmnet with reason: Maintenance
- 12:45 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T395241)', diff saved to https://phabricator.wikimedia.org/P76597 and previous config saved to /var/cache/conftool/dbconfig/20250528-124519-fceratto.json
- 12:44 marostegui@cumin1002: START - Cookbook sre.mysql.pool db2242 gradually with 4 steps - Pool db2242.codfw.wmnet in after cloning
- 12:39 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1187 (T395241)', diff saved to https://phabricator.wikimedia.org/P76595 and previous config saved to /var/cache/conftool/dbconfig/20250528-123921-fceratto.json
- 12:38 marostegui: dbmaint x3 codfw make it SBR T383795
- 12:38 marostegui: dbmaint x3 codfw make it SBR T390530
- 12:36 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudlb1001.eqiad.wmnet with OS bookworm
- 12:34 elukey@deploy1003: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
- 12:34 elukey@deploy1003: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
- 12:32 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1187 (T395241)', diff saved to https://phabricator.wikimedia.org/P76594 and previous config saved to /var/cache/conftool/dbconfig/20250528-123255-fceratto.json
- 12:32 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1187.eqiad.wmnet with reason: Maintenance
- 12:32 elukey@deploy1003: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
- 12:32 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T395241)', diff saved to https://phabricator.wikimedia.org/P76593 and previous config saved to /var/cache/conftool/dbconfig/20250528-123230-fceratto.json
- 12:31 btullis@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on an-druid1003.eqiad.wmnet with reason: Cold booting to address disk failure
- 12:31 elukey@deploy1003: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
- 12:30 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P76592 and previous config saved to /var/cache/conftool/dbconfig/20250528-123012-fceratto.json
- 12:19 jynus@cumin1002: DONE (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 4:00:00 on backup[2010-2011].codfw.wmnet with reason: Downtime hosts for reboot
- 12:17 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P76591 and previous config saved to /var/cache/conftool/dbconfig/20250528-121723-fceratto.json
- 12:15 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudlb1001.eqiad.wmnet with reason: host reimage
- 12:15 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P76590 and previous config saved to /var/cache/conftool/dbconfig/20250528-121505-fceratto.json
- 12:12 taavi@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudlb1001.eqiad.wmnet with reason: host reimage
- 12:11 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=93) for new host ncredir7003.magru.wmnet
- 12:11 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
- 12:05 jmm@cumin2002: START - Cookbook sre.dns.netbox
- 12:05 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ncredir7003.magru.wmnet on all recursors
- 12:05 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache ncredir7003.magru.wmnet on all recursors
- 12:05 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 12:05 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM ncredir7003.magru.wmnet - jmm@cumin2002"
- 12:04 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM ncredir7003.magru.wmnet - jmm@cumin2002"
- 12:02 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P76589 and previous config saved to /var/cache/conftool/dbconfig/20250528-120215-fceratto.json
- 12:01 jmm@cumin2002: START - Cookbook sre.dns.netbox
- 12:01 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host ncredir7003.magru.wmnet
- 11:59 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T395241)', diff saved to https://phabricator.wikimedia.org/P76588 and previous config saved to /var/cache/conftool/dbconfig/20250528-115958-fceratto.json
- 11:57 jynus@cumin1002: DONE (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 4:00:00 on backup[1010-1011].eqiad.wmnet with reason: Downtime hosts for reboot
- 11:52 stevemunene@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on an-worker[1163-1165].eqiad.wmnet with reason: Upgrade an-worker hard drives from 4TB to 8TB
- 11:50 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2177 (T395241)', diff saved to https://phabricator.wikimedia.org/P76587 and previous config saved to /var/cache/conftool/dbconfig/20250528-115012-fceratto.json
- 11:50 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2177.codfw.wmnet with reason: Maintenance
- 11:49 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T395241)', diff saved to https://phabricator.wikimedia.org/P76586 and previous config saved to /var/cache/conftool/dbconfig/20250528-114944-fceratto.json
- 11:47 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T395241)', diff saved to https://phabricator.wikimedia.org/P76585 and previous config saved to /var/cache/conftool/dbconfig/20250528-114708-fceratto.json
- 11:43 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 141626
- 11:43 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 141626
- 11:41 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1180 (T395241)', diff saved to https://phabricator.wikimedia.org/P76584 and previous config saved to /var/cache/conftool/dbconfig/20250528-114149-fceratto.json
- 11:41 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1180.eqiad.wmnet with reason: Maintenance
- 11:41 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T395241)', diff saved to https://phabricator.wikimedia.org/P76583 and previous config saved to /var/cache/conftool/dbconfig/20250528-114124-fceratto.json
- 11:38 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 137409
- 11:34 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P76582 and previous config saved to /var/cache/conftool/dbconfig/20250528-113437-fceratto.json
- 11:26 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P76581 and previous config saved to /var/cache/conftool/dbconfig/20250528-112617-fceratto.json
- 11:25 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'configure' for AS: 137409
- 11:20 taavi@cumin1002: START - Cookbook sre.hosts.reimage for host cloudlb1001.eqiad.wmnet with OS bookworm
- 11:19 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P76580 and previous config saved to /var/cache/conftool/dbconfig/20250528-111931-fceratto.json
- 11:11 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P76579 and previous config saved to /var/cache/conftool/dbconfig/20250528-111109-fceratto.json
- 11:07 marostegui@cumin1002: dbctl commit (dc=all): 'es2035 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P76578 and previous config saved to /var/cache/conftool/dbconfig/20250528-110750-root.json
- 11:04 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T395241)', diff saved to https://phabricator.wikimedia.org/P76577 and previous config saved to /var/cache/conftool/dbconfig/20250528-110423-fceratto.json
- 10:56 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T395241)', diff saved to https://phabricator.wikimedia.org/P76576 and previous config saved to /var/cache/conftool/dbconfig/20250528-105602-fceratto.json
- 10:53 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2156 (T395241)', diff saved to https://phabricator.wikimedia.org/P76575 and previous config saved to /var/cache/conftool/dbconfig/20250528-105346-fceratto.json
- 10:53 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2156.codfw.wmnet with reason: Maintenance
- 10:53 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T395241)', diff saved to https://phabricator.wikimedia.org/P76574 and previous config saved to /var/cache/conftool/dbconfig/20250528-105320-fceratto.json
- 10:52 marostegui@cumin1002: dbctl commit (dc=all): 'es2035 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P76573 and previous config saved to /var/cache/conftool/dbconfig/20250528-105245-root.json
- 10:49 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1168 (T395241)', diff saved to https://phabricator.wikimedia.org/P76572 and previous config saved to /var/cache/conftool/dbconfig/20250528-104928-fceratto.json
- 10:49 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1168.eqiad.wmnet with reason: Maintenance
- 10:49 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T395241)', diff saved to https://phabricator.wikimedia.org/P76571 and previous config saved to /var/cache/conftool/dbconfig/20250528-104901-fceratto.json
- 10:40 ayounsi@dns1004: END - running authdns-update
- 10:40 ayounsi@dns1004: START - running authdns-update
- 10:38 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P76570 and previous config saved to /var/cache/conftool/dbconfig/20250528-103813-fceratto.json
- 10:37 marostegui@cumin1002: dbctl commit (dc=all): 'es2035 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P76569 and previous config saved to /var/cache/conftool/dbconfig/20250528-103738-root.json
- 10:37 jynus@cumin1002: DONE (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 4:00:00 on dbprov[2003-2006].codfw.wmnet with reason: Downtime hosts for reboot
- 10:33 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P76568 and previous config saved to /var/cache/conftool/dbconfig/20250528-103354-fceratto.json
- 10:33 ayounsi@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 10:33 ayounsi@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: magru routed ganeti v6 gw IP - ayounsi@cumin1002"
- 10:33 ayounsi@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: magru routed ganeti v6 gw IP - ayounsi@cumin1002"
- 10:32 slyngshede@cumin1002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM idm2001.wikimedia.org
- 10:32 jynus@cumin1002: DONE (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 4:00:00 on dbprov[1003-1006].eqiad.wmnet with reason: Downtime hosts for reboot
- 10:29 ayounsi@cumin1002: START - Cookbook sre.dns.netbox
- 10:28 slyngshede@cumin1002: START - Cookbook sre.ganeti.reboot-vm for VM idm2001.wikimedia.org
- 10:23 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P76567 and previous config saved to /var/cache/conftool/dbconfig/20250528-102306-fceratto.json
- 10:22 marostegui@cumin1002: dbctl commit (dc=all): 'es2035 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P76566 and previous config saved to /var/cache/conftool/dbconfig/20250528-102233-root.json
- 10:20 marostegui@cumin1002: dbctl commit (dc=all): 'Add db2187 to dbctl depooled T394884', diff saved to https://phabricator.wikimedia.org/P76565 and previous config saved to /var/cache/conftool/dbconfig/20250528-102015-marostegui.json
- 10:18 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P76564 and previous config saved to /var/cache/conftool/dbconfig/20250528-101847-fceratto.json
- 10:15 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) db2242 - Depool db2242.codfw.wmnet to then clone it to db2187.codfw.wmnet - marostegui@cumin1002
- 10:15 marostegui@cumin1002: START - Cookbook sre.mysql.depool db2242 - Depool db2242.codfw.wmnet to then clone it to db2187.codfw.wmnet - marostegui@cumin1002
- 10:15 marostegui@cumin1002: START - Cookbook sre.mysql.clone of db2242.codfw.wmnet onto db2187.codfw.wmnet
- 10:13 moritzm: initialise ganeti03 cluster in magru T394263
- 10:09 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db2187.codfw.wmnet
- 10:08 jynus@cumin1002: DONE (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 4:00:00 on backup[2002-2003].codfw.wmnet with reason: Downtime hosts for reboot
- 10:07 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T395241)', diff saved to https://phabricator.wikimedia.org/P76562 and previous config saved to /var/cache/conftool/dbconfig/20250528-100758-fceratto.json
- 10:07 marostegui@cumin1002: dbctl commit (dc=all): 'es2035 (re)pooling @ 20%: Repooling', diff saved to https://phabricator.wikimedia.org/P76561 and previous config saved to /var/cache/conftool/dbconfig/20250528-100727-root.json
- 10:03 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T395241)', diff saved to https://phabricator.wikimedia.org/P76560 and previous config saved to /var/cache/conftool/dbconfig/20250528-100341-fceratto.json
- 10:03 marostegui@cumin1002: START - Cookbook sre.mysql.upgrade for db2187.codfw.wmnet
- 10:02 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db2191.codfw.wmnet onto db2186.codfw.wmnet
- 09:58 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2149 (T395241)', diff saved to https://phabricator.wikimedia.org/P76559 and previous config saved to /var/cache/conftool/dbconfig/20250528-095813-fceratto.json
- 09:58 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2149.codfw.wmnet with reason: Maintenance
- 09:57 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1165 (T395241)', diff saved to https://phabricator.wikimedia.org/P76558 and previous config saved to /var/cache/conftool/dbconfig/20250528-095707-fceratto.json
- 09:57 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1015,1019].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
- 09:56 moritzm: installing node-serialize-javascript security updates
- 09:56 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1165.eqiad.wmnet with reason: Maintenance
- 09:52 marostegui@cumin1002: dbctl commit (dc=all): 'es2035 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P76557 and previous config saved to /var/cache/conftool/dbconfig/20250528-095222-root.json
- 09:51 slyngshede@dns1004: END - running authdns-update
- 09:50 slyngshede@dns1004: START - running authdns-update
- 09:44 ayounsi@dns1004: END - running authdns-update
- 09:43 ayounsi@dns1004: START - running authdns-update
- 09:42 ayounsi@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 09:42 ayounsi@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: magru routed ganeti public gw IP - ayounsi@cumin1002"
- 09:41 ayounsi@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: magru routed ganeti public gw IP - ayounsi@cumin1002"
- 09:40 moritzm: remove failing exim4 auto restart from crm2001 T383715
- 09:39 ayounsi@cumin1002: START - Cookbook sre.dns.netbox
- 09:37 marostegui@cumin1002: dbctl commit (dc=all): 'es2035 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P76555 and previous config saved to /var/cache/conftool/dbconfig/20250528-093717-root.json
- 09:35 moritzm: installing prometheus-postfix-exporter updates from Bookworm point release
- 09:34 ayounsi@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 09:31 ayounsi@cumin1002: START - Cookbook sre.dns.netbox
- 09:31 ayounsi@cumin1002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
- 09:23 moritzm: installing distro-info-data updates on Bullseye/Bookworm
- 09:22 marostegui@cumin1002: dbctl commit (dc=all): 'es2035 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P76553 and previous config saved to /var/cache/conftool/dbconfig/20250528-092212-root.json
- 09:21 slyngshede@cumin1002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM idm-test1001.wikimedia.org
- 09:17 slyngshede@cumin1002: START - Cookbook sre.ganeti.reboot-vm for VM idm-test1001.wikimedia.org
- 09:15 slyngshede@cumin1002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM idp1004.wikimedia.org
- 09:14 moritzm: installing docker.io bookworm updates
- 09:12 ayounsi@cumin1002: START - Cookbook sre.dns.netbox
- 09:11 slyngshede@cumin1002: START - Cookbook sre.ganeti.reboot-vm for VM idp1004.wikimedia.org
- 09:11 marostegui@cumin1002: END (FAIL) - Cookbook sre.mysql.upgrade (exit_code=99) for es2035.codfw.wmnet
- 09:06 jmm@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 09:06 jmm@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: new VIP for magru03 - jmm@cumin1003"
- 09:06 jmm@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: new VIP for magru03 - jmm@cumin1003"
- 09:05 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) es2035 - Upgrading es2035.codfw.wmnet
- 09:05 slyngshede@dns1004: END - running authdns-update
- 09:05 marostegui@cumin1002: dbctl commit (dc=all): 'Depool es2035', diff saved to https://phabricator.wikimedia.org/P76551 and previous config saved to /var/cache/conftool/dbconfig/20250528-090528-marostegui.json
- 09:04 slyngshede@dns1004: START - running authdns-update
- 09:04 marostegui@cumin1002: START - Cookbook sre.mysql.depool es2035 - Upgrading es2035.codfw.wmnet
- 09:04 marostegui@cumin1002: START - Cookbook sre.mysql.upgrade for es2035.codfw.wmnet
- 09:02 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti7002.magru.wmnet
- 09:01 jmm@cumin1003: START - Cookbook sre.dns.netbox
- 08:56 slyngshede@cumin1002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM idp2004.wikimedia.org
- 08:56 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti7002.magru.wmnet
- 08:53 slyngshede@cumin1002: START - Cookbook sre.ganeti.reboot-vm for VM idp2004.wikimedia.org
- 08:53 slyngshede@cumin1002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM idm1001.wikimedia.org
- 08:52 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1038.eqiad.wmnet with reason: Maintenance
- 08:51 slyngshede@cumin1002: START - Cookbook sre.ganeti.reboot-vm for VM idm1001.wikimedia.org
- 08:46 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on es1038.eqiad.wmnet with reason: Maintenance
- 08:34 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on es1038.eqiad.wmnet with reason: Maintenance
- 08:33 marostegui@cumin1002: dbctl commit (dc=all): 'Depool es1038 T394469', diff saved to https://phabricator.wikimedia.org/P76550 and previous config saved to /var/cache/conftool/dbconfig/20250528-083338-marostegui.json
- 08:19 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 8 hosts with reason: Maintenance
- 08:07 slyngshede@dns1004: END - running authdns-update
- 08:06 slyngshede@dns1004: START - running authdns-update
- 07:51 marostegui@cumin1002: dbctl commit (dc=all): 'db2186 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P76549 and previous config saved to /var/cache/conftool/dbconfig/20250528-075147-root.json
- 07:47 moritzm: installing intel-microcode security updates on Bullseye
- 07:45 moritzm: installing nodejs security updates
- 07:36 marostegui@cumin1002: dbctl commit (dc=all): 'db2186 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P76548 and previous config saved to /var/cache/conftool/dbconfig/20250528-073641-root.json
- 07:24 isaranto@deploy1003: Finished scap sync-world: Backport for Revert^2 "ores-extension: enable ores extention UI in idwiki" (T382171) (duration: 19m 19s)
- 07:21 marostegui@cumin1002: dbctl commit (dc=all): 'db2186 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P76547 and previous config saved to /var/cache/conftool/dbconfig/20250528-072135-root.json
- 07:17 isaranto@deploy1003: isaranto: Continuing with sync
- 07:07 isaranto@deploy1003: isaranto: Backport for Revert^2 "ores-extension: enable ores extention UI in idwiki" (T382171) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 07:06 marostegui@cumin1002: dbctl commit (dc=all): 'db2186 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P76546 and previous config saved to /var/cache/conftool/dbconfig/20250528-070629-root.json
- 07:05 isaranto@deploy1003: Started scap sync-world: Backport for Revert^2 "ores-extension: enable ores extention UI in idwiki" (T382171)
- 06:51 marostegui@cumin1002: dbctl commit (dc=all): 'db2186 (re)pooling @ 20%: Repooling', diff saved to https://phabricator.wikimedia.org/P76545 and previous config saved to /var/cache/conftool/dbconfig/20250528-065124-root.json
- 06:44 kart_: Updated cxserver to 2025-05-28-042852-production (T387229, T395259)
- 06:43 kartik@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
- 06:42 kartik@deploy1003: helmfile [eqiad] START helmfile.d/services/cxserver: apply
- 06:42 kartik@deploy1003: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
- 06:41 kartik@deploy1003: helmfile [codfw] START helmfile.d/services/cxserver: apply
- 06:36 marostegui@cumin1002: dbctl commit (dc=all): 'db2186 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P76544 and previous config saved to /var/cache/conftool/dbconfig/20250528-063618-root.json
- 06:36 kartik@deploy1003: helmfile [staging] DONE helmfile.d/services/cxserver: apply
- 06:35 kartik@deploy1003: helmfile [staging] START helmfile.d/services/cxserver: apply
- 06:21 marostegui@cumin1002: dbctl commit (dc=all): 'db2186 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P76543 and previous config saved to /var/cache/conftool/dbconfig/20250528-062113-root.json
- 06:06 marostegui@cumin1002: dbctl commit (dc=all): 'db2186 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P76542 and previous config saved to /var/cache/conftool/dbconfig/20250528-060608-root.json
2025-05-27
- 22:27 tgr: UTC late deploys done
- 22:24 tgr@deploy1003: Finished scap sync-world: Backport for slwikibooks: update tagline (T393551), ruwikisource: add Автор (Author) namespace (T395193) (duration: 09m 53s)
- 22:17 tgr@deploy1003: anzx, tgr: Continuing with sync
- 22:16 tgr@deploy1003: anzx, tgr: Backport for slwikibooks: update tagline (T393551), ruwikisource: add Автор (Author) namespace (T395193) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 22:14 tgr@deploy1003: Started scap sync-world: Backport for slwikibooks: update tagline (T393551), ruwikisource: add Автор (Author) namespace (T395193)
- 22:09 tgr@deploy1003: Finished scap sync-world: Backport for Revert "cowikimedia: Enable Translate&Notifications Exten." (T395382) (duration: 17m 26s)
- 22:08 bking@cumin2002: conftool action : set/pooled=yes:weight=10; selector: name=cirrussearch1096.eqiad.wmnet|cirrussearch1097.eqiad.wmnet|cirrussearch1098.eqiad.wmnet|cirrussearch1099.eqiad.wmnet|cirrussearch1100.eqiad.wmnet|cirrussearch1101.eqiad.wmnet|cirrussearch1102.eqiad.wmnet|cirrussearch1107.eqiad.wmnet|cirrussearch1110.eqiad.wmnet|cirrussearch1124.eqiad.wmnet|cirrussearch1125.eqiad.wmnet
- 22:03 denisse: Cleaning up logs older than 70 days in centrallog2002
- 22:02 tgr@deploy1003: zhaofjx, tgr: Continuing with sync
- 21:54 tgr@deploy1003: zhaofjx, tgr: Backport for Revert "cowikimedia: Enable Translate&Notifications Exten." (T395382) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 21:52 tgr@deploy1003: Started scap sync-world: Backport for Revert "cowikimedia: Enable Translate&Notifications Exten." (T395382)
- 21:48 toyofuku@deploy1003: Finished scap sync-world: Backport for Revert "Deploy summaries pilot" (duration: 13m 16s)
- 21:41 toyofuku@deploy1003: toyofuku: Continuing with sync
- 21:37 toyofuku@deploy1003: toyofuku: Backport for Revert "Deploy summaries pilot" synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 21:35 toyofuku@deploy1003: Started scap sync-world: Backport for Revert "Deploy summaries pilot"
- 21:35 tgr@deploy1003: Unlocked for deployment [MediaWiki]: debugging gerrit 1151280 (duration: 06m 17s)
- 21:28 tgr@deploy1003: Locking from deployment [MediaWiki]: debugging gerrit 1151280
- 21:05 toyofuku@deploy1003: toyofuku, ksarabia, bwang: Backport for Deploy summaries pilot (T393940), Deploy Vector empty search recommendations to wikivoyage and group 1 wikipedias (T393943) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 21:03 toyofuku@deploy1003: Started scap sync-world: Backport for Deploy summaries pilot (T393940), Deploy Vector empty search recommendations to wikivoyage and group 1 wikipedias (T393943)
- 20:39 bking@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on elastic1054.eqiad.wmnet with reason: downtime until decom
- 20:34 bking@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on elastic1067.eqiad.wmnet with reason: downtime until decom
- 20:28 tgr@deploy1003: Finished scap sync-world: Backport for cowikimedia: Enable Translate&Notifications Exten. (T386776) (duration: 13m 10s)
- 20:21 tgr@deploy1003: zhaofjx, tgr: Continuing with sync
- 20:17 tgr@deploy1003: zhaofjx, tgr: Backport for cowikimedia: Enable Translate&Notifications Exten. (T386776) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 20:15 tgr@deploy1003: Started scap sync-world: Backport for cowikimedia: Enable Translate&Notifications Exten. (T386776)
- 19:46 tgr@deploy1003: Finished scap sync-world: Backport for Add scrambled: password class (duration: 10m 52s)
- 19:39 tgr@deploy1003: tgr: Continuing with sync
- 19:37 tgr@deploy1003: tgr: Backport for Add scrambled: password class synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 19:35 tgr@deploy1003: Started scap sync-world: Backport for Add scrambled: password class
- 19:12 inflatador: bking@cumin2002 depool cirrussearch106[0-6] T394350
- 19:07 bking@cumin2002: conftool action : set/pooled=no; selector: name=cirrussearch1060.eqiad.wmnet|cirrussearch1061.eqiad.wmnet|cirrussearch1062.eqiad.wmnet|cirrussearch1063.eqiad.wmnet|cirrussearch1064.eqiad.wmnet|cirrussearch1065.eqiad.wmnet|cirrussearch1066.eqiad.wmnet
- 18:33 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2224 (T395241)', diff saved to https://phabricator.wikimedia.org/P76540 and previous config saved to /var/cache/conftool/dbconfig/20250527-183358-fceratto.json
- 18:21 dancy@deploy1003: rebuilt and synchronized wikiversions files: group0 to 1.45.0-wmf.3 refs T392173
- 18:18 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2224', diff saved to https://phabricator.wikimedia.org/P76539 and previous config saved to /var/cache/conftool/dbconfig/20250527-181852-fceratto.json
- 18:11 mutante: zuul1001/zuul2001: sudo apt-get remove --purge docker-compose; sudo apt auto-remove
- 18:08 jgleeson: SmashPig upgraded from f96b898e to 546d17e5
- 18:03 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2224', diff saved to https://phabricator.wikimedia.org/P76538 and previous config saved to /var/cache/conftool/dbconfig/20250527-180344-fceratto.json
- 17:56 sukhe: finished running authdns-update for lowering dyna/upload TTL to 180s: T394312
- 17:56 sukhe@dns1004: END - running authdns-update
- 17:55 sukhe@dns1004: START - running authdns-update
- 17:48 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2224 (T395241)', diff saved to https://phabricator.wikimedia.org/P76537 and previous config saved to /var/cache/conftool/dbconfig/20250527-174837-fceratto.json
- 17:42 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2224 (T395241)', diff saved to https://phabricator.wikimedia.org/P76536 and previous config saved to /var/cache/conftool/dbconfig/20250527-174204-fceratto.json
- 17:41 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2224.codfw.wmnet with reason: Maintenance
- 17:41 jasmine@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts wikikube-worker[1022-1025].eqiad.wmnet
- 17:41 jasmine@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 17:41 jasmine@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: wikikube-worker[1022-1025].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jasmine@cumin1002"
- 17:41 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2217 (T395241)', diff saved to https://phabricator.wikimedia.org/P76535 and previous config saved to /var/cache/conftool/dbconfig/20250527-174137-fceratto.json
- 17:39 jasmine@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: wikikube-worker[1022-1025].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jasmine@cumin1002"
- 17:35 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1245.eqiad.wmnet with reason: Maintenance
- 17:33 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1210 (T395241)', diff saved to https://phabricator.wikimedia.org/P76534 and previous config saved to /var/cache/conftool/dbconfig/20250527-173319-fceratto.json
- 17:28 dzahn@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host zuul2001.codfw.wmnet with OS bookworm
- 17:26 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2217', diff saved to https://phabricator.wikimedia.org/P76533 and previous config saved to /var/cache/conftool/dbconfig/20250527-172629-fceratto.json
- 17:25 jasmine@cumin1002: START - Cookbook sre.dns.netbox
- 17:18 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1210', diff saved to https://phabricator.wikimedia.org/P76532 and previous config saved to /var/cache/conftool/dbconfig/20250527-171812-fceratto.json
- 17:11 dzahn@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on zuul2001.codfw.wmnet with reason: host reimage
- 17:11 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2217', diff saved to https://phabricator.wikimedia.org/P76531 and previous config saved to /var/cache/conftool/dbconfig/20250527-171121-fceratto.json
- 17:08 dzahn@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on zuul2001.codfw.wmnet with reason: host reimage
- 17:03 jasmine@cumin1002: START - Cookbook sre.hosts.decommission for hosts wikikube-worker[1022-1025].eqiad.wmnet
- 17:03 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1210', diff saved to https://phabricator.wikimedia.org/P76530 and previous config saved to /var/cache/conftool/dbconfig/20250527-170304-fceratto.json
- 17:01 jynus: restore row per request T395350
- 16:56 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2217 (T395241)', diff saved to https://phabricator.wikimedia.org/P76529 and previous config saved to /var/cache/conftool/dbconfig/20250527-165614-fceratto.json
- 16:51 dzahn@cumin1002: START - Cookbook sre.hosts.reimage for host zuul2001.codfw.wmnet with OS bookworm
- 16:49 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2217 (T395241)', diff saved to https://phabricator.wikimedia.org/P76528 and previous config saved to /var/cache/conftool/dbconfig/20250527-164950-fceratto.json
- 16:49 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2217.codfw.wmnet with reason: Maintenance
- 16:49 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2214 (T395241)', diff saved to https://phabricator.wikimedia.org/P76527 and previous config saved to /var/cache/conftool/dbconfig/20250527-164923-fceratto.json
- 16:48 jasmine@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker[1022-1025].eqiad.wmnet
- 16:47 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1210 (T395241)', diff saved to https://phabricator.wikimedia.org/P76526 and previous config saved to /var/cache/conftool/dbconfig/20250527-164757-fceratto.json
- 16:43 jasmine@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker[1022-1025].eqiad.wmnet
- 16:43 dwisehaupt: stopping process-control and coworker on civi1002 and frdev1002 for updates and reboots.
- 16:42 sfaci@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply
- 16:42 sfaci@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply
- 16:41 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1210 (T395241)', diff saved to https://phabricator.wikimedia.org/P76525 and previous config saved to /var/cache/conftool/dbconfig/20250527-164136-fceratto.json
- 16:41 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1210.eqiad.wmnet with reason: Maintenance
- 16:41 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1200 (T395241)', diff saved to https://phabricator.wikimedia.org/P76524 and previous config saved to /var/cache/conftool/dbconfig/20250527-164110-fceratto.json
- 16:36 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch1103.eqiad.wmnet with OS bullseye
- 16:34 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2214', diff saved to https://phabricator.wikimedia.org/P76523 and previous config saved to /var/cache/conftool/dbconfig/20250527-163416-fceratto.json
- 16:26 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1200', diff saved to https://phabricator.wikimedia.org/P76522 and previous config saved to /var/cache/conftool/dbconfig/20250527-162602-fceratto.json
- 16:21 klausman@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host ml-serve1007.eqiad.wmnet
- 16:21 klausman@cumin1003: START - Cookbook sre.k8s.pool-depool-node pool for host ml-serve1007.eqiad.wmnet
- 16:20 dwisehaupt: payments back out of maintenance mode after update/reboot
- 16:19 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2214', diff saved to https://phabricator.wikimedia.org/P76521 and previous config saved to /var/cache/conftool/dbconfig/20250527-161909-fceratto.json
- 16:16 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch1103.eqiad.wmnet with reason: host reimage
- 16:15 dwisehaupt: payments into maintenance mode for kernel update/reboot of frqueue1003
- 16:14 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db2186.codfw.wmnet
- 16:14 klausman@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-serve1007.eqiad.wmnet with OS bookworm
- 16:12 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch1103.eqiad.wmnet with reason: host reimage
- 16:10 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1200', diff saved to https://phabricator.wikimedia.org/P76520 and previous config saved to /var/cache/conftool/dbconfig/20250527-161055-fceratto.json
- 16:10 jynus@cumin1002: DONE (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 4:00:00 on backup2013.codfw.wmnet with reason: Downtime hosts for reboot
- 16:10 jynus@cumin1002: DONE (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 4:00:00 on backup[2001-2003].codfw.wmnet,backup1013.eqiad.wmnet with reason: Downtime hosts for reboot
- 16:05 marostegui@cumin1002: START - Cookbook sre.mysql.upgrade for db2186.codfw.wmnet
- 16:04 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2214 (T395241)', diff saved to https://phabricator.wikimedia.org/P76519 and previous config saved to /var/cache/conftool/dbconfig/20250527-160401-fceratto.json
- 15:59 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db2191 gradually with 4 steps - Pool db2191.codfw.wmnet in after cloning
- 15:57 klausman@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve1007.eqiad.wmnet with reason: host reimage
- 15:57 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2214 (T395241)', diff saved to https://phabricator.wikimedia.org/P76517 and previous config saved to /var/cache/conftool/dbconfig/20250527-155720-fceratto.json
- 15:57 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2214.codfw.wmnet with reason: Maintenance
- 15:56 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch1103
- 15:56 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch1103
- 15:56 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch1103.eqiad.wmnet with OS bullseye
- 15:56 swfrench@deploy1003: Finished scap sync-world: Noop deployment to test scap 4.170.0 - T388761 (duration: 04m 03s)
- 15:55 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1200 (T395241)', diff saved to https://phabricator.wikimedia.org/P76516 and previous config saved to /var/cache/conftool/dbconfig/20250527-155546-fceratto.json
- 15:54 klausman@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve1007.eqiad.wmnet with reason: host reimage
- 15:52 swfrench@deploy1003: Started scap sync-world: Noop deployment to test scap 4.170.0 - T388761
- 15:51 jynus@cumin1002: DONE (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 4:00:00 on backup[1001-1003].eqiad.wmnet with reason: Downtime hosts for reboot
- 15:51 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2197.codfw.wmnet with reason: Maintenance
- 15:51 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193 (T395241)', diff saved to https://phabricator.wikimedia.org/P76515 and previous config saved to /var/cache/conftool/dbconfig/20250527-155125-fceratto.json
- 15:49 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1200 (T395241)', diff saved to https://phabricator.wikimedia.org/P76514 and previous config saved to /var/cache/conftool/dbconfig/20250527-154912-fceratto.json
- 15:49 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1200.eqiad.wmnet with reason: Maintenance
- 15:48 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1185 (T395241)', diff saved to https://phabricator.wikimedia.org/P76513 and previous config saved to /var/cache/conftool/dbconfig/20250527-154846-fceratto.json
- 15:47 dancy@deploy1003: Installation of scap version "4.170.0" completed for 2 hosts
- 15:45 dancy@deploy1003: Installing scap version "4.170.0" for 2 host(s)
- 15:45 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch1110.eqiad.wmnet with OS bullseye
- 15:42 jforrester@deploy1003: Finished scap sync-world: Backport for [wikifunctions] Don't grant new generic-enum rights to Functioneers for now (T391913), Wikifunctions: Enable Wikifunction client mode on the first five Wiktionaries (T390552) (duration: 10m 50s)
- 15:36 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193', diff saved to https://phabricator.wikimedia.org/P76511 and previous config saved to /var/cache/conftool/dbconfig/20250527-153618-fceratto.json
- 15:35 jforrester@deploy1003: jforrester: Continuing with sync
- 15:33 jforrester@deploy1003: jforrester: Backport for [wikifunctions] Don't grant new generic-enum rights to Functioneers for now (T391913), Wikifunctions: Enable Wikifunction client mode on the first five Wiktionaries (T390552) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 15:33 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1185', diff saved to https://phabricator.wikimedia.org/P76510 and previous config saved to /var/cache/conftool/dbconfig/20250527-153339-fceratto.json
- 15:32 klausman@cumin1003: START - Cookbook sre.hosts.reimage for host ml-serve1007.eqiad.wmnet with OS bookworm
- 15:31 jforrester@deploy1003: Started scap sync-world: Backport for [wikifunctions] Don't grant new generic-enum rights to Functioneers for now (T391913), Wikifunctions: Enable Wikifunction client mode on the first five Wiktionaries (T390552)
- 15:23 klausman@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host ml-serve1007.eqiad.wmnet
- 15:21 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193', diff saved to https://phabricator.wikimedia.org/P76508 and previous config saved to /var/cache/conftool/dbconfig/20250527-152110-fceratto.json
- 15:18 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1185', diff saved to https://phabricator.wikimedia.org/P76507 and previous config saved to /var/cache/conftool/dbconfig/20250527-151832-fceratto.json
- 15:14 marostegui@cumin1002: START - Cookbook sre.mysql.pool db2191 gradually with 4 steps - Pool db2191.codfw.wmnet in after cloning
- 15:13 klausman@cumin1003: START - Cookbook sre.k8s.pool-depool-node depool for host ml-serve1007.eqiad.wmnet
- 15:12 klausman@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host ml-serve1008.eqiad.wmnet
- 15:12 klausman@cumin1003: START - Cookbook sre.k8s.pool-depool-node pool for host ml-serve1008.eqiad.wmnet
- 15:12 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch1110.eqiad.wmnet with reason: host reimage
- 15:09 klausman@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-serve1008.eqiad.wmnet with OS bookworm
- 15:09 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch1110.eqiad.wmnet with reason: host reimage
- 15:06 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193 (T395241)', diff saved to https://phabricator.wikimedia.org/P76505 and previous config saved to /var/cache/conftool/dbconfig/20250527-150602-fceratto.json
- 15:03 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1185 (T395241)', diff saved to https://phabricator.wikimedia.org/P76504 and previous config saved to /var/cache/conftool/dbconfig/20250527-150325-fceratto.json
- 14:59 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2193 (T395241)', diff saved to https://phabricator.wikimedia.org/P76503 and previous config saved to /var/cache/conftool/dbconfig/20250527-145933-fceratto.json
- 14:59 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2193.codfw.wmnet with reason: Maintenance
- 14:59 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180 (T395241)', diff saved to https://phabricator.wikimedia.org/P76502 and previous config saved to /var/cache/conftool/dbconfig/20250527-145906-fceratto.json
- 14:58 elukey@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host ml-serve1006.eqiad.wmnet
- 14:58 elukey@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host ml-serve1006.eqiad.wmnet
- 14:56 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1185 (T395241)', diff saved to https://phabricator.wikimedia.org/P76501 and previous config saved to /var/cache/conftool/dbconfig/20250527-145651-fceratto.json
- 14:56 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1185.eqiad.wmnet with reason: Maintenance
- 14:56 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T395241)', diff saved to https://phabricator.wikimedia.org/P76500 and previous config saved to /var/cache/conftool/dbconfig/20250527-145626-fceratto.json
- 14:52 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch1110
- 14:52 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch1110
- 14:52 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch1110.eqiad.wmnet with OS bullseye
- 14:51 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic1110 to cirrussearch1110
- 14:51 lucaswerkmeister-wmde@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikidata-query-gui: apply
- 14:51 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch1110
- 14:50 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch1110
- 14:50 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch1110 on all recursors
- 14:50 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch1110 on all recursors
- 14:50 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 14:50 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1110 to cirrussearch1110 - bking@cumin2002"
- 14:50 lucaswerkmeister-wmde@deploy1003: helmfile [eqiad] START helmfile.d/services/wikidata-query-gui: apply
- 14:50 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1110 to cirrussearch1110 - bking@cumin2002"
- 14:50 lucaswerkmeister-wmde@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikidata-query-gui: apply
- 14:49 lucaswerkmeister-wmde@deploy1003: helmfile [codfw] START helmfile.d/services/wikidata-query-gui: apply
- 14:49 lucaswerkmeister-wmde@deploy1003: helmfile [staging] DONE helmfile.d/services/wikidata-query-gui: apply
- 14:47 lucaswerkmeister-wmde@deploy1003: helmfile [staging] START helmfile.d/services/wikidata-query-gui: apply
- 14:47 bking@cumin2002: START - Cookbook sre.dns.netbox
- 14:46 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic1110 to cirrussearch1110
- 14:44 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P76499 and previous config saved to /var/cache/conftool/dbconfig/20250527-144359-fceratto.json
- 14:41 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P76498 and previous config saved to /var/cache/conftool/dbconfig/20250527-144119-fceratto.json
- 14:34 marostegui: Deploy schema change on s7 eqiad dbmaint T395333
- 14:32 klausman@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve1008.eqiad.wmnet with reason: host reimage
- 14:28 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P76497 and previous config saved to /var/cache/conftool/dbconfig/20250527-142852-fceratto.json
- 14:27 klausman@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve1008.eqiad.wmnet with reason: host reimage
- 14:26 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P76496 and previous config saved to /var/cache/conftool/dbconfig/20250527-142612-fceratto.json
- 14:18 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudlb2002-dev.codfw.wmnet with OS bookworm
- 14:17 mszabo@deploy1003: Finished scap sync-world: Backport for Temp accounts: Allow sysop to grant and revoke IP reveal (T390942) (duration: 12m 00s)
- 14:13 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180 (T395241)', diff saved to https://phabricator.wikimedia.org/P76495 and previous config saved to /var/cache/conftool/dbconfig/20250527-141346-fceratto.json
- 14:11 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T395241)', diff saved to https://phabricator.wikimedia.org/P76494 and previous config saved to /var/cache/conftool/dbconfig/20250527-141104-fceratto.json
- 14:10 mszabo@deploy1003: mszabo, tchanders: Continuing with sync
- 14:07 mszabo@deploy1003: mszabo, tchanders: Backport for Temp accounts: Allow sysop to grant and revoke IP reveal (T390942) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 14:07 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2180 (T395241)', diff saved to https://phabricator.wikimedia.org/P76493 and previous config saved to /var/cache/conftool/dbconfig/20250527-140725-fceratto.json
- 14:07 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2180.codfw.wmnet with reason: Maintenance
- 14:06 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2169 (T395241)', diff saved to https://phabricator.wikimedia.org/P76492 and previous config saved to /var/cache/conftool/dbconfig/20250527-140658-fceratto.json
- 14:05 mszabo@deploy1003: Started scap sync-world: Backport for Temp accounts: Allow sysop to grant and revoke IP reveal (T390942)
- 14:03 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1161 (T395241)', diff saved to https://phabricator.wikimedia.org/P76491 and previous config saved to /var/cache/conftool/dbconfig/20250527-140328-fceratto.json
- 14:03 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
- 14:03 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1161.eqiad.wmnet with reason: Maintenance
- 14:02 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1159 (T395241)', diff saved to https://phabricator.wikimedia.org/P76490 and previous config saved to /var/cache/conftool/dbconfig/20250527-140244-fceratto.json
- 14:01 kharlan@deploy1003: Finished scap sync-world: Backport for Update IPInfo access levels (T375086) (duration: 15m 16s)
- 13:56 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudlb2002-dev.codfw.wmnet with reason: host reimage
- 13:55 klausman@cumin1003: START - Cookbook sre.hosts.reimage for host ml-serve1008.eqiad.wmnet with OS bookworm
- 13:53 kharlan@deploy1003: mszabo, kharlan: Continuing with sync
- 13:53 taavi@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudlb2002-dev.codfw.wmnet with reason: host reimage
- 13:52 klausman@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host ml-serve1008.eqiad.wmnet
- 13:51 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2169', diff saved to https://phabricator.wikimedia.org/P76489 and previous config saved to /var/cache/conftool/dbconfig/20250527-135151-fceratto.json
- 13:51 marostegui@cumin1002: dbctl commit (dc=all): 'Add db2186 to dbctl depooled T394884', diff saved to https://phabricator.wikimedia.org/P76488 and previous config saved to /var/cache/conftool/dbconfig/20250527-135141-marostegui.json
- 13:49 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-serve1006.eqiad.wmnet with OS bookworm
- 13:48 kharlan@deploy1003: mszabo, kharlan: Backport for Update IPInfo access levels (T375086) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 13:47 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1159', diff saved to https://phabricator.wikimedia.org/P76487 and previous config saved to /var/cache/conftool/dbconfig/20250527-134738-fceratto.json
- 13:47 klausman@cumin1003: START - Cookbook sre.k8s.pool-depool-node depool for host ml-serve1008.eqiad.wmnet
- 13:47 klausman@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host ml-serve1009.eqiad.wmnet
- 13:47 klausman@cumin1003: START - Cookbook sre.k8s.pool-depool-node pool for host ml-serve1009.eqiad.wmnet
- 13:46 marostegui@cumin1002: START - Cookbook sre.mysql.clone of db2191.codfw.wmnet onto db2186.codfw.wmnet
- 13:45 kharlan@deploy1003: Started scap sync-world: Backport for Update IPInfo access levels (T375086)
- 13:43 klausman@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-serve1009.eqiad.wmnet with OS bookworm
- 13:37 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for core-Permissions:Create reviewer role on eswikivoyage, remove patroller and rollbacker (T395293) (duration: 11m 25s)
- 13:36 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2169', diff saved to https://phabricator.wikimedia.org/P76486 and previous config saved to /var/cache/conftool/dbconfig/20250527-133629-fceratto.json
- 13:33 taavi@cumin1002: START - Cookbook sre.hosts.reimage for host cloudlb2002-dev.codfw.wmnet with OS bookworm
- 13:32 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1159', diff saved to https://phabricator.wikimedia.org/P76485 and previous config saved to /var/cache/conftool/dbconfig/20250527-133231-fceratto.json
- 13:30 lucaswerkmeister-wmde@deploy1003: bunnypranav, lucaswerkmeister-wmde: Continuing with sync
- 13:28 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve1006.eqiad.wmnet with reason: host reimage
- 13:28 lucaswerkmeister-wmde@deploy1003: bunnypranav, lucaswerkmeister-wmde: Backport for core-Permissions:Create reviewer role on eswikivoyage, remove patroller and rollbacker (T395293) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 13:26 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for core-Permissions:Create reviewer role on eswikivoyage, remove patroller and rollbacker (T395293)
- 13:25 klausman@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve1009.eqiad.wmnet with reason: host reimage
- 13:23 Lucas_WMDE: lucaswerkmeister-wmde@deploy1003 ~ $ mwscript-k8s --comment=T395293 --follow -- emptyUserGroup eswikivoyage patroller # removed 3 users in total
- 13:23 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve1006.eqiad.wmnet with reason: host reimage
- 13:23 Lucas_WMDE: lucaswerkmeister-wmde@deploy1003 ~ $ mwscript-k8s --comment=T395293 --follow -- emptyUserGroup eswikivoyage rollbacker # removed 5 users in total
- 13:22 klausman@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve1009.eqiad.wmnet with reason: host reimage
- 13:21 taavi@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudlb2003-dev.codfw.wmnet with OS bookworm
- 13:21 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2169 (T395241)', diff saved to https://phabricator.wikimedia.org/P76484 and previous config saved to /var/cache/conftool/dbconfig/20250527-132122-fceratto.json
- 13:17 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1159 (T395241)', diff saved to https://phabricator.wikimedia.org/P76483 and previous config saved to /var/cache/conftool/dbconfig/20250527-131723-fceratto.json
- 13:14 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2169 (T395241)', diff saved to https://phabricator.wikimedia.org/P76482 and previous config saved to /var/cache/conftool/dbconfig/20250527-131447-fceratto.json
- 13:14 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2169.codfw.wmnet with reason: Maintenance
- 13:14 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2158 (T395241)', diff saved to https://phabricator.wikimedia.org/P76481 and previous config saved to /var/cache/conftool/dbconfig/20250527-131420-fceratto.json
- 13:09 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1159 (T395241)', diff saved to https://phabricator.wikimedia.org/P76480 and previous config saved to /var/cache/conftool/dbconfig/20250527-130947-fceratto.json
- 13:09 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1159.eqiad.wmnet with reason: Maintenance
- 13:05 klausman@cumin1003: START - Cookbook sre.hosts.reimage for host ml-serve1009.eqiad.wmnet with OS bookworm
- 13:04 taavi@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "add virtual-magru networks - taavi@cumin1002"
- 13:04 taavi@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "add virtual-magru networks - taavi@cumin1002"
- 13:03 klausman@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ml-serve1009.eqiad.wmnet with OS bookworm
- 13:02 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db2191.codfw.wmnet
- 13:02 taavi@cumin1002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
- 13:02 taavi@cumin1002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove cloudnet2007/8 cloud-private dns records for now - taavi@cumin1002"
- 13:02 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudlb2003-dev.codfw.wmnet with reason: host reimage
- 13:02 klausman@cumin1003: START - Cookbook sre.hosts.reimage for host ml-serve1009.eqiad.wmnet with OS bookworm
- 13:01 taavi@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove cloudnet2007/8 cloud-private dns records for now - taavi@cumin1002"
- 13:01 moritzm: installing intel-microcode security updates on Bullseye
- 13:01 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ml-serve1006.eqiad.wmnet with OS bookworm
- 13:01 klausman@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host ml-serve1009.eqiad.wmnet
- 12:59 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P76479 and previous config saved to /var/cache/conftool/dbconfig/20250527-125913-fceratto.json
- 12:58 taavi@cumin1002: START - Cookbook sre.dns.netbox
- 12:57 elukey@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host ml-serve1006.eqiad.wmnet
- 12:57 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) db2191 - Upgrading db2191.codfw.wmnet
- 12:57 marostegui@cumin1002: START - Cookbook sre.mysql.depool db2191 - Upgrading db2191.codfw.wmnet
- 12:56 marostegui@cumin1002: START - Cookbook sre.mysql.upgrade for db2191.codfw.wmnet
- 12:56 taavi@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudlb2003-dev.codfw.wmnet with reason: host reimage
- 12:56 klausman@cumin1003: START - Cookbook sre.k8s.pool-depool-node depool for host ml-serve1009.eqiad.wmnet
- 12:49 marostegui@cumin1002: END (FAIL) - Cookbook sre.mysql.upgrade (exit_code=99) for db2191.codfw.wmnet
- 12:49 marostegui@cumin1002: START - Cookbook sre.mysql.upgrade for db2191.codfw.wmnet
- 12:49 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2191 T395241', diff saved to https://phabricator.wikimedia.org/P76476 and previous config saved to /var/cache/conftool/dbconfig/20250527-124929-marostegui.json
- 12:48 marostegui@cumin1002: END (FAIL) - Cookbook sre.mysql.upgrade (exit_code=99) for db2191.codfw.wmnet
- 12:48 marostegui@cumin1002: START - Cookbook sre.mysql.upgrade for db2191.codfw.wmnet
- 12:48 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
- 12:47 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2191.codfw.wmnet with reason: Maintenance
- 12:47 elukey@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host ml-serve1006.eqiad.wmnet
- 12:46 elukey@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host ml-serve1005.eqiad.wmnet
- 12:46 elukey@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host ml-serve1005.eqiad.wmnet
- 12:44 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P76475 and previous config saved to /var/cache/conftool/dbconfig/20250527-124406-fceratto.json
- 12:42 stevemunene@cumin1002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host an-worker1119.eqiad.wmnet
- 12:37 taavi@cumin1002: START - Cookbook sre.hosts.reimage for host cloudlb2003-dev.codfw.wmnet with OS bookworm
- 12:28 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2158 (T395241)', diff saved to https://phabricator.wikimedia.org/P76474 and previous config saved to /var/cache/conftool/dbconfig/20250527-122858-fceratto.json
- 12:23 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 20115
- 12:22 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 20115
- 12:22 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2158 (T395241)', diff saved to https://phabricator.wikimedia.org/P76473 and previous config saved to /var/cache/conftool/dbconfig/20250527-122226-fceratto.json
- 12:22 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
- 12:22 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2158.codfw.wmnet with reason: Maintenance
- 12:21 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151 (T395241)', diff saved to https://phabricator.wikimedia.org/P76472 and previous config saved to /var/cache/conftool/dbconfig/20250527-122142-fceratto.json
- 12:08 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-serve1005.eqiad.wmnet with OS bookworm
- 12:06 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151', diff saved to https://phabricator.wikimedia.org/P76470 and previous config saved to /var/cache/conftool/dbconfig/20250527-120635-fceratto.json
- 12:02 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host mc-misc2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 12:01 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-misc2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 12:00 Emperor: ceph orch apply to bring apus-be2004 into service T391354
- 11:59 stevemunene@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-worker1119.eqiad.wmnet
- 11:59 moritzm: installing nodejs security updates
- 11:52 Emperor: reboot thanos-be100[6-9] before bringing into the rings T391352
- 11:51 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151', diff saved to https://phabricator.wikimedia.org/P76469 and previous config saved to /var/cache/conftool/dbconfig/20250527-115127-fceratto.json
- 11:36 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151 (T395241)', diff saved to https://phabricator.wikimedia.org/P76468 and previous config saved to /var/cache/conftool/dbconfig/20250527-113619-fceratto.json
- 11:28 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2151 (T395241)', diff saved to https://phabricator.wikimedia.org/P76467 and previous config saved to /var/cache/conftool/dbconfig/20250527-112848-fceratto.json
- 11:28 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2151.codfw.wmnet with reason: Maintenance
- 11:25 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti7002.magru.wmnet with OS bookworm
- 11:21 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve1005.eqiad.wmnet with reason: host reimage
- 11:17 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve1005.eqiad.wmnet with reason: host reimage
- 11:04 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudservices2004-dev.codfw.wmnet
- 11:03 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti7002.magru.wmnet with reason: host reimage
- 10:58 taavi@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudservices2004-dev.codfw.wmnet
- 10:56 jmm@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti7002.magru.wmnet with reason: host reimage
- 10:42 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ml-serve1005.eqiad.wmnet with OS bookworm
- 10:41 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop: apply
- 10:40 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop: apply
- 10:40 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply
- 10:40 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop: apply
- 10:35 elukey@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host ml-serve1005.eqiad.wmnet
- 10:33 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudnet2008-dev.codfw.wmnet with OS bookworm
- 10:33 jmm@cumin1003: START - Cookbook sre.hosts.reimage for host ganeti7002.magru.wmnet with OS bookworm
- 10:30 elukey@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host ml-serve1005.eqiad.wmnet
- 10:28 klausman@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host ml-serve1010.eqiad.wmnet
- 10:28 klausman@cumin1003: START - Cookbook sre.k8s.pool-depool-node pool for host ml-serve1010.eqiad.wmnet
- 10:27 klausman@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-serve1010.eqiad.wmnet with OS bookworm
- 10:25 jgiannelos@deploy1003: helmfile [staging] DONE helmfile.d/services/changeprop: apply
- 10:25 jgiannelos@deploy1003: helmfile [staging] START helmfile.d/services/changeprop: apply
- 10:19 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
- 10:18 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
- 10:18 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
- 10:18 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mobileapps: apply
- 10:15 jgiannelos@deploy1003: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
- 10:15 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudnet2008-dev.codfw.wmnet with reason: host reimage
- 10:15 jgiannelos@deploy1003: helmfile [staging] START helmfile.d/services/mobileapps: apply
- 10:14 stevemunene@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on an-worker1148.eqiad.wmnet with reason: Upgrade an-worker hard drives from 4TB to 8TB group 5 - rack F1
- 10:14 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 10:13 stevemunene@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on an-worker1155.eqiad.wmnet with reason: Upgrade an-worker hard drives from 4TB to 8TB group 5 - rack F1
- 10:13 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 10:11 klausman@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve1010.eqiad.wmnet with reason: host reimage
- 10:08 taavi@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudnet2008-dev.codfw.wmnet with reason: host reimage
- 10:08 klausman@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve1010.eqiad.wmnet with reason: host reimage
- 09:49 taavi@cumin1002: START - Cookbook sre.hosts.reimage for host cloudnet2008-dev.codfw.wmnet with OS bookworm
- 09:47 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudnet2007-dev.codfw.wmnet with OS bookworm
- 09:39 brouberol@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dse-k8s-worker1001.eqiad.wmnet
- 09:36 klausman@cumin1003: START - Cookbook sre.hosts.reimage for host ml-serve1010.eqiad.wmnet with OS bookworm
- 09:34 klausman@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host ml-serve1011.eqiad.wmnet
- 09:34 klausman@cumin1003: START - Cookbook sre.k8s.pool-depool-node pool for host ml-serve1011.eqiad.wmnet
- 09:33 brouberol@cumin2002: START - Cookbook sre.hosts.reboot-single for host dse-k8s-worker1001.eqiad.wmnet
- 09:29 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudnet2007-dev.codfw.wmnet with reason: host reimage
- 09:27 klausman@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host ml-serve1010.eqiad.wmnet
- 09:25 taavi@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudnet2007-dev.codfw.wmnet with reason: host reimage
- 09:22 klausman@cumin1003: START - Cookbook sre.k8s.pool-depool-node depool for host ml-serve1010.eqiad.wmnet
- 09:05 taavi@cumin1002: START - Cookbook sre.hosts.reimage for host cloudnet2007-dev.codfw.wmnet with OS bookworm
- 09:00 klausman@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-serve1011.eqiad.wmnet with OS bookworm
- 08:55 moritzm: remove ganeti7002 from the magru02 cluster T394263
- 08:46 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 9002
- 08:46 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 9002
- 08:46 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 34141
- 08:46 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 34141
- 08:45 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 29208
- 08:45 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 29208
- 08:45 klausman@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve1011.eqiad.wmnet with reason: host reimage
- 08:44 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 28598
- 08:44 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 28598
- 08:43 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 20857
- 08:43 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 20857
- 08:42 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 16347
- 08:42 klausman@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve1011.eqiad.wmnet with reason: host reimage
- 08:41 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 16347
- 08:40 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 59605
- 08:40 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 59605
- 08:39 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 40217
- 08:39 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 40217
- 08:39 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 40217
- 08:38 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 40217
- 08:35 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 40217
- 08:34 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 40217
- 08:33 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 40217
- 08:33 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 40217
- 08:32 volans: deploying debmonitor-client v0.4.1-1 fleet-wide
- 08:31 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 399728
- 08:31 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 399728
- 08:30 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 32098
- 08:30 hashar: UTC morning backport window has been completed.
- 08:29 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 32098
- 08:28 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 209453
- 08:28 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 209453
- 08:28 ayounsi@cumin1002: END (FAIL) - Cookbook sre.network.peering (exit_code=99) with action 'email' for AS: 16509
- 08:27 hashar@deploy1003: Finished scap sync-world: Backport for Do not save on Session::renew() when there's nothing to renew (T392251), Don't save after Session::delaySave() when there's no delayed save (T392251) (duration: 11m 29s)
- 08:26 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of prometheus7001.magru.wmnet to plain
- 08:26 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of prometheus7001.magru.wmnet to plain
- 08:23 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ncredir7002.magru.wmnet to plain
- 08:20 hashar@deploy1003: hashar, tgr: Continuing with sync
- 08:18 hashar@deploy1003: hashar, tgr: Backport for Do not save on Session::renew() when there's nothing to renew (T392251), Don't save after Session::delaySave() when there's no delayed save (T392251) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 08:17 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ncredir7002.magru.wmnet to plain
- 08:16 hashar@deploy1003: Started scap sync-world: Backport for Do not save on Session::renew() when there's nothing to renew (T392251), Don't save after Session::delaySave() when there's no delayed save (T392251)
- 08:15 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti7002.magru.wmnet
- 08:14 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1002.eqiad.wmnet
- 08:14 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti7002.magru.wmnet
- 08:13 klausman@cumin1003: START - Cookbook sre.hosts.reimage for host ml-serve1011.eqiad.wmnet with OS bookworm
- 08:12 klausman@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host ml-serve1011.eqiad.wmnet
- 08:07 klausman@cumin1003: START - Cookbook sre.k8s.pool-depool-node depool for host ml-serve1011.eqiad.wmnet
- 08:02 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 08:02 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
- 08:00 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 16509
- 07:59 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of bast7001.wikimedia.org to plain
- 07:58 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of bast7001.wikimedia.org to plain
- 07:58 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 13489
- 07:58 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 13489
- 07:57 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 1273
- 07:56 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 1273
- 07:56 hashar@deploy1003: Finished scap sync-world: Backport for core-Namespaces: Update Malay wiki (mswiki) namespace aliases (T394603), Add user group extendedmover to ukwiki (T395285), InitialiseSettings: wgTemplateDataEnableDiscovery on plwiki and arwiki (T377975) (duration: 16m 31s)
- 07:55 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudservices2004-dev.codfw.wmnet
- 07:54 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 151575
- 07:54 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 151575
- 07:51 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 4800
- 07:50 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 4800
- 07:49 taavi@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudservices2004-dev.codfw.wmnet
- 07:47 hashar@deploy1003: hashar, bunnypranav, samwilson, ahonc: Continuing with sync
- 07:44 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 45287
- 07:44 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 45287
- 07:44 hashar@deploy1003: hashar, bunnypranav, samwilson, ahonc: Backport for core-Namespaces: Update Malay wiki (mswiki) namespace aliases (T394603), Add user group extendedmover to ukwiki (T395285), InitialiseSettings: wgTemplateDataEnableDiscovery on plwiki and arwiki (T377975) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 07:43 taavi@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "set cloudcephmon1004 as active after disk replacement - taavi@cumin1002"
- 07:43 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 209453
- 07:43 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 209453
- 07:42 taavi@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "set cloudcephmon1004 as active after disk replacement - taavi@cumin1002"
- 07:42 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 134823
- 07:42 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 134823
- 07:39 hashar@deploy1003: Started scap sync-world: Backport for core-Namespaces: Update Malay wiki (mswiki) namespace aliases (T394603), Add user group extendedmover to ukwiki (T395285), InitialiseSettings: wgTemplateDataEnableDiscovery on plwiki and arwiki (T377975)
- 07:39 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1002.eqiad.wmnet
- 07:38 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 269180
- 07:37 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 269180
- 07:35 ayounsi@cumin1002: END (ERROR) - Cookbook sre.network.peering (exit_code=97) with action 'email' for AS: 52468
- 07:35 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 52468
- 07:34 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of doh7002.wikimedia.org to plain
- 07:33 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of doh7002.wikimedia.org to plain
- 07:33 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 15932
- 07:33 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 15932
- 07:32 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 10089
- 07:32 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 10089
- 07:32 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-misc2002.codfw.wmnet
- 07:32 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of durum7002.magru.wmnet to plain
- 07:31 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of durum7002.magru.wmnet to plain
- 07:30 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 17806
- 07:30 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 17806
- 07:23 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host mc-misc2002.codfw.wmnet
- 07:19 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-misc2002.codfw.wmnet
- 07:10 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host mc-misc2002.codfw.wmnet
- 07:07 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-test2002.codfw.wmnet
- 07:05 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 15133
- 07:05 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 15133
- 07:01 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-test2002.codfw.wmnet
- 06:58 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-test2002.codfw.wmnet
- 06:38 moritzm: failover Ganeti master in magru/B4 to ganeti7004 T394263
- 06:31 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-test2002.codfw.wmnet
- 04:01 mwpresync@deploy1003: Pruned MediaWiki: 1.44.0-wmf.28 (duration: 01m 50s)
- 03:49 mwpresync@deploy1003: Finished scap sync-world: testwikis to 1.45.0-wmf.3 refs T392173 (duration: 46m 49s)
- 03:02 mwpresync@deploy1003: Started scap sync-world: testwikis to 1.45.0-wmf.3 refs T392173
- 00:20 dani@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
- 00:20 dani@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply
- 00:20 dani@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
- 00:20 dani@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply
- 00:20 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply
- 00:20 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply
2025-05-26
- 21:56 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2228 (T395241)', diff saved to https://phabricator.wikimedia.org/P76461 and previous config saved to /var/cache/conftool/dbconfig/20250526-215649-fceratto.json
- 21:41 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2228', diff saved to https://phabricator.wikimedia.org/P76460 and previous config saved to /var/cache/conftool/dbconfig/20250526-214142-fceratto.json
- 21:26 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2228', diff saved to https://phabricator.wikimedia.org/P76459 and previous config saved to /var/cache/conftool/dbconfig/20250526-212634-fceratto.json
- 21:11 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2228 (T395241)', diff saved to https://phabricator.wikimedia.org/P76458 and previous config saved to /var/cache/conftool/dbconfig/20250526-211127-fceratto.json
- 21:04 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2228 (T395241)', diff saved to https://phabricator.wikimedia.org/P76457 and previous config saved to /var/cache/conftool/dbconfig/20250526-210445-fceratto.json
- 21:04 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
- 21:04 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2228.codfw.wmnet with reason: Maintenance
- 21:04 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2223 (T395241)', diff saved to https://phabricator.wikimedia.org/P76456 and previous config saved to /var/cache/conftool/dbconfig/20250526-210402-fceratto.json
- 20:48 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2223', diff saved to https://phabricator.wikimedia.org/P76455 and previous config saved to /var/cache/conftool/dbconfig/20250526-204855-fceratto.json
- 20:33 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2223', diff saved to https://phabricator.wikimedia.org/P76454 and previous config saved to /var/cache/conftool/dbconfig/20250526-203348-fceratto.json
- 20:18 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2223 (T395241)', diff saved to https://phabricator.wikimedia.org/P76453 and previous config saved to /var/cache/conftool/dbconfig/20250526-201840-fceratto.json
- 20:11 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2223 (T395241)', diff saved to https://phabricator.wikimedia.org/P76452 and previous config saved to /var/cache/conftool/dbconfig/20250526-201150-fceratto.json
- 20:11 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2223.codfw.wmnet with reason: Maintenance
- 20:11 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2211 (T395241)', diff saved to https://phabricator.wikimedia.org/P76451 and previous config saved to /var/cache/conftool/dbconfig/20250526-201123-fceratto.json
- 19:56 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2211', diff saved to https://phabricator.wikimedia.org/P76450 and previous config saved to /var/cache/conftool/dbconfig/20250526-195614-fceratto.json
- 19:41 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2211', diff saved to https://phabricator.wikimedia.org/P76449 and previous config saved to /var/cache/conftool/dbconfig/20250526-194107-fceratto.json
- 19:33 aokoth@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on doc1004.eqiad.wmnet with reason: Bookworm
- 19:30 denisse: Re-enable sync between grafana hosts - T395098
- 19:29 aokoth@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply
- 19:28 aokoth@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply
- 19:26 aokoth@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/admin 'apply'.
- 19:26 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2211 (T395241)', diff saved to https://phabricator.wikimedia.org/P76448 and previous config saved to /var/cache/conftool/dbconfig/20250526-192600-fceratto.json
- 19:24 aokoth@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/admin 'apply'.
- 19:19 denisse: Upgrading Grafana to v12.0.1 on grafana1002 - T395098
- 19:19 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2211 (T395241)', diff saved to https://phabricator.wikimedia.org/P76447 and previous config saved to /var/cache/conftool/dbconfig/20250526-191912-fceratto.json
- 19:19 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2211.codfw.wmnet with reason: Maintenance
- 19:17 denisse: Add Grafana v12.0.1 to reprepro for bookworm - T395098
- 19:14 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2201.codfw.wmnet with reason: Maintenance
- 19:13 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2192 (T395241)', diff saved to https://phabricator.wikimedia.org/P76446 and previous config saved to /var/cache/conftool/dbconfig/20250526-191341-fceratto.json
- 18:58 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2192', diff saved to https://phabricator.wikimedia.org/P76445 and previous config saved to /var/cache/conftool/dbconfig/20250526-185832-fceratto.json
- 18:43 aokoth@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on doc1003.eqiad.wmnet with reason: Bookworm
- 18:43 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2192', diff saved to https://phabricator.wikimedia.org/P76444 and previous config saved to /var/cache/conftool/dbconfig/20250526-184325-fceratto.json
- 18:28 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2192 (T395241)', diff saved to https://phabricator.wikimedia.org/P76443 and previous config saved to /var/cache/conftool/dbconfig/20250526-182817-fceratto.json
- 18:22 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2192 (T395241)', diff saved to https://phabricator.wikimedia.org/P76442 and previous config saved to /var/cache/conftool/dbconfig/20250526-182247-fceratto.json
- 18:22 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2192.codfw.wmnet with reason: Maintenance
- 18:22 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2178 (T395241)', diff saved to https://phabricator.wikimedia.org/P76441 and previous config saved to /var/cache/conftool/dbconfig/20250526-182221-fceratto.json
- 18:07 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P76440 and previous config saved to /var/cache/conftool/dbconfig/20250526-180714-fceratto.json
- 17:52 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P76439 and previous config saved to /var/cache/conftool/dbconfig/20250526-175207-fceratto.json
- 17:37 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2178 (T395241)', diff saved to https://phabricator.wikimedia.org/P76438 and previous config saved to /var/cache/conftool/dbconfig/20250526-173700-fceratto.json
- 17:29 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2178 (T395241)', diff saved to https://phabricator.wikimedia.org/P76437 and previous config saved to /var/cache/conftool/dbconfig/20250526-172912-fceratto.json
- 17:29 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2178.codfw.wmnet with reason: Maintenance
- 17:28 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2171 (T395241)', diff saved to https://phabricator.wikimedia.org/P76436 and previous config saved to /var/cache/conftool/dbconfig/20250526-172844-fceratto.json
- 17:13 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2171', diff saved to https://phabricator.wikimedia.org/P76435 and previous config saved to /var/cache/conftool/dbconfig/20250526-171338-fceratto.json
- 16:58 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2171', diff saved to https://phabricator.wikimedia.org/P76434 and previous config saved to /var/cache/conftool/dbconfig/20250526-165831-fceratto.json
- 16:43 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2171 (T395241)', diff saved to https://phabricator.wikimedia.org/P76433 and previous config saved to /var/cache/conftool/dbconfig/20250526-164324-fceratto.json
- 16:35 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2171 (T395241)', diff saved to https://phabricator.wikimedia.org/P76432 and previous config saved to /var/cache/conftool/dbconfig/20250526-163530-fceratto.json
- 16:35 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2171.codfw.wmnet with reason: Maintenance
- 16:35 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2157 (T395241)', diff saved to https://phabricator.wikimedia.org/P76431 and previous config saved to /var/cache/conftool/dbconfig/20250526-163502-fceratto.json
- 16:19 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2157', diff saved to https://phabricator.wikimedia.org/P76430 and previous config saved to /var/cache/conftool/dbconfig/20250526-161955-fceratto.json
- 16:15 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply
- 16:14 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop: apply
- 16:14 jgiannelos@deploy1003: helmfile [staging] DONE helmfile.d/services/changeprop: apply
- 16:13 jgiannelos@deploy1003: helmfile [staging] START helmfile.d/services/changeprop: apply
- 16:07 cgoubert@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
- 16:07 cgoubert@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
- 16:04 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2157', diff saved to https://phabricator.wikimedia.org/P76429 and previous config saved to /var/cache/conftool/dbconfig/20250526-160447-fceratto.json
- 15:49 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2157 (T395241)', diff saved to https://phabricator.wikimedia.org/P76428 and previous config saved to /var/cache/conftool/dbconfig/20250526-154939-fceratto.json
- 15:49 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
- 15:48 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mobileapps: apply
- 15:48 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
- 15:47 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
- 15:47 jgiannelos@deploy1003: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
- 15:47 jgiannelos@deploy1003: helmfile [staging] START helmfile.d/services/mobileapps: apply
- 15:46 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
- 15:46 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
- 15:46 volans@cumin1003: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts cirrussearch2111.codfw.wmnet
- 15:45 volans@cumin1003: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cirrussearch2111.codfw.wmnet
- 15:45 jgiannelos@deploy1003: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
- 15:44 jgiannelos@deploy1003: helmfile [staging] START helmfile.d/services/mobileapps: apply
- 15:43 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2157 (T395241)', diff saved to https://phabricator.wikimedia.org/P76427 and previous config saved to /var/cache/conftool/dbconfig/20250526-154304-fceratto.json
- 15:42 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2157.codfw.wmnet with reason: Maintenance
- 15:32 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
- 15:31 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
- 15:30 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
- 15:29 volans@cumin1003: START - Cookbook sre.hosts.reboot-single for host cirrussearch2111.codfw.wmnet
- 15:28 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
- 15:28 volans@cumin1003: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cirrussearch2111.codfw.wmnet
- 15:15 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
- 15:15 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
- 15:12 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply
- 15:12 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop: apply
- 15:10 jgiannelos@deploy1003: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
- 15:10 jgiannelos@deploy1003: helmfile [staging] START helmfile.d/services/mobileapps: apply
- 15:10 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp7001.magru.wmnet
- 15:10 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
- 15:10 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop: apply
- 15:10 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
- 15:09 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop: apply
- 15:08 jgiannelos@deploy1003: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
- 15:08 jgiannelos@deploy1003: helmfile [staging] START helmfile.d/services/mobileapps: apply
- 15:08 elukey@deploy1003: helmfile [staging] DONE helmfile.d/services/kartotherian: sync
- 15:08 elukey@deploy1003: helmfile [staging] START helmfile.d/services/kartotherian: sync
- 15:08 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/changeprop: apply
- 15:08 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/changeprop: apply
- 15:04 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp7001.magru.wmnet
- 15:03 fabfur: temporary depooling cp7001 to restart haproxy (https://gerrit.wikimedia.org/r/c/operations/puppet/+/1150690) (T392219)
- 14:56 cgoubert@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
- 14:56 cgoubert@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
- 14:48 cgoubert@deploy1003: Finished scap sync-world: mediawiki-cli image update - T395245 (duration: 10m 41s)
- 14:41 elukey@deploy1003: helmfile [staging] DONE helmfile.d/services/kartotherian: sync
- 14:41 elukey@deploy1003: helmfile [staging] START helmfile.d/services/kartotherian: sync
- 14:38 cgoubert@deploy1003: Started scap sync-world: mediawiki-cli image update - T395245
- 14:12 isaranto@deploy1003: Finished scap sync-world: Backport for SpecialHomepageLogger: Populate email state even with StartModule disabled (T394017) (duration: 14m 33s)
- 14:02 isaranto@deploy1003: migr, isaranto: Continuing with sync
- 14:01 isaranto@deploy1003: migr, isaranto: Backport for SpecialHomepageLogger: Populate email state even with StartModule disabled (T394017) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 13:57 isaranto@deploy1003: Started scap sync-world: Backport for SpecialHomepageLogger: Populate email state even with StartModule disabled (T394017)
- 13:50 elukey@deploy1003: helmfile [staging] DONE helmfile.d/services/kartotherian: sync
- 13:50 elukey@deploy1003: helmfile [staging] START helmfile.d/services/kartotherian: sync
- 13:47 elukey@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host ml-serve1004.eqiad.wmnet
- 13:47 elukey@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host ml-serve1004.eqiad.wmnet
- 13:46 elukey@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'sync'.
- 13:46 elukey@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'sync'.
- 13:45 elukey@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
- 13:45 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-serve1004.eqiad.wmnet with OS bookworm
- 13:45 elukey@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
- 13:42 arnaudb@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on lists1004.wikimedia.org with reason: update
- 13:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow1002.eqiad.wmnet
- 13:38 klausman@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-lab1002.eqiad.wmnet
- 13:35 sukhe@dns1004: END - running authdns-update
- 13:35 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow1002.eqiad.wmnet
- 13:34 sukhe@dns1004: START - running authdns-update
- 13:33 klausman@cumin1003: START - Cookbook sre.hosts.reboot-single for host ml-lab1002.eqiad.wmnet
- 13:33 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow2003.codfw.wmnet
- 13:32 gkyziridis@deploy1003: Sync cancelled.
- 13:32 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudbackup1002-dev.eqiad.wmnet
- 13:29 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow2003.codfw.wmnet
- 13:28 taavi@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudbackup1002-dev.eqiad.wmnet
- 13:28 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve1004.eqiad.wmnet with reason: host reimage
- 13:28 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudnet2008-dev.codfw.wmnet
- 13:27 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow3003.esams.wmnet
- 13:24 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve1004.eqiad.wmnet with reason: host reimage
- 13:23 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow3003.esams.wmnet
- 13:22 gkyziridis@deploy1003: isaranto, gkyziridis: Backport for ores-extension: enable ores extention UI in idwiki (T382171) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 13:21 taavi@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudnet2008-dev.codfw.wmnet
- 13:21 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudnet2007-dev.codfw.wmnet
- 13:19 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow4002.ulsfo.wmnet
- 13:14 taavi@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudnet2007-dev.codfw.wmnet
- 13:14 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow4002.ulsfo.wmnet
- 13:13 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow5002.eqsin.wmnet
- 13:08 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow5002.eqsin.wmnet
- 13:08 gkyziridis@deploy1003: Started scap sync-world: Backport for ores-extension: enable ores extention UI in idwiki (T382171)
- 13:04 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow6001.drmrs.wmnet
- 12:58 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow6001.drmrs.wmnet
- 12:57 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow7001.magru.wmnet
- 12:56 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ml-serve1004.eqiad.wmnet with OS bookworm
- 12:55 kart_: Update Recommendation-API to 2025-05-26-081343-production (T394441, T395026, T306508, T391230)
- 12:53 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow7001.magru.wmnet
- 12:52 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ml-serve1004.eqiad.wmnet with OS bookworm
- 12:52 kartik@deploy1003: helmfile [ml-serve-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
- 12:52 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ml-serve1004.eqiad.wmnet with OS bookworm
- 12:49 kartik@deploy1003: helmfile [ml-serve-eqiad] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
- 12:46 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ml-serve1004.eqiad.wmnet with OS bookworm
- 12:44 kartik@deploy1003: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
- 12:40 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudlb2004-dev.codfw.wmnet
- 12:30 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps-test2001.codfw.wmnet
- 12:29 taavi@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudlb2004-dev.codfw.wmnet
- 12:23 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps-test2001.codfw.wmnet
- 12:21 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps-test2002.codfw.wmnet
- 12:15 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps-test2002.codfw.wmnet
- 12:13 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps-test2003.codfw.wmnet
- 12:12 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cagefive2001.mgmt.codfw.wmnet on all recursors
- 12:12 cmooney@cumin1002: START - Cookbook sre.dns.wipe-cache cagefive2001.mgmt.codfw.wmnet on all recursors
- 12:11 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 12:11 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dns entries for cagefive2001 test server - cmooney@cumin1002"
- 12:11 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dns entries for cagefive2001 test server - cmooney@cumin1002"
- 12:07 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps-test2003.codfw.wmnet
- 12:07 cmooney@cumin1002: START - Cookbook sre.dns.netbox
- 12:05 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.dhcp (exit_code=0) for host cagefive2001.codfw.wmnet
- 12:05 arnaudb@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on releases2003.codfw.wmnet with reason: update
- 12:04 cmooney@cumin1002: START - Cookbook sre.hosts.dhcp for host cagefive2001.codfw.wmnet
- 12:04 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 12:04 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dns entries for cagefive2001 test server - cmooney@cumin1002"
- 12:04 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dns entries for cagefive2001 test server - cmooney@cumin1002"
- 12:02 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps-test2004.codfw.wmnet
- 12:00 arnaudb@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on aphlict1002.eqiad.wmnet with reason: update
- 11:59 cmooney@cumin1002: START - Cookbook sre.dns.netbox
- 11:59 cmooney@cumin1002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
- 11:58 moritzm: installing postgresql-15 security updates
- 11:57 arnaudb@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on aphlict2001.codfw.wmnet with reason: update
- 11:56 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps-test2004.codfw.wmnet
- 11:55 arnaudb@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on doc1004.eqiad.wmnet with reason: update
- 11:54 cmooney@cumin1002: START - Cookbook sre.dns.netbox
- 11:52 arnaudb@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on doc2003.codfw.wmnet with reason: update
- 11:52 arnaudb@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on etherpad1004.eqiad.wmnet with reason: update
- 11:52 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps-test2005.codfw.wmnet
- 11:50 arnaudb@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on etherpad2002.codfw.wmnet with reason: update
- 11:48 arnaudb@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on gerrit2003.wikimedia.org with reason: update
- 11:46 arnaudb@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on lists2001.wikimedia.org with reason: update
- 11:46 bwojtowicz@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
- 11:45 arnaudb@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on people1004.eqiad.wmnet with reason: update
- 11:45 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps-test2005.codfw.wmnet
- 11:40 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps-test2006.codfw.wmnet
- 11:39 arnaudb@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on phab1005.eqiad.wmnet with reason: update
- 11:33 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps-test2006.codfw.wmnet
- 11:31 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-misc2002.codfw.wmnet
- 11:22 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host mc-misc2002.codfw.wmnet
- 11:21 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1002.eqiad.wmnet
- 11:20 fnegri@cumin1002: END (PASS) - Cookbook sre.wikireplicas.update-views (exit_code=0)
- 11:20 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudbackup1001-dev.eqiad.wmnet
- 11:16 taavi@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudbackup1001-dev.eqiad.wmnet
- 11:16 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1002.eqiad.wmnet
- 11:15 fnegri@cumin1002: START - Cookbook sre.wikireplicas.update-views
- 11:15 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudservices2005-dev.codfw.wmnet
- 11:14 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-test2002.codfw.wmnet
- 11:14 fnegri@cumin1002: END (PASS) - Cookbook sre.wikireplicas.update-views (exit_code=0)
- 11:09 taavi@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudservices2005-dev.codfw.wmnet
- 11:09 fnegri@cumin1002: START - Cookbook sre.wikireplicas.update-views
- 11:08 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-test2002.codfw.wmnet
- 10:59 cgoubert@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
- 10:59 cgoubert@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
- 10:44 cgoubert@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
- 10:44 cgoubert@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
- 10:23 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ml-serve1004.eqiad.wmnet with OS bookworm
- 10:22 elukey@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ml-serve1004.eqiad.wmnet with OS bookworm
- 10:08 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ml-serve1004.eqiad.wmnet with OS bookworm
- 09:52 elukey@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host ml-serve1004.eqiad.wmnet
- 09:47 elukey@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host ml-serve1004.eqiad.wmnet
- 09:45 elukey@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host ml-serve1003.eqiad.wmnet
- 09:45 elukey@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host ml-serve1003.eqiad.wmnet
- 09:14 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-serve1003.eqiad.wmnet with OS bookworm
- 08:57 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve1003.eqiad.wmnet with reason: host reimage
- 08:53 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve1003.eqiad.wmnet with reason: host reimage
- 08:48 stevemunene@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on an-worker1135.eqiad.wmnet with reason: Investigate MegaRAID failure
- 08:44 mvolz@deploy1003: helmfile [eqiad] DONE helmfile.d/services/citoid: apply
- 08:43 mvolz@deploy1003: helmfile [eqiad] START helmfile.d/services/citoid: apply
- 08:41 mvolz@deploy1003: helmfile [codfw] DONE helmfile.d/services/citoid: apply
- 08:41 mvolz@deploy1003: helmfile [codfw] START helmfile.d/services/citoid: apply
- 08:35 mvolz@deploy1003: helmfile [staging] DONE helmfile.d/services/citoid: apply
- 08:34 mvolz@deploy1003: helmfile [staging] START helmfile.d/services/citoid: apply
- 08:34 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ml-serve1003.eqiad.wmnet with OS bookworm
- 08:34 elukey@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ml-serve1003.eqiad.wmnet with OS bookworm
- 08:19 elukey@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-analytics: sync
- 08:19 elukey@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-analytics: sync
- 08:15 elukey@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-analytics: sync
- 08:14 elukey@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-analytics: sync
- 08:12 elukey@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-analytics: sync
- 08:11 elukey@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-analytics: sync
- 07:32 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ml-serve1003.eqiad.wmnet with OS bookworm
- 07:28 elukey@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host ml-serve1003.eqiad.wmnet
- 07:25 stevemunene@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1177.eqiad.wmnet
- 07:23 elukey@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host ml-serve1003.eqiad.wmnet
- 07:23 elukey@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host ml-serve1003.eqiad.wmnet
- 07:22 elukey@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host ml-serve1003.eqiad.wmnet
- 07:17 stevemunene@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-worker1177.eqiad.wmnet
- 06:58 moritzm: installing Linux 6.1.140 packages
- 06:28 moritzm: installing intel-microcode security updates
- 05:15 stevemunene@cumin1002: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker1177.eqiad.wmnet
- 05:13 stevemunene@cumin1002: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker1177.eqiad.wmnet
2025-05-25
- 22:39 eileen: config revision changed from 9f17d4fa to 54dbc833
- 22:31 eileen: civicrm upgraded from c8897c92 to 3b59e784
- 22:28 eileen: config revision changed from d2b6530c to 9f17d4fa
- 20:15 eileen: config revision changed from 4144e1b9 to d2b6530c
- 20:10 eileen: config revision changed from 097491f0 to 4144e1b9
- 20:10 eileen: civicrm upgraded from 4dc5f911 to c8897c92
- 13:55 Ammar: Ran fixStuckGlobalRename.php for T395202
2025-05-24
- 00:58 sbassett: Ran scap remove-patch for the first 4 security patches for T392341 (the 5th patch is still pending a merge in gerrit)
2025-05-23
- 23:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host apus-be2004.codfw.wmnet with OS bookworm
- 23:01 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 22:55 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 22:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on apus-be2004.codfw.wmnet with reason: host reimage
- 22:34 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on apus-be2004.codfw.wmnet with reason: host reimage
- 22:07 bking@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on cirrussearch[2110-2111].codfw.wmnet with reason: firmware update
- 21:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host apus-be2004.codfw.wmnet with OS bookworm
- 21:56 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Banning hosts: cirrussearch2110*,cirrussearch2111* for T394543 - bking@cumin2002
- 21:56 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Banning hosts: cirrussearch2110*,cirrussearch2111* for T394543 - bking@cumin2002
- 21:49 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host apus-be2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 21:30 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch1107.eqiad.wmnet with OS bullseye
- 21:16 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host apus-be2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 21:03 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch1107.eqiad.wmnet with reason: host reimage
- 20:59 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch1107.eqiad.wmnet with reason: host reimage
- 20:44 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch1102.eqiad.wmnet with OS bullseye
- 20:43 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch1107
- 20:43 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch1107
- 20:43 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch1107.eqiad.wmnet with OS bullseye
- 20:37 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch1107.eqiad.wmnet on all recursors
- 20:37 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch1107.eqiad.wmnet on all recursors
- 20:37 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cirrussearch1107.eqiad.wmnet with OS bullseye
- 20:18 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch1102.eqiad.wmnet with reason: host reimage
- 20:17 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch1107
- 20:17 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch1107
- 20:17 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch1107.eqiad.wmnet with OS bullseye
- 20:15 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic1107 to cirrussearch1107
- 20:15 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch1107
- 20:15 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch1107
- 20:15 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch1107 on all recursors
- 20:15 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch1107 on all recursors
- 20:15 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 20:15 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1107 to cirrussearch1107 - bking@cumin2002"
- 20:14 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch1102.eqiad.wmnet with reason: host reimage
- 20:13 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1107 to cirrussearch1107 - bking@cumin2002"
- 20:10 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch1101.eqiad.wmnet with OS bullseye
- 20:09 bking@cumin2002: START - Cookbook sre.dns.netbox
- 20:09 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic1107 to cirrussearch1107
- 20:00 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch1102
- 20:00 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch1102
- 20:00 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch1102.eqiad.wmnet with OS bullseye
- 19:59 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic1102 to cirrussearch1102
- 19:58 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch1102
- 19:57 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch1100.eqiad.wmnet with OS bullseye
- 19:52 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch1102
- 19:52 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch1102 on all recursors
- 19:52 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch1102 on all recursors
- 19:52 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 19:52 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1102 to cirrussearch1102 - bking@cumin2002"
- 19:50 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1102 to cirrussearch1102 - bking@cumin2002"
- 19:47 bking@cumin2002: START - Cookbook sre.dns.netbox
- 19:46 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic1102 to cirrussearch1102
- 19:45 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch1101.eqiad.wmnet with reason: host reimage
- 19:41 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch1101.eqiad.wmnet with reason: host reimage
- 19:27 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch1100.eqiad.wmnet with reason: host reimage
- 19:23 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch1100.eqiad.wmnet with reason: host reimage
- 19:22 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch1101
- 19:22 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch1101
- 19:22 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch1101.eqiad.wmnet with OS bullseye
- 19:16 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic1101 to cirrussearch1101
- 19:15 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch1101
- 19:15 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch1101
- 19:15 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch1101 on all recursors
- 19:15 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch1101 on all recursors
- 19:15 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 19:15 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1101 to cirrussearch1101 - bking@cumin2002"
- 19:13 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1101 to cirrussearch1101 - bking@cumin2002"
- 19:09 bking@cumin2002: START - Cookbook sre.dns.netbox
- 19:09 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic1101 to cirrussearch1101
- 19:08 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch1100
- 19:08 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch1100
- 19:08 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch1100.eqiad.wmnet with OS bullseye
- 19:06 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic1100 to cirrussearch1100
- 19:06 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch1100
- 19:06 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch1100
- 19:06 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch1100 on all recursors
- 19:06 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch1100 on all recursors
- 19:06 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 19:06 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1100 to cirrussearch1100 - bking@cumin2002"
- 19:05 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1100 to cirrussearch1100 - bking@cumin2002"
- 18:54 bking@cumin2002: START - Cookbook sre.dns.netbox
- 18:54 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic1100 to cirrussearch1100
- 18:28 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch1098.eqiad.wmnet with OS bullseye
- 18:21 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch1099.eqiad.wmnet with OS bullseye
- 18:02 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch1099.eqiad.wmnet with reason: host reimage
- 17:57 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch1099.eqiad.wmnet with reason: host reimage
- 17:53 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch1098.eqiad.wmnet with reason: host reimage
- 17:50 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch1098.eqiad.wmnet with reason: host reimage
- 17:38 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch1099
- 17:38 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch1099
- 17:38 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch1099.eqiad.wmnet with OS bullseye
- 17:37 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic1099 to cirrussearch1099
- 17:37 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch1099
- 17:37 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch1099
- 17:37 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch1099 on all recursors
- 17:37 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch1099 on all recursors
- 17:36 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 17:36 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1099 to cirrussearch1099 - bking@cumin2002"
- 17:36 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1099 to cirrussearch1099 - bking@cumin2002"
- 17:33 bking@cumin2002: START - Cookbook sre.dns.netbox
- 17:32 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic1099 to cirrussearch1099
- 17:30 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch1098
- 17:30 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch1098
- 17:30 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch1098.eqiad.wmnet with OS bullseye
- 17:30 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic1098 to cirrussearch1098
- 17:29 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch1098
- 17:28 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch1098
- 17:28 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch1098 on all recursors
- 17:28 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch1098 on all recursors
- 17:28 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 17:28 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1098 to cirrussearch1098 - bking@cumin2002"
- 17:27 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1098 to cirrussearch1098 - bking@cumin2002"
- 17:24 bking@cumin2002: START - Cookbook sre.dns.netbox
- 17:24 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic1098 to cirrussearch1098
- 16:17 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch1097.eqiad.wmnet with OS bullseye
- 16:06 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch1096.eqiad.wmnet with OS bullseye
- 15:56 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch1097.eqiad.wmnet with reason: host reimage
- 15:52 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch1097.eqiad.wmnet with reason: host reimage
- 15:42 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch1096.eqiad.wmnet with reason: host reimage
- 15:38 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch1097
- 15:38 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch1097
- 15:38 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch1097.eqiad.wmnet with OS bullseye
- 15:38 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch1096.eqiad.wmnet with reason: host reimage
- 15:36 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic1097 to cirrussearch1097
- 15:36 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch1097
- 15:36 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch1097
- 15:35 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch1097 on all recursors
- 15:35 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch1097 on all recursors
- 15:35 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 15:35 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1097 to cirrussearch1097 - bking@cumin2002"
- 15:34 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1097 to cirrussearch1097 - bking@cumin2002"
- 15:31 bking@cumin2002: START - Cookbook sre.dns.netbox
- 15:31 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic1097 to cirrussearch1097
- 15:23 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch1096
- 15:23 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch1096
- 15:23 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch1096.eqiad.wmnet with OS bullseye
- 15:23 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic1096 to cirrussearch1096
- 15:22 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch1096
- 15:22 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch1096
- 15:22 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch1096 on all recursors
- 15:22 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch1096 on all recursors
- 15:22 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 15:22 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1096 to cirrussearch1096 - bking@cumin2002"
- 15:21 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1096 to cirrussearch1096 - bking@cumin2002"
- 15:17 bking@cumin2002: START - Cookbook sre.dns.netbox
- 15:17 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic1096 to cirrussearch1096
- 15:15 bking@cumin2002: conftool action : set/pooled=yes:weight=10; selector: name=cirrussearch.*.eqiad.wmnet
- 14:32 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 14:32 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
- 14:30 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-main: apply
- 14:29 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-main: apply
- 14:28 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-wmde: apply
- 14:27 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-wmde: apply
- 14:26 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-search: apply
- 14:25 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-search: apply
- 14:24 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-research: apply
- 14:24 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-research: apply
- 14:23 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-platform-eng: apply
- 14:22 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-platform-eng: apply
- 14:20 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-ml: apply
- 14:19 fnegri@cumin1002: END (FAIL) - Cookbook sre.wikireplicas.update-views (exit_code=99)
- 14:19 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-ml: apply
- 14:18 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-product: apply
- 14:17 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-product: apply
- 14:16 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
- 14:16 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
- 14:15 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 14:14 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 14:14 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/spark-history: apply
- 14:13 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/spark-history: apply
- 14:12 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/spark-history: apply
- 14:11 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/spark-history: apply
- 14:10 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/blunderbuss: apply
- 14:10 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply
- 14:09 bking@cumin2002: conftool action : set/pooled=no; selector: name=elastic1096.eqiad.wmnet|elastic1097.eqiad.wmnet|elastic1098.eqiad.wmnet|elastic1099.eqiad.wmnet|elastic1100.eqiad.wmnet|elastic1101.eqiad.wmnet|elastic1102.eqiad.wmnet|elastic1107.eqiad.wmnet|elastic1110.eqiad.wmnet
- 14:07 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/superset: apply
- 14:07 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/superset: apply
- 14:05 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/superset-next: apply
- 14:05 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/superset-next: apply
- 13:56 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 13:56 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 13:47 fnegri@cumin1002: START - Cookbook sre.wikireplicas.update-views
- 13:45 tgr: deployed private mitigation for T395073
- 13:07 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.sanitarium_restart (exit_code=0)
- 13:02 fceratto@cumin1002: START - Cookbook sre.mysql.sanitarium_restart
- 12:54 fnegri@cumin1002: START - Cookbook sre.wikireplicas.update-views
- 12:45 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host thanos-be1009.eqiad.wmnet with OS bullseye
- 12:45 mvernon@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - mvernon@cumin1002"
- 12:44 mvernon@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - mvernon@cumin1002"
- 12:40 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
- 12:40 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
- 12:37 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'edit-check' for release 'main' .
- 12:37 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'edit-check' for release 'main' .
- 12:33 gkyziridis@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'edit-check' for release 'main' .
- 12:25 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-be1009.eqiad.wmnet with reason: host reimage
- 12:21 mvernon@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-be1009.eqiad.wmnet with reason: host reimage
- 11:55 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host thanos-be1008.eqiad.wmnet with OS bullseye
- 11:55 mvernon@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - mvernon@cumin1002"
- 11:54 mvernon@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - mvernon@cumin1002"
- 11:54 mvernon@cumin1002: START - Cookbook sre.hosts.reimage for host thanos-be1009.eqiad.wmnet with OS bullseye
- 11:51 ladsgroup@cumin1002: END (PASS) - Cookbook sre.wikireplicas.update-views (exit_code=0)
- 11:51 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host thanos-be1007.eqiad.wmnet with OS bullseye
- 11:51 mvernon@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - mvernon@cumin1002"
- 11:49 mvernon@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - mvernon@cumin1002"
- 11:47 ladsgroup@cumin1002: START - Cookbook sre.wikireplicas.update-views
- 11:26 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-be1008.eqiad.wmnet with reason: host reimage
- 11:23 mvernon@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-be1008.eqiad.wmnet with reason: host reimage
- 11:10 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'edit-check' for release 'main' .
- 11:07 jayme@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host kubestage2004.codfw.wmnet
- 11:07 jayme@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host kubestage2004.codfw.wmnet
- 11:06 jayme@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host kubestage2004.codfw.wmnet
- 11:06 jayme@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host kubestage2004.codfw.wmnet
- 11:03 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-be1007.eqiad.wmnet with reason: host reimage
- 11:03 jayme@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host kubestage2004.codfw.wmnet
- 11:03 jayme@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host kubestage2004.codfw.wmnet
- 11:01 mvernon@cumin1002: START - Cookbook sre.hosts.reimage for host thanos-be1008.eqiad.wmnet with OS bullseye
- 11:00 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 11:00 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 10:59 mvernon@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-be1007.eqiad.wmnet with reason: host reimage
- 10:57 jayme@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host kubestage2004.codfw.wmnet
- 10:56 jayme@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host kubestage2004.codfw.wmnet
- 10:55 jayme@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host kubestage2004.codfw.wmnet
- 10:55 jayme@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host kubestage2004.codfw.wmnet
- 10:45 marostegui@cumin1002: dbctl commit (dc=all): 'es2035 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P76418 and previous config saved to /var/cache/conftool/dbconfig/20250523-104543-root.json
- 10:45 claime: Manual run of purge-securepollvotedata - T388542
- 10:42 cgoubert@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
- 10:42 cgoubert@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
- 10:36 mvernon@cumin1002: START - Cookbook sre.hosts.reimage for host thanos-be1007.eqiad.wmnet with OS bullseye
- 10:35 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host thanos-be1006.eqiad.wmnet with OS bullseye
- 10:35 mvernon@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - mvernon@cumin1002"
- 10:31 moritzm: importing ferm 2.5.1-4+wmf13u1 T391083
- 10:30 marostegui@cumin1002: dbctl commit (dc=all): 'es2035 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P76417 and previous config saved to /var/cache/conftool/dbconfig/20250523-103038-root.json
- 10:18 mvernon@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - mvernon@cumin1002"
- 10:16 cgoubert@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
- 10:16 cgoubert@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
- 10:15 marostegui@cumin1002: dbctl commit (dc=all): 'es2035 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P76415 and previous config saved to /var/cache/conftool/dbconfig/20250523-101532-root.json
- 10:15 isaranto@deploy1003: helmfile [eqiad] DONE helmfile.d/services/api-gateway: apply
- 10:14 isaranto@deploy1003: helmfile [eqiad] START helmfile.d/services/api-gateway: apply
- 10:10 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 10:09 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 10:08 jgiannelos@deploy1003: helmfile [staging] DONE helmfile.d/services/mobileapps: sync
- 10:08 jgiannelos@deploy1003: helmfile [staging] START helmfile.d/services/mobileapps: sync
- 10:08 cgoubert@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
- 10:07 cgoubert@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
- 10:07 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
- 10:07 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
- 10:06 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
- 10:05 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
- 10:05 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
- 10:04 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
- 10:04 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
- 10:03 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
- 10:02 isaranto@deploy1003: helmfile [codfw] DONE helmfile.d/services/api-gateway: apply
- 10:02 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
- 10:02 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
- 10:01 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
- 10:01 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
- 10:01 isaranto@deploy1003: helmfile [codfw] START helmfile.d/services/api-gateway: apply
- 10:00 marostegui@cumin1002: dbctl commit (dc=all): 'es2035 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P76414 and previous config saved to /var/cache/conftool/dbconfig/20250523-100026-root.json
- 09:58 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
- 09:58 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
- 09:52 isaranto@deploy1003: helmfile [staging] DONE helmfile.d/services/api-gateway: sync
- 09:52 isaranto@deploy1003: helmfile [staging] START helmfile.d/services/api-gateway: sync
- 09:52 cgoubert@deploy1003: Finished scap sync-world: 1149613: mediawiki: Add netpol for prometheus HTTP - T388538 (duration: 03m 11s)
- 09:50 cgoubert@deploy1003: Started scap sync-world: 1149613: mediawiki: Add netpol for prometheus HTTP - T388538
- 09:45 marostegui@cumin1002: dbctl commit (dc=all): 'es2035 (re)pooling @ 20%: Repooling', diff saved to https://phabricator.wikimedia.org/P76413 and previous config saved to /var/cache/conftool/dbconfig/20250523-094520-root.json
- 09:40 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-misc2002.codfw.wmnet
- 09:34 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 09:34 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 09:31 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host mc-misc2002.codfw.wmnet
- 09:30 marostegui@cumin1002: dbctl commit (dc=all): 'es2035 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P76412 and previous config saved to /var/cache/conftool/dbconfig/20250523-093015-root.json
- 09:28 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
- 09:28 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
- 09:28 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-test2002.codfw.wmnet
- 09:27 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
- 09:27 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
- 09:23 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch1109.eqiad.wmnet with OS bullseye
- 09:19 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on es2035.codfw.wmnet with reason: Maintenance
- 09:19 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-test2002.codfw.wmnet
- 09:18 marostegui@cumin1002: dbctl commit (dc=all): 'Depool es2035 T394469', diff saved to https://phabricator.wikimedia.org/P76411 and previous config saved to /var/cache/conftool/dbconfig/20250523-091853-marostegui.json
- 09:14 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-be1006.eqiad.wmnet with reason: host reimage
- 09:10 mvernon@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-be1006.eqiad.wmnet with reason: host reimage
- 09:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1002.eqiad.wmnet
- 09:05 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1002.eqiad.wmnet
- 08:57 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch1109.eqiad.wmnet with reason: host reimage
- 08:53 ryankemper@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch1109.eqiad.wmnet with reason: host reimage
- 08:51 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch1108.eqiad.wmnet with OS bullseye
- 08:42 mvernon@cumin1002: START - Cookbook sre.hosts.reimage for host thanos-be1006.eqiad.wmnet with OS bullseye
- 08:40 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub-next: sync on staging
- 08:39 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch1095.eqiad.wmnet with OS bullseye
- 08:33 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch1109
- 08:33 ryankemper@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch1109
- 08:32 ryankemper@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch1109.eqiad.wmnet with OS bullseye
- 08:31 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub-next: apply on staging
- 08:29 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub-next: sync on staging
- 08:29 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub-next: apply on staging
- 08:27 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub-next: sync on staging
- 08:27 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub-next: apply on staging
- 08:25 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub-next: sync on staging
- 08:24 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic1109 to cirrussearch1109
- 08:24 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.sanitarium_restart (exit_code=0)
- 08:24 ryankemper@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch1109
- 08:24 ryankemper@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch1109
- 08:24 ryankemper@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch1109 on all recursors
- 08:24 ryankemper@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch1109 on all recursors
- 08:24 ryankemper@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 08:24 ryankemper@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1109 to cirrussearch1109 - ryankemper@cumin2002"
- 08:21 ryankemper@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1109 to cirrussearch1109 - ryankemper@cumin2002"
- 08:21 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub-next: apply on staging
- 08:20 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub-next: sync on staging
- 08:19 fceratto@cumin1002: START - Cookbook sre.mysql.sanitarium_restart
- 08:19 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch1108.eqiad.wmnet with reason: host reimage
- 08:17 ryankemper@cumin2002: START - Cookbook sre.dns.netbox
- 08:17 ryankemper@cumin2002: START - Cookbook sre.hosts.rename from elastic1109 to cirrussearch1109
- 08:15 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub-next: apply on staging
- 08:15 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch1095.eqiad.wmnet with reason: host reimage
- 08:14 ryankemper@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch1108.eqiad.wmnet with reason: host reimage
- 08:11 ryankemper@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch1095.eqiad.wmnet with reason: host reimage
- 08:04 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub: sync on production
- 07:59 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub: apply on production
- 07:58 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub: sync on production
- 07:58 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch1095
- 07:58 ryankemper@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch1095
- 07:57 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch1108
- 07:57 ryankemper@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch1108
- 07:57 ryankemper@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch1095.eqiad.wmnet with OS bullseye
- 07:57 ryankemper@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch1108.eqiad.wmnet with OS bullseye
- 07:56 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub: apply on production
- 07:55 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub: sync on production
- 07:53 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub: apply on production
- 07:51 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub: sync on production
- 07:51 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic1095 to cirrussearch1095
- 07:50 ryankemper@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch1095
- 07:50 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub: apply on production
- 07:50 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub: sync on production
- 07:46 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub: apply on production
- 07:45 ryankemper@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch1095
- 07:45 ryankemper@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch1095 on all recursors
- 07:45 ryankemper@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch1095 on all recursors
- 07:45 ryankemper@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 07:45 ryankemper@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1095 to cirrussearch1095 - ryankemper@cumin2002"
- 07:45 ryankemper@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1095 to cirrussearch1095 - ryankemper@cumin2002"
- 07:45 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub: sync on production
- 07:42 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic1108 to cirrussearch1108
- 07:42 ryankemper@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch1108
- 07:42 ryankemper@cumin2002: START - Cookbook sre.dns.netbox
- 07:41 ryankemper@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch1108
- 07:41 ryankemper@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch1108 on all recursors
- 07:41 ryankemper@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch1108 on all recursors
- 07:41 ryankemper@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 07:41 ryankemper@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1108 to cirrussearch1108 - ryankemper@cumin2002"
- 07:41 ryankemper@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1108 to cirrussearch1108 - ryankemper@cumin2002"
- 07:38 ryankemper@cumin2002: START - Cookbook sre.hosts.rename from elastic1095 to cirrussearch1095
- 07:37 ryankemper@cumin2002: START - Cookbook sre.dns.netbox
- 07:36 ryankemper@cumin2002: START - Cookbook sre.hosts.rename from elastic1108 to cirrussearch1108
- 07:32 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub: apply on production
- 07:30 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub: sync on production
- 07:28 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch1094.eqiad.wmnet with OS bullseye
- 07:26 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub: apply on production
- 07:26 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub: sync on production
- 07:26 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub: apply on production
- 07:26 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.sanitarium_restart (exit_code=0)
- 07:25 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub: sync on production
- 07:23 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch1093.eqiad.wmnet with OS bullseye
- 07:22 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub: apply on production
- 07:21 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub: sync on production
- 07:21 fceratto@cumin1002: START - Cookbook sre.mysql.sanitarium_restart
- 07:15 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub: apply on production
- 07:14 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub: sync on production
- 07:12 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub: apply on production
- 07:11 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub: sync on production
- 07:11 akosiaris@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
- 07:10 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub: apply on production
- 07:07 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub: sync on production
- 07:02 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch1094.eqiad.wmnet with reason: host reimage
- 07:01 akosiaris@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
- 07:00 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub: apply on production
- 06:56 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub: sync on production
- 06:55 ryankemper@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch1094.eqiad.wmnet with reason: host reimage
- 06:54 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch1093.eqiad.wmnet with reason: host reimage
- 06:53 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub: apply on production
- 06:51 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub: apply on production
- 06:50 ryankemper@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch1093.eqiad.wmnet with reason: host reimage
- 06:47 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub: sync on production
- 06:38 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub: apply on production
- 06:36 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub-next: sync on staging
- 06:36 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch1094
- 06:36 ryankemper@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch1094
- 06:36 ryankemper@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch1094.eqiad.wmnet with OS bullseye
- 06:36 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch1093
- 06:36 ryankemper@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch1093
- 06:35 ryankemper@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch1093.eqiad.wmnet with OS bullseye
- 06:35 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub: sync on production
- 06:34 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic1093 to cirrussearch1093
- 06:34 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub: apply on production
- 06:34 ryankemper@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch1093
- 06:33 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub: sync on production
- 06:33 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub: apply on production
- 06:30 ryankemper@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch1093
- 06:30 ryankemper@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch1093 on all recursors
- 06:30 ryankemper@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch1093 on all recursors
- 06:30 ryankemper@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 06:30 ryankemper@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1093 to cirrussearch1093 - ryankemper@cumin2002"
- 06:29 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub: sync on production
- 06:28 ryankemper@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1093 to cirrussearch1093 - ryankemper@cumin2002"
- 06:27 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic1094 to cirrussearch1094
- 06:26 ryankemper@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch1094
- 06:26 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub: apply on production
- 06:24 ryankemper@cumin2002: START - Cookbook sre.dns.netbox
- 06:24 ryankemper@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch1094
- 06:24 ryankemper@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch1094 on all recursors
- 06:24 ryankemper@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch1094 on all recursors
- 06:24 ryankemper@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 06:24 ryankemper@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1094 to cirrussearch1094 - ryankemper@cumin2002"
- 06:24 ryankemper@cumin2002: START - Cookbook sre.hosts.rename from elastic1093 to cirrussearch1093
- 06:24 ryankemper@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1094 to cirrussearch1094 - ryankemper@cumin2002"
- 06:23 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch1091.eqiad.wmnet with OS bullseye
- 06:19 ryankemper@cumin2002: START - Cookbook sre.dns.netbox
- 06:19 marostegui@dns1006: END - running authdns-update
- 06:19 ryankemper@cumin2002: START - Cookbook sre.hosts.rename from elastic1094 to cirrussearch1094
- 06:18 marostegui@dns1006: START - running authdns-update
- 06:17 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub-next: apply on staging
- 06:17 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch1092.eqiad.wmnet with OS bullseye
- 05:59 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch1092.eqiad.wmnet with reason: host reimage
- 05:56 ryankemper@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch1092.eqiad.wmnet with reason: host reimage
- 05:55 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch1091.eqiad.wmnet with reason: host reimage
- 05:52 ryankemper@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch1091.eqiad.wmnet with reason: host reimage
- 05:49 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1183.eqiad.wmnet
- 05:49 marostegui@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 05:49 marostegui@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1183.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1002"
- 05:49 marostegui@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1183.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1002"
- 05:46 marostegui@cumin1002: START - Cookbook sre.dns.netbox
- 05:39 marostegui@cumin1002: START - Cookbook sre.hosts.decommission for hosts db1183.eqiad.wmnet
- 05:39 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch1091
- 05:39 ryankemper@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch1091
- 05:39 ryankemper@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch1091.eqiad.wmnet with OS bullseye
- 05:39 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch1092
- 05:39 ryankemper@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch1092
- 05:38 ryankemper@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch1092.eqiad.wmnet with OS bullseye
- 05:38 ryankemper@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch1091.eqiad.wmnet on all recursors
- 05:38 ryankemper@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch1091.eqiad.wmnet on all recursors
- 05:38 ryankemper@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch1092.eqiad.wmnet on all recursors
- 05:38 ryankemper@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch1092.eqiad.wmnet on all recursors
- 05:33 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic1092 to cirrussearch1092
- 05:32 ryankemper@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch1092
- 05:32 ryankemper@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch1092
- 05:32 ryankemper@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch1092 on all recursors
- 05:32 ryankemper@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch1092 on all recursors
- 05:32 ryankemper@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 05:32 ryankemper@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1092 to cirrussearch1092 - ryankemper@cumin2002"
- 05:28 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic1091 to cirrussearch1091
- 05:28 ryankemper@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1092 to cirrussearch1092 - ryankemper@cumin2002"
- 05:28 ryankemper@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch1091
- 05:25 ryankemper@cumin2002: START - Cookbook sre.dns.netbox
- 05:25 ryankemper@cumin2002: START - Cookbook sre.hosts.rename from elastic1092 to cirrussearch1092
- 05:25 ryankemper@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch1091
- 05:25 ryankemper@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch1091 on all recursors
- 05:24 ryankemper@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch1091 on all recursors
- 05:24 ryankemper@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 05:24 ryankemper@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1091 to cirrussearch1091 - ryankemper@cumin2002"
- 05:23 ryankemper@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1091 to cirrussearch1091 - ryankemper@cumin2002"
- 05:16 ryankemper@cumin2002: START - Cookbook sre.dns.netbox
- 05:16 ryankemper@cumin2002: START - Cookbook sre.hosts.rename from elastic1091 to cirrussearch1091
- 05:15 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch1090.eqiad.wmnet with OS bullseye
- 05:13 marostegui@cumin1002: dbctl commit (dc=all): 'Remove db1183 from dbctl T394507', diff saved to https://phabricator.wikimedia.org/P76410 and previous config saved to /var/cache/conftool/dbconfig/20250523-051339-marostegui.json
- 04:46 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch1089.eqiad.wmnet with OS bullseye
- 04:39 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch1090.eqiad.wmnet with reason: host reimage
- 04:35 ryankemper@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch1090.eqiad.wmnet with reason: host reimage
- 04:21 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch1090
- 04:21 ryankemper@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch1090
- 04:21 ryankemper@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch1090.eqiad.wmnet with OS bullseye
- 04:21 ryankemper@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch1090.eqiad.wmnet on all recursors
- 04:21 ryankemper@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch1090.eqiad.wmnet on all recursors
- 04:20 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch1089.eqiad.wmnet with reason: host reimage
- 04:16 ryankemper@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch1089.eqiad.wmnet with reason: host reimage
- 04:11 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic1090 to cirrussearch1090
- 04:10 ryankemper@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch1090
- 04:07 ryankemper@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch1090
- 04:07 ryankemper@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch1090 on all recursors
- 04:07 ryankemper@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch1090 on all recursors
- 04:07 ryankemper@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 04:07 ryankemper@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1090 to cirrussearch1090 - ryankemper@cumin2002"
- 04:04 ryankemper@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1090 to cirrussearch1090 - ryankemper@cumin2002"
- 03:57 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch1089
- 03:57 ryankemper@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch1089
- 03:57 ryankemper@cumin2002: START - Cookbook sre.dns.netbox
- 03:57 ryankemper@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch1089.eqiad.wmnet with OS bullseye
- 03:57 ryankemper@cumin2002: START - Cookbook sre.hosts.rename from elastic1090 to cirrussearch1090
2025-05-22
- 23:43 tgr: deployed mitigations for T395073
- 23:41 tgr@deploy1003: Finished scap sync-world: Backport for Enable EmailAuth for users with good ip reputation (duration: 26m 39s)
- 23:34 tgr@deploy1003: kharlan, tgr: Continuing with sync
- 23:16 tgr@deploy1003: kharlan, tgr: Backport for Enable EmailAuth for users with good ip reputation synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 23:14 tgr@deploy1003: Started scap sync-world: Backport for Enable EmailAuth for users with good ip reputation
- 23:02 tgr@deploy1003: Unlocked for deployment [MediaWiki]: T395073 (duration: 88m 51s)
- 22:30 ladsgroup@dns1004: END - running authdns-update
- 22:29 ladsgroup@dns1004: START - running authdns-update
- 22:27 ladsgroup@dns1004: END - running authdns-update
- 22:26 ladsgroup@dns1004: START - running authdns-update
- 22:23 ladsgroup@dns1004: END - running authdns-update
- 22:22 ladsgroup@dns1004: START - running authdns-update
- 22:20 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic1089 to cirrussearch1089
- 22:19 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch1089
- 22:19 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch1089
- 22:19 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch1089 on all recursors
- 22:19 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch1089 on all recursors
- 22:19 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 22:19 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1089 to cirrussearch1089 - bking@cumin2002"
- 22:19 ladsgroup@dns1004: END - running authdns-update
- 22:19 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1089 to cirrussearch1089 - bking@cumin2002"
- 22:18 ladsgroup@dns1004: START - running authdns-update
- 22:15 ladsgroup@dns1004: END - running authdns-update
- 22:15 ladsgroup@dns1004: START - running authdns-update
- 22:12 ladsgroup@dns1004: END - running authdns-update
- 22:11 ladsgroup@dns1004: START - running authdns-update
- 22:09 ladsgroup@dns1004: END - running authdns-update
- 22:08 ladsgroup@dns1004: START - running authdns-update
- 22:06 ladsgroup@dns1004: END - running authdns-update
- 22:05 ladsgroup@dns1004: START - running authdns-update
- 21:54 bking@cumin2002: START - Cookbook sre.dns.netbox
- 21:53 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic1089 to cirrussearch1089
- 21:48 bking@cumin2002: conftool action : set/pooled=yes:weight=10; selector: name=cirrussearch.*.eqiad.wmnet
- 21:33 tgr@deploy1003: Locking from deployment [MediaWiki]: T395073
- 21:26 bking@cumin2002: conftool action : set/pooled=yes:weight=10; selector: name=cirrussearch.*.eqiad.wmnet
- 21:18 bking@cumin2002: conftool action : set/pooled=yes:weight=10; selector: name=cirrussearch1080.eqiad.wmnet|cirrussearch1081.eqiad.wmnet|cirrussearch1082.eqiad.wmnet|cirrussearch1083.eqiad.wmnet|cirrussearch1087.eqiad.wmnet|cirrussearch1088.eqiad.wmnet|cirrussearch1118.eqiad.wmnet|cirrussearch1119.eqiad.wmnet
- 21:14 bking@cumin2002: conftool action : set/pooled=no; selector: name=elastic1089.eqiad.wmnet|elastic1090.eqiad.wmnet|elastic1091.eqiad.wmnet|elastic1092.eqiad.wmnet|elastic1093.eqiad.wmnet|elastic1094.eqiad.wmnet|elastic1095.eqiad.wmnet|elastic1108.eqiad.wmnet|elastic1109.eqiad.wmnet
- 21:13 jhathaway@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on ms-be2088.codfw.wmnet with reason: T381919
- 21:10 bking@cumin2002: conftool action : set/pooled=no; selector: name=elastic1055.eqiad.wmnet|elastic1056.eqiad.wmnet|elastic1074.eqiad.wmnet|elastic1075.eqiad.wmnet|elastic1076.eqiad.wmnet|elastic1077.eqiad.wmnet|elastic1078.eqiad.wmnet|elastic1079.eqiad.wmnet|elastic1085.eqiad.wmnet|elastic1086.eqiad.wmnet
- 20:56 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:sessionstore: Upgrading to Java 11.0.27 - eevans@cumin1002
- 20:54 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch1060.eqiad.wmnet with OS bullseye
- 20:44 bd808@deploy1003: Finished scap sync-world: Backport for Revert "In ParserAfterTidy use the new ParserOptions::isMessage" (T395034) (duration: 10m 41s)
- 20:37 bd808@deploy1003: bd808, matmarex: Continuing with sync
- 20:35 bd808@deploy1003: bd808, matmarex: Backport for Revert "In ParserAfterTidy use the new ParserOptions::isMessage" (T395034) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 20:33 bd808@deploy1003: Started scap sync-world: Backport for Revert "In ParserAfterTidy use the new ParserOptions::isMessage" (T395034)
- 20:29 bd808@deploy1003: Finished scap sync-world: Backport for Add AutoModerator to eswiki (T391248), Design Research survey: Undeploy (T394315), arbcom_zhwiki: Change wgWhitelistRead Setting (T394919), arbcom_zhwiki: Enable local upload (T394920) (duration: 10m 47s)
- 20:23 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch1060.eqiad.wmnet with reason: host reimage
- 20:22 bd808@deploy1003: bd808, zhaofjx, dani, suecarmol: Continuing with sync
- 20:20 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching A:sessionstore: Upgrading to Java 11.0.27 - eevans@cumin1002
- 20:20 bd808@deploy1003: bd808, zhaofjx, dani, suecarmol: Backport for Add AutoModerator to eswiki (T391248), Design Research survey: Undeploy (T394315), arbcom_zhwiki: Change wgWhitelistRead Setting (T394919), arbcom_zhwiki: Enable local upload (T394920) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 20:20 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch1060.eqiad.wmnet with reason: host reimage
- 20:18 bd808@deploy1003: Started scap sync-world: Backport for Add AutoModerator to eswiki (T391248), Design Research survey: Undeploy (T394315), arbcom_zhwiki: Change wgWhitelistRead Setting (T394919), arbcom_zhwiki: Enable local upload (T394920)
- 20:01 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch1060
- 20:01 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch1060
- 20:01 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch1060.eqiad.wmnet with OS bullseye
- 19:23 jhathaway@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on ms-be2088.codfw.wmnet with reason: T381919
- 19:23 aokoth@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 19:23 aokoth@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
- 19:21 aokoth@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
- 19:20 aokoth@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
- 18:54 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch1061.eqiad.wmnet with OS bullseye
- 18:52 ejegg: payments-wiki upgraded from 7537e0df to 1a4ef678
- 18:46 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic1060 to cirrussearch1060
- 18:46 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch1060
- 18:44 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch1060
- 18:44 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch1060 on all recursors
- 18:44 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch1060 on all recursors
- 18:44 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 18:44 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1060 to cirrussearch1060 - bking@cumin2002"
- 18:41 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1060 to cirrussearch1060 - bking@cumin2002"
- 18:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2003.codfw.wmnet with OS bookworm
- 18:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2003.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
- 18:32 bking@cumin2002: START - Cookbook sre.dns.netbox
- 18:32 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic1060 to cirrussearch1060
- 18:30 bvibber@deploy1003: Finished scap sync-world: Backport for Enable Chart extension on phase 3 wikis (T393519) (duration: 09m 34s)
- 18:29 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host sretest2003.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
- 18:23 bvibber@deploy1003: bvibber: Continuing with sync
- 18:23 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch1061.eqiad.wmnet with reason: host reimage
- 18:23 bvibber@deploy1003: bvibber: Backport for Enable Chart extension on phase 3 wikis (T393519) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 18:21 bvibber@deploy1003: Started scap sync-world: Backport for Enable Chart extension on phase 3 wikis (T393519)
- 18:20 mforns@deploy1003: Finished deploy [analytics/refinery@98f8a96] (thin): Regular analytics weekly train THIN [analytics/refinery@98f8a96a] (duration: 01m 09s)
- 18:19 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch1061.eqiad.wmnet with reason: host reimage
- 18:19 mforns@deploy1003: Started deploy [analytics/refinery@98f8a96] (thin): Regular analytics weekly train THIN [analytics/refinery@98f8a96a]
- 18:18 mforns@deploy1003: Finished deploy [analytics/refinery@98f8a96]: Regular analytics weekly train [analytics/refinery@98f8a96a] (duration: 02m 12s)
- 18:16 mforns@deploy1003: Started deploy [analytics/refinery@98f8a96]: Regular analytics weekly train [analytics/refinery@98f8a96a]
- 18:13 mforns@deploy1003: Finished deploy [analytics/refinery@98f8a96] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@98f8a96a] (duration: 03m 39s)
- 18:11 gmodena@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics: sync
- 18:10 gmodena@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-analytics: sync
- 18:09 gmodena@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics: sync
- 18:09 mforns@deploy1003: Started deploy [analytics/refinery@98f8a96] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@98f8a96a]
- 18:08 gmodena@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-analytics: sync
- 17:59 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch1061
- 17:59 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch1061
- 17:59 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch1061.eqiad.wmnet with OS bullseye
- 17:57 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic1061 to cirrussearch1061
- 17:56 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch1061
- 17:53 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch1061
- 17:53 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch1061 on all recursors
- 17:53 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch1061 on all recursors
- 17:53 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 17:53 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1061 to cirrussearch1061 - bking@cumin2002"
- 17:53 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1061 to cirrussearch1061 - bking@cumin2002"
- 17:49 bking@cumin2002: START - Cookbook sre.dns.netbox
- 17:49 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic1061 to cirrussearch1061
- 17:46 ejegg: fundraising civicrm upgraded from 5b155eaa to 4dc5f911
- 17:43 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub: sync on production
- 17:40 swfrench@deploy1003: Finished scap sync-world: Deployment clear no-op image diffs (duration: 09m 20s)
- 17:34 akosiaris@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
- 17:33 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub: apply on production
- 17:32 swfrench@deploy1003: Started scap sync-world: Deployment clear no-op image diffs
- 17:30 gmodena@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-analytics: sync
- 17:27 swfrench@deploy1003: Stopping before sync operations
- 17:26 swfrench@deploy1003: Started scap sync-world: Non-deploy scap run to stop building and publishing PHP 7.4 images - T391057
- 17:25 dani@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
- 17:24 dani@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply
- 17:24 dani@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
- 17:24 dani@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply
- 17:24 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply
- 17:24 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply
- 17:23 cgoubert@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
- 17:23 cgoubert@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
- 17:17 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub: sync on production
- 17:16 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cirrussearch1103.eqiad.wmnet with OS bullseye
- 17:15 swfrench@deploy1003: Stopping before sync operations
- 17:14 swfrench@deploy1003: Started scap sync-world: Non-deploy scap run to pick up mw-script / mw-cron logging changes - T378479
- 17:11 cgoubert@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
- 17:08 cgoubert@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
- 17:07 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub: apply on production
- 17:06 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub: sync on production
- 17:05 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
- 17:05 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
- 17:03 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub: apply on production
- 17:02 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub: sync on production
- 16:57 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub: apply on production
- 16:56 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub: sync on production
- 16:53 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub: apply on production
- 16:46 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub: sync on production
- 16:36 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub: apply on production
- 16:32 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub: sync on production
- 16:26 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 16:25 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
- 16:23 volans@cumin1003: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts cirrussearch2110.codfw.wmnet
- 16:23 volans@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cirrussearch2110.codfw.wmnet
- 16:22 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub: apply on production
- 16:22 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub: sync on production
- 16:15 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub: apply on production
- 16:12 volans@cumin1003: START - Cookbook sre.hosts.reboot-single for host cirrussearch2110.codfw.wmnet
- 16:08 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch1062.eqiad.wmnet with OS bullseye
- 16:08 volans@cumin1003: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cirrussearch2110.codfw.wmnet
- 15:59 volans@cumin1003: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts cirrussearch2110.codfw.wmnet
- 15:58 volans@cumin1003: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cirrussearch2110.codfw.wmnet
- 15:56 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch1103
- 15:56 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch1103
- 15:55 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch1103.eqiad.wmnet with OS bullseye
- 15:48 cgoubert@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
- 15:48 cgoubert@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
- 15:39 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch1062.eqiad.wmnet with reason: host reimage
- 15:38 cgoubert@deploy1003: Finished scap sync-world: 1149346: mediawiki: Add fancycaptcha wordlists to mw-cron - T388531 (duration: 02m 28s)
- 15:36 cgoubert@deploy1003: Started scap sync-world: 1149346: mediawiki: Add fancycaptcha wordlists to mw-cron - T388531
- 15:35 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch1062.eqiad.wmnet with reason: host reimage
- 15:35 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cirrussearch1103.eqiad.wmnet with OS bullseye
- 15:33 akosiaris@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
- 15:30 cgoubert@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
- 15:29 cgoubert@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
- 15:28 volans@cumin1003: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts cirrussearch2110.codfw.wmnet
- 15:28 volans@cumin1003: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cirrussearch2110.codfw.wmnet
- 15:26 bking@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on cirrussearch2110.codfw.wmnet with reason: firmware update cookbook
- 15:17 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch1062
- 15:17 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch1062
- 15:17 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch1062.eqiad.wmnet with OS bullseye
- 15:17 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic1062 to cirrussearch1062
- 15:16 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch1062
- 15:16 jhathaway@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on sretest2001.codfw.wmnet with reason: T383173
- 15:09 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch1062
- 15:09 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch1062 on all recursors
- 15:09 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch1062 on all recursors
- 15:09 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 15:09 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1062 to cirrussearch1062 - bking@cumin2002"
- 15:09 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1062 to cirrussearch1062 - bking@cumin2002"
- 15:08 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
- 15:08 moritzm: installing mariadb security updates (as packaged in Debian, not the wmf-mariadb packages)
- 15:08 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
- 15:05 bking@cumin2002: START - Cookbook sre.dns.netbox
- 15:05 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic1062 to cirrussearch1062
- 15:05 jgiannelos@deploy1003: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
- 15:04 jgiannelos@deploy1003: helmfile [staging] START helmfile.d/services/mobileapps: apply
- 15:04 jgiannelos@deploy1003: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
- 15:03 jgiannelos@deploy1003: helmfile [staging] START helmfile.d/services/mobileapps: apply
- 15:03 jgiannelos@deploy1003: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
- 15:03 jgiannelos@deploy1003: helmfile [staging] START helmfile.d/services/mobileapps: apply
- 15:03 jgiannelos@deploy1003: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
- 15:03 jgiannelos@deploy1003: helmfile [staging] START helmfile.d/services/mobileapps: apply
- 14:40 sbisson@deploy1003: Finished scap sync-world: Backport for Remove unused wgContentTranslationEnableSectionTranslation (T389970) (duration: 09m 36s)
- 14:36 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch1064.eqiad.wmnet with OS bullseye
- 14:36 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.sanitarium_restart (exit_code=0)
- 14:35 dani@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
- 14:35 dani@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply
- 14:35 dani@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
- 14:34 dani@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply
- 14:34 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply
- 14:34 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply
- 14:33 sbisson@deploy1003: sbisson: Continuing with sync
- 14:33 sbisson@deploy1003: sbisson: Backport for Remove unused wgContentTranslationEnableSectionTranslation (T389970) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 14:31 sbisson@deploy1003: Started scap sync-world: Backport for Remove unused wgContentTranslationEnableSectionTranslation (T389970)
- 14:30 fceratto@cumin1002: START - Cookbook sre.mysql.sanitarium_restart
- 14:28 gmodena@deploy1003: Finished scap sync-world: Backport for EventStreamConfig: add staging page_change stream (T394899) (duration: 15m 58s)
- 14:27 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cirrussearch1063.eqiad.wmnet with OS bullseye
- 14:21 gmodena@deploy1003: gmodena: Continuing with sync
- 14:14 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch1103
- 14:14 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch1103
- 14:14 gmodena@deploy1003: gmodena: Backport for EventStreamConfig: add staging page_change stream (T394899) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 14:14 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch1103.eqiad.wmnet with OS bullseye
- 14:12 gmodena@deploy1003: Started scap sync-world: Backport for EventStreamConfig: add staging page_change stream (T394899)
- 14:10 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch1063
- 14:10 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch1063
- 14:10 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch1063.eqiad.wmnet with OS bullseye
- 14:09 jgiannelos@deploy1003: helmfile [staging] DONE helmfile.d/services/mobileapps: sync
- 14:09 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-be1008.eqiad.wmnet with reason: host reimage
- 14:09 klausman@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'edit-check' for release 'main' .
- 14:09 jgiannelos@deploy1003: helmfile [staging] START helmfile.d/services/mobileapps: sync
- 14:09 sbisson@deploy1003: Finished scap sync-world: Backport for stats(SuggestedEdits): avoid tracking negative tti durations (T394289 T394701) (duration: 11m 04s)
- 14:09 klausman@deploy1003: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
- 14:08 klausman@deploy1003: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
- 14:07 jgiannelos@deploy1003: helmfile [staging] DONE helmfile.d/services/mobileapps: sync
- 14:07 jgiannelos@deploy1003: helmfile [staging] START helmfile.d/services/mobileapps: sync
- 14:06 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch1064.eqiad.wmnet with reason: host reimage
- 14:05 jgiannelos@deploy1003: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
- 14:05 jgiannelos@deploy1003: helmfile [staging] START helmfile.d/services/mobileapps: apply
- 14:05 jgiannelos@deploy1003: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
- 14:04 jgiannelos@deploy1003: helmfile [staging] START helmfile.d/services/mobileapps: apply
- 14:03 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-be1008.eqiad.wmnet with reason: host reimage
- 14:03 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch1064.eqiad.wmnet with reason: host reimage
- 14:02 sbisson@deploy1003: sbisson, migr: Continuing with sync
- 14:00 sbisson@deploy1003: sbisson, migr: Backport for stats(SuggestedEdits): avoid tracking negative tti durations (T394289 T394701) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 13:59 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch1103.eqiad.wmnet with OS bullseye
- 13:58 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic1063 to cirrussearch1063
- 13:58 sbisson@deploy1003: Started scap sync-world: Backport for stats(SuggestedEdits): avoid tracking negative tti durations (T394289 T394701)
- 13:57 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch1063
- 13:56 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-be1009.eqiad.wmnet with reason: host reimage
- 13:55 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.sanitarium_restart (exit_code=0)
- 13:54 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch1063
- 13:54 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch1063 on all recursors
- 13:54 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch1063 on all recursors
- 13:54 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 13:54 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1063 to cirrussearch1063 - bking@cumin2002"
- 13:54 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1063 to cirrussearch1063 - bking@cumin2002"
- 13:54 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-be1006.eqiad.wmnet with reason: host reimage
- 13:52 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-be1009.eqiad.wmnet with reason: host reimage
- 13:51 bking@cumin2002: START - Cookbook sre.dns.netbox
- 13:50 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic1063 to cirrussearch1063
- 13:50 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-be1007.eqiad.wmnet with reason: host reimage
- 13:50 fceratto@cumin1002: START - Cookbook sre.mysql.sanitarium_restart
- 13:47 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host thanos-be1008.eqiad.wmnet with OS bullseye
- 13:46 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-be1006.eqiad.wmnet with reason: host reimage
- 13:46 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-be1007.eqiad.wmnet with reason: host reimage
- 13:46 klausman@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'edit-check' for release 'main' .
- 13:45 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch1064
- 13:45 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch1064
- 13:45 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch1064.eqiad.wmnet with OS bullseye
- 13:42 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dse-k8s-worker1011.eqiad.wmnet
- 13:39 klausman@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'edit-check' for release 'main' .
- 13:39 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch1103.eqiad.wmnet with reason: host reimage
- 13:39 fceratto@cumin1002: END (FAIL) - Cookbook sre.mysql.sanitarium_restart (exit_code=99)
- 13:39 fceratto@cumin1002: START - Cookbook sre.mysql.sanitarium_restart
- 13:39 fceratto@cumin1002: END (FAIL) - Cookbook sre.mysql.sanitarium_restart (exit_code=99)
- 13:38 fceratto@cumin1002: START - Cookbook sre.mysql.sanitarium_restart
- 13:38 klausman@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'edit-check' for release 'main' .
- 13:37 fceratto@cumin1002: END (FAIL) - Cookbook sre.mysql.sanitarium_restart (exit_code=99)
- 13:37 fceratto@cumin1002: START - Cookbook sre.mysql.sanitarium_restart
- 13:37 fceratto@cumin1002: END (ERROR) - Cookbook sre.mysql.sanitarium_restart (exit_code=97)
- 13:36 fceratto@cumin1002: START - Cookbook sre.mysql.sanitarium_restart
- 13:36 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host dse-k8s-worker1011.eqiad.wmnet
- 13:36 mszabo@deploy1003: Finished scap sync-world: Backport for ComputedUserImpactLookup: Use logging table for approximate created articles count (T394785) (duration: 10m 32s)
- 13:35 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch1103.eqiad.wmnet with reason: host reimage
- 13:31 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host thanos-be1008.eqiad.wmnet with OS bullseye
- 13:31 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host thanos-be1008.eqiad.wmnet with OS bullseye
- 13:31 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host thanos-be1009.eqiad.wmnet with OS bullseye
- 13:31 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dse-k8s-worker1010.eqiad.wmnet
- 13:30 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host thanos-be1007.eqiad.wmnet with OS bullseye
- 13:30 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host thanos-be1006.eqiad.wmnet with OS bullseye
- 13:28 mszabo@deploy1003: mszabo, kharlan: Continuing with sync
- 13:27 mszabo@deploy1003: mszabo, kharlan: Backport for ComputedUserImpactLookup: Use logging table for approximate created articles count (T394785) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 13:27 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic1064 to cirrussearch1064
- 13:26 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch1064
- 13:25 mszabo@deploy1003: Started scap sync-world: Backport for ComputedUserImpactLookup: Use logging table for approximate created articles count (T394785)
- 13:25 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host dse-k8s-worker1010.eqiad.wmnet
- 13:24 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch1064
- 13:24 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch1064 on all recursors
- 13:24 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch1064 on all recursors
- 13:24 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 13:24 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1064 to cirrussearch1064 - bking@cumin2002"
- 13:23 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1064 to cirrussearch1064 - bking@cumin2002"
- 13:20 bking@cumin2002: START - Cookbook sre.dns.netbox
- 13:20 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic1064 to cirrussearch1064
- 13:17 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch1103
- 13:17 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch1103
- 13:17 isaranto@deploy1003: Finished scap sync-world: Backport for ores-extension: enable ores extension for rrla without the UI (T382171) (duration: 12m 17s)
- 13:17 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch1103.eqiad.wmnet with OS bullseye
- 13:14 moritzm: installing Java 11 security updates
- 13:11 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 13:10 isaranto@deploy1003: isaranto: Continuing with sync
- 13:10 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 13:07 isaranto@deploy1003: isaranto: Backport for ores-extension: enable ores extension for rrla without the UI (T382171) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 13:05 isaranto@deploy1003: Started scap sync-world: Backport for ores-extension: enable ores extension for rrla without the UI (T382171)
- 13:00 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 12:57 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 12:56 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/postgresql-airflow-test-k8s: apply
- 12:56 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/postgresql-airflow-test-k8s: apply
- 12:53 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dse-k8s-worker1009.eqiad.wmnet
- 12:53 gkyziridis@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'edit-check' for release 'main' .
- 12:51 fceratto@cumin1002: END (FAIL) - Cookbook sre.mysql.sanitarium_restart (exit_code=99)
- 12:51 fceratto@cumin1002: START - Cookbook sre.mysql.sanitarium_restart
- 12:47 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host dse-k8s-worker1009.eqiad.wmnet
- 12:46 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/postgresql-airflow-test-k8s: apply
- 12:46 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/postgresql-airflow-test-k8s: apply
- 12:45 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/postgresql-airflow-test-k8s: apply
- 12:45 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/postgresql-airflow-test-k8s: apply
- 12:42 gkyziridis@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'edit-check' for release 'main' .
- 12:37 marostegui@cumin1002: dbctl commit (dc=all): 'es1036 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P76405 and previous config saved to /var/cache/conftool/dbconfig/20250522-123720-root.json
- 12:34 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/postgresql-airflow-test-k8s: sync
- 12:34 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/postgresql-airflow-test-k8s: sync
- 12:28 hashar: Gerrit is back and was upgraded from 3.10.4 to 3.10.6 | T390666
- 12:25 hashar: Stopping Gerrit for upgrade
- 12:24 hashar@deploy1003: Finished deploy [gerrit/gerrit@facd6ee]: Gerrit to 3.10.6 on gerrit1003 - T390666 (duration: 00m 09s)
- 12:24 hashar@deploy1003: Started deploy [gerrit/gerrit@facd6ee]: Gerrit to 3.10.6 on gerrit1003 - T390666
- 12:22 marostegui@cumin1002: dbctl commit (dc=all): 'es1036 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P76404 and previous config saved to /var/cache/conftool/dbconfig/20250522-122215-root.json
- 12:21 hashar@deploy1003: Finished deploy [gerrit/gerrit@facd6ee]: Gerrit to 3.10.6 on gerrit2002 - T390666 (duration: 00m 10s)
- 12:21 hashar@deploy1003: Started deploy [gerrit/gerrit@facd6ee]: Gerrit to 3.10.6 on gerrit2002 - T390666
- 12:14 moritzm: installing nodejs security updates
- 12:12 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dse-k8s-worker1008.eqiad.wmnet
- 12:07 marostegui@cumin1002: dbctl commit (dc=all): 'es1036 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P76400 and previous config saved to /var/cache/conftool/dbconfig/20250522-120709-root.json
- 12:04 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host dse-k8s-worker1008.eqiad.wmnet
- 11:52 marostegui@cumin1002: dbctl commit (dc=all): 'es1036 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P76398 and previous config saved to /var/cache/conftool/dbconfig/20250522-115203-root.json
- 11:51 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
- 11:47 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/thumbor: apply
- 11:37 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
- 11:37 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
- 11:36 marostegui@cumin1002: dbctl commit (dc=all): 'es1036 (re)pooling @ 20%: Repooling', diff saved to https://phabricator.wikimedia.org/P76394 and previous config saved to /var/cache/conftool/dbconfig/20250522-113657-root.json
- 11:33 ladsgroup@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
- 11:32 ladsgroup@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
- 11:29 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
- 11:26 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/thumbor: apply
- 11:22 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1183 T394507', diff saved to https://phabricator.wikimedia.org/P76393 and previous config saved to /var/cache/conftool/dbconfig/20250522-112245-marostegui.json
- 11:21 marostegui@cumin1002: dbctl commit (dc=all): 'es1036 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P76392 and previous config saved to /var/cache/conftool/dbconfig/20250522-112152-root.json
- 11:19 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dse-k8s-worker1007.eqiad.wmnet
- 11:16 jmm@deploy1003: helmfile [staging] DONE helmfile.d/services/thumbor: apply
- 11:16 jmm@deploy1003: helmfile [staging] START helmfile.d/services/thumbor: apply
- 11:15 volans: uploaded debmonitor-client_0.4.1 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia,bookworm-wikimedia,trixie-wikimedia
- 11:15 marostegui: Migrate es1036 es6 eqiad dbmaint to MariaDB 10.11 T394469
- 11:14 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on es1036.eqiad.wmnet with reason: Maintenance
- 11:14 marostegui@cumin1002: dbctl commit (dc=all): 'Depool es1036 T394469', diff saved to https://phabricator.wikimedia.org/P76391 and previous config saved to /var/cache/conftool/dbconfig/20250522-111422-marostegui.json
- 11:11 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host dse-k8s-worker1007.eqiad.wmnet
- 11:06 jelto@cumin1002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab2002.wikimedia.org with reason: Upgrade GitLab to 17.10
- 11:06 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
- 11:05 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
- 11:01 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dse-k8s-worker1006.eqiad.wmnet
- 10:53 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host dse-k8s-worker1006.eqiad.wmnet
- 10:51 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dse-k8s-worker1005.eqiad.wmnet
- 10:44 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host dse-k8s-worker1005.eqiad.wmnet
- 10:36 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dse-k8s-worker1004.eqiad.wmnet
- 10:31 moritzm: installing imagemagick security updates
- 10:29 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host dse-k8s-worker1004.eqiad.wmnet
- 10:14 btullis@cumin1002: END (ERROR) - Cookbook sre.k8s.reboot-nodes (exit_code=97) rolling reboot on A:dse-k8s-worker
- 10:03 moritzm: installing spamassassin bugfix updates from Bookworm point release
- 09:58 cgoubert@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
- 09:58 cgoubert@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
- 09:54 klausman@deploy1003: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
- 09:53 klausman@deploy1003: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
- 09:52 klausman@deploy1003: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
- 09:50 marostegui: dbmaint codfw eqiad Pool pc8 new section T394260
- 09:50 klausman@deploy1003: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
- 09:50 marostegui@cumin1002: dbctl commit (dc=all): 'Add pc8 T394260', diff saved to https://phabricator.wikimedia.org/P76390 and previous config saved to /var/cache/conftool/dbconfig/20250522-095017-marostegui.json
- 09:32 btullis@cumin1002: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:dse-k8s-worker
- 09:29 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dse-k8s-worker1002.eqiad.wmnet
- 09:22 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host dse-k8s-worker1002.eqiad.wmnet
- 09:21 btullis@cumin1002: END (ERROR) - Cookbook sre.k8s.reboot-nodes (exit_code=97) rolling reboot on A:dse-k8s-worker
- 09:00 btullis@cumin1002: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:dse-k8s-worker
- 08:47 elukey@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host ml-serve1002.eqiad.wmnet
- 08:47 elukey@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host ml-serve1002.eqiad.wmnet
- 08:38 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-serve1002.eqiad.wmnet with OS bookworm
- 08:20 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve1002.eqiad.wmnet with reason: host reimage
- 08:19 aklapper@deploy1003: rebuilt and synchronized wikiversions files: group2 to 1.45.0-wmf.2 refs T392172
- 08:17 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve1002.eqiad.wmnet with reason: host reimage
- 08:00 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host ml-serve1002
- 08:00 elukey@cumin1002: START - Cookbook sre.hosts.move-vlan for host ml-serve1002
- 08:00 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ml-serve1002.eqiad.wmnet with OS bookworm
- 07:49 elukey@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host ml-serve1002.eqiad.wmnet
- 07:43 elukey@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host ml-serve1002.eqiad.wmnet
- 07:35 elukey: cleanup - `elukey@config-master2001:/var/run/confd-template$ sudo rm _srv_config-master_pybal_codfw_wdqs-internal.err _srv_config-master_pybal_eqiad_wdqs-internal.err`
- 07:34 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-misc2002.codfw.wmnet
- 07:32 stran@deploy1003: Finished scap sync-world: Backport for Temp accounts: Set group requirements for IP reveal group (T393615) (duration: 10m 01s)
- 07:31 elukey: cleanup (wdqs-internal lvs teardown) - `elukey@config-master1001:/var/run/confd-template$ sudo rm _srv_config-master_pybal_codfw_wdqs-internal.err _srv_config-master_pybal_eqiad_wdqs-internal.err`
- 07:25 stran@deploy1003: tchanders, stran: Continuing with sync
- 07:25 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host mc-misc2002.codfw.wmnet
- 07:25 stran@deploy1003: tchanders, stran: Backport for Temp accounts: Set group requirements for IP reveal group (T393615) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 07:23 marostegui@cumin1002: dbctl commit (dc=all): 'db1188 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P76387 and previous config saved to /var/cache/conftool/dbconfig/20250522-072313-root.json
- 07:22 stran@deploy1003: Started scap sync-world: Backport for Temp accounts: Set group requirements for IP reveal group (T393615)
- 07:21 jelto@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: Upgrade GitLab to 17.10
- 07:18 marostegui@cumin1002: dbctl commit (dc=all): 'db1169 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P76386 and previous config saved to /var/cache/conftool/dbconfig/20250522-071805-root.json
- 07:18 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-misc2002.codfw.wmnet
- 07:15 jelto@cumin1002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1003.wikimedia.org with reason: Upgrade Replica to GitLab 17.10
- 07:09 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host mc-misc2002.codfw.wmnet
- 07:08 marostegui@cumin1002: dbctl commit (dc=all): 'db1188 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P76385 and previous config saved to /var/cache/conftool/dbconfig/20250522-070807-root.json
- 07:07 jelto@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: Upgrade Replica to GitLab 17.10
- 07:05 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-misc2002.codfw.wmnet
- 07:03 marostegui@cumin1002: dbctl commit (dc=all): 'db1169 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P76384 and previous config saved to /var/cache/conftool/dbconfig/20250522-070259-root.json
- 06:56 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host mc-misc2002.codfw.wmnet
- 06:53 marostegui@cumin1002: dbctl commit (dc=all): 'db1188 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P76383 and previous config saved to /var/cache/conftool/dbconfig/20250522-065302-root.json
- 06:51 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-misc2002.codfw.wmnet
- 06:47 marostegui@cumin1002: dbctl commit (dc=all): 'db1169 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P76382 and previous config saved to /var/cache/conftool/dbconfig/20250522-064754-root.json
- 06:43 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host mc-misc2002.codfw.wmnet
- 06:40 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-misc2002.codfw.wmnet
- 06:37 marostegui@cumin1002: dbctl commit (dc=all): 'db1188 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P76381 and previous config saved to /var/cache/conftool/dbconfig/20250522-063756-root.json
- 06:32 marostegui@cumin1002: dbctl commit (dc=all): 'db1169 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P76380 and previous config saved to /var/cache/conftool/dbconfig/20250522-063248-root.json
- 06:31 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host mc-misc2002.codfw.wmnet
- 06:31 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-misc2002.codfw.wmnet
- 06:26 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host mc-misc2002.codfw.wmnet
- 06:22 marostegui@cumin1002: dbctl commit (dc=all): 'db1188 (re)pooling @ 20%: Repooling', diff saved to https://phabricator.wikimedia.org/P76379 and previous config saved to /var/cache/conftool/dbconfig/20250522-062251-root.json
- 06:17 marostegui@cumin1002: dbctl commit (dc=all): 'db1169 (re)pooling @ 20%: Repooling', diff saved to https://phabricator.wikimedia.org/P76378 and previous config saved to /var/cache/conftool/dbconfig/20250522-061742-root.json
- 06:07 marostegui@cumin1002: dbctl commit (dc=all): 'db1188 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P76377 and previous config saved to /var/cache/conftool/dbconfig/20250522-060745-root.json
- 06:05 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1188', diff saved to https://phabricator.wikimedia.org/P76376 and previous config saved to /var/cache/conftool/dbconfig/20250522-060556-marostegui.json
- 06:05 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1188.eqiad.wmnet with reason: Maintenance
- 06:02 marostegui@cumin1002: dbctl commit (dc=all): 'db1169 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P76375 and previous config saved to /var/cache/conftool/dbconfig/20250522-060236-root.json
- 05:47 marostegui@cumin1002: dbctl commit (dc=all): 'db1169 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P76374 and previous config saved to /var/cache/conftool/dbconfig/20250522-054730-root.json
- 05:39 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1169.eqiad.wmnet with reason: Maintenance
- 05:39 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1169', diff saved to https://phabricator.wikimedia.org/P76373 and previous config saved to /var/cache/conftool/dbconfig/20250522-053938-marostegui.json
- 05:26 marostegui@cumin1002: dbctl commit (dc=all): 'Add pc1018 and pc2018 to dbctl depooled T394260', diff saved to https://phabricator.wikimedia.org/P76372 and previous config saved to /var/cache/conftool/dbconfig/20250522-052649-marostegui.json
- 04:47 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on pc2018.codfw.wmnet with reason: Maintenance
- 04:47 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on pc1018.eqiad.wmnet with reason: Maintenance
2025-05-21
- 23:23 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cirrussearch1103.eqiad.wmnet with OS bullseye
- 23:00 dzahn@cumin1002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1004.wikimedia.org with reason: security release
- 22:56 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2003.codfw.wmnet with OS bookworm
- 22:54 sbassett@deploy1003: Finished scap sync-world: Backport for Revert "OATHAuth: Mark checkuser and suppress as requiring 2FA" (duration: 10m 05s)
- 22:52 dzahn@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: security release
- 22:47 sbassett@deploy1003: sbassett: Continuing with sync
- 22:46 sbassett@deploy1003: sbassett: Backport for Revert "OATHAuth: Mark checkuser and suppress as requiring 2FA" synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 22:44 sbassett@deploy1003: Started scap sync-world: Backport for Revert "OATHAuth: Mark checkuser and suppress as requiring 2FA"
- 22:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2004.codfw.wmnet with OS bookworm
- 22:02 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:restbase-eqiad: Upgrading to Java 11.0.27 - eevans@cumin1002
- 22:02 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch1103
- 22:02 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch1103
- 22:02 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host thanos-be1007.eqiad.wmnet with OS bullseye
- 22:02 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch1103.eqiad.wmnet with OS bullseye
- 22:02 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2004.codfw.wmnet with reason: host reimage
- 21:59 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host thanos-be1006.eqiad.wmnet with OS bullseye
- 21:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2004.codfw.wmnet with reason: host reimage
- 21:51 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host thanos-be1009.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 21:51 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host thanos-be1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 21:37 jclark@cumin1002: START - Cookbook sre.hosts.provision for host thanos-be1009.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 21:37 jclark@cumin1002: START - Cookbook sre.hosts.provision for host thanos-be1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 21:36 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2004.codfw.wmnet with OS bookworm
- 21:36 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2003.codfw.wmnet with OS bookworm
- 21:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2003.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
- 21:29 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 21:29 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for thanos-be1008,1009 - jclark@cumin1002"
- 21:29 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for thanos-be1008,1009 - jclark@cumin1002"
- 21:25 jclark@cumin1002: START - Cookbook sre.dns.netbox
- 21:20 dmartin@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
- 21:19 dmartin@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
- 21:19 dmartin@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
- 21:19 dmartin@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
- 21:18 dmartin@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
- 21:18 dmartin@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply
- 21:16 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host thanos-be1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 21:15 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host thanos-be1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 21:13 dmartin@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
- 21:12 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cirrussearch1103.eqiad.wmnet with OS bullseye
- 21:12 dmartin@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
- 21:11 dmartin@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
- 21:10 dmartin@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
- 21:09 dmartin@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
- 21:08 dmartin@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply
- 21:07 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch1065.eqiad.wmnet with OS bullseye
- 21:06 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
- 21:01 xcollazo@deploy1003: Finished deploy [airflow-dags/analytics@2bce0c7]: Deploy Airflow artifact for T392494 and T394310. (duration: 00m 55s)
- 21:00 xcollazo@deploy1003: Started deploy [airflow-dags/analytics@2bce0c7]: Deploy Airflow artifact for T392494 and T394310.
- 20:57 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host sretest2003.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
- 20:57 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host sretest2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
- 20:55 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2003.codfw.wmnet with OS bookworm
- 20:55 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 20:55 kemayo@deploy1003: Finished scap sync-world: Backport for Extend the mobile insert menu config so that tools can be specified (T388604), Extend the mobile insert menu config so that tools can be specified (T388604), VisualEditor: Deploy '+' mobile menu (and new tools) to Phase 1 wikis (T388604) (duration: 10m 56s)
- 20:48 kemayo@deploy1003: jforrester, kemayo: Continuing with sync
- 20:46 kemayo@deploy1003: jforrester, kemayo: Backport for Extend the mobile insert menu config so that tools can be specified (T388604), Extend the mobile insert menu config so that tools can be specified (T388604), VisualEditor: Deploy '+' mobile menu (and new tools) to Phase 1 wikis (T388604) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug
- 20:44 kemayo@deploy1003: Started scap sync-world: Backport for Extend the mobile insert menu config so that tools can be specified (T388604), Extend the mobile insert menu config so that tools can be specified (T388604), VisualEditor: Deploy '+' mobile menu (and new tools) to Phase 1 wikis (T388604)
- 20:41 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch1065.eqiad.wmnet with reason: host reimage
- 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2004.codfw.wmnet with OS bookworm
- 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 20:38 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch1065.eqiad.wmnet with reason: host reimage
- 20:31 dmartin@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
- 20:30 dmartin@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply
- 20:25 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 20:20 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch1065
- 20:20 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch1065
- 20:20 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch1065.eqiad.wmnet with OS bullseye
- 20:20 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 20:20 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic1065 to cirrussearch1065
- 20:19 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch1065
- 20:19 jclark@cumin1002: START - Cookbook sre.hosts.provision for host thanos-be1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 20:18 jclark@cumin1002: START - Cookbook sre.hosts.provision for host thanos-be1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 20:18 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch1065
- 20:18 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch1065 on all recursors
- 20:18 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch1065 on all recursors
- 20:18 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 20:18 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1065 to cirrussearch1065 - bking@cumin2002"
- 20:18 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1065 to cirrussearch1065 - bking@cumin2002"
- 20:15 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 20:15 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for thanos-be1006,1007 - jclark@cumin1002"
- 20:15 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for thanos-be1006,1007 - jclark@cumin1002"
- 20:14 bking@cumin2002: START - Cookbook sre.dns.netbox
- 20:11 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch1066.eqiad.wmnet with OS bullseye
- 20:09 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic1065 to cirrussearch1065
- 20:08 jclark@cumin1002: START - Cookbook sre.dns.netbox
- 20:03 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2004.codfw.wmnet with reason: host reimage
- 19:59 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2004.codfw.wmnet with reason: host reimage
- 19:58 bking@cumin2002: conftool action : set/pooled=yes:weight=10; selector: name=cirrussearch1080.eqiad.wmnet|name=cirrussearch1081.eqiad.wmnet|name=cirrussearch1082.eqiad.wmnet|name=cirrussearch1083.eqiad.wmnet|name=cirrussearch1087.eqiad.wmnet|name=cirrussearch1088.eqiad.wmnet|name=cirrussearch1118.eqiad.wmnet|name=cirrussearch1119.eqiad.wmnet
- 19:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2003.codfw.wmnet with reason: host reimage
- 19:52 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch1103
- 19:52 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch1103
- 19:52 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch1103.eqiad.wmnet with OS bullseye
- 19:52 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2003.codfw.wmnet with reason: host reimage
- 19:50 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching A:restbase-eqiad: Upgrading to Java 11.0.27 - eevans@cumin1002
- 19:45 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch1066.eqiad.wmnet with reason: host reimage
- 19:43 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cirrussearch1103.eqiad.wmnet with OS bullseye
- 19:41 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch1066.eqiad.wmnet with reason: host reimage
- 19:24 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2004.codfw.wmnet with OS bookworm
- 19:24 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch1066
- 19:24 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch1066
- 19:24 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch1066.eqiad.wmnet with OS bullseye
- 19:21 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2003.codfw.wmnet with OS bookworm
- 19:21 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 19:20 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic1066 to cirrussearch1066
- 19:19 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host sretest2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 19:19 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch1066
- 19:18 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sretest2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 19:18 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch1066
- 19:18 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch1066 on all recursors
- 19:18 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch1066 on all recursors
- 19:18 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 19:18 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1066 to cirrussearch1066 - bking@cumin2002"
- 19:15 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1066 to cirrussearch1066 - bking@cumin2002"
- 19:13 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host sretest2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 19:10 bking@cumin2002: START - Cookbook sre.dns.netbox
- 19:10 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic1066 to cirrussearch1066
- 18:45 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch1103
- 18:45 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch1103
- 18:45 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch1103.eqiad.wmnet with OS bullseye
- 18:42 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic1103 to cirrussearch1103
- 18:42 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch1103
- 18:39 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch1103
- 18:39 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch1103 on all recursors
- 18:39 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch1103 on all recursors
- 18:39 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 18:39 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1103 to cirrussearch1103 - bking@cumin2002"
- 18:38 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1103 to cirrussearch1103 - bking@cumin2002"
- 18:35 bking@cumin2002: START - Cookbook sre.dns.netbox
- 18:35 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic1103 to cirrussearch1103
- 18:23 bking@cumin2002: conftool action : set/pooled=no:weight=10; selector: name=elastic1060.eqiad.wmnet|name=elastic1061.eqiad.wmnet|name=elastic1062.eqiad.wmnet|name=elastic1063.eqiad.wmnet|name=elastic1064.eqiad.wmnet|name=elastic1065.eqiad.wmnet|name=elastic1066.eqiad.wmnet|name=elastic1067.eqiad.wmnet|name=elastic1103.eqiad.wmnet
- 18:22 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2003.codfw.wmnet with OS bookworm
- 18:21 dreamyjazz@deploy1003: Finished scap sync-world: Backport for Support creating logs in emptyUserGroup.php (T394914) (duration: 13m 18s)
- 18:14 dreamyjazz@deploy1003: dreamyjazz: Continuing with sync
- 18:10 dreamyjazz@deploy1003: dreamyjazz: Backport for Support creating logs in emptyUserGroup.php (T394914) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 18:08 dreamyjazz@deploy1003: Started scap sync-world: Backport for Support creating logs in emptyUserGroup.php (T394914)
- 17:33 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
- 17:33 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
- 16:46 fceratto@cumin1002: END (FAIL) - Cookbook sre.mysql.sanitarium-restart (exit_code=99)
- 16:46 fceratto@cumin1002: START - Cookbook sre.mysql.sanitarium-restart
- 16:46 fceratto@cumin1002: END (FAIL) - Cookbook sre.mysql.sanitarium-restart (exit_code=99)
- 16:46 fceratto@cumin1002: START - Cookbook sre.mysql.sanitarium-restart
- 16:45 fceratto@cumin1002: END (ERROR) - Cookbook sre.mysql.sanitarium-restart (exit_code=97)
- 16:45 fceratto@cumin1002: START - Cookbook sre.mysql.sanitarium-restart
- 16:40 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:restbase-codfw: Upgrading to Java 11.0.27 - eevans@cumin1002
- 16:13 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
- 16:12 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
- 16:03 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2003.codfw.wmnet with OS bookworm
- 16:03 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2003.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 15:56 jgiannelos@deploy1003: helmfile [staging] DONE helmfile.d/services/mobileapps: sync
- 15:56 jgiannelos@deploy1003: helmfile [staging] START helmfile.d/services/mobileapps: sync
- 15:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host sretest2003.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 15:25 jynus: forgetting 4 old instances @ orchestrator-web T384274
- 15:22 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc2018.codfw.wmnet with OS bookworm
- 15:21 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 15:19 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 15:16 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 15:15 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 15:02 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc2018.codfw.wmnet with reason: host reimage
- 15:00 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc2018.codfw.wmnet with reason: host reimage
- 14:43 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host pc2018.codfw.wmnet with OS bookworm
- 14:43 moritzm: installing postgresql-15 security updates
- 14:42 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.roll-restart (exit_code=0) rolling restart_daemons on A:dnsbox and not P{dns7001*} and A:dnsbox
- 14:40 jynus@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2200.codfw.wmnet,db1216.eqiad.wmnet with reason: Restart x3
- 14:38 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
- 14:37 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
- 14:32 jmm@cumin2002: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling restart_daemons on A:swift-fe-eqiad
- 14:30 sbassett: Deployed updated security fix for T392341 (04) to 1.45-wmf.2
- 14:27 jmm@cumin2002: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on A:swift-fe-eqiad
- 14:25 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching A:restbase-codfw: Upgrading to Java 11.0.27 - eevans@cumin1002
- 14:25 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.roll-restart-reboot-wikimedia-dns (exit_code=0) rolling restart_daemons on A:wikidough
- 14:24 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-jumbo1018.eqiad.wmnet with OS bullseye
- 14:24 jmm@cumin2002: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling restart_daemons on A:swift-fe-codfw
- 14:20 jmm@cumin2002: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on A:swift-fe-codfw
- 14:12 reedy@deploy1003: Finished scap sync-world: Backport for Revert^2 "extension-list: Add ConfirmEdit/hCaptcha/extension.json" (T382148 T394814) (duration: 45m 53s)
- 14:11 sukhe@cumin1002: START - Cookbook sre.dns.roll-restart-reboot-wikimedia-dns rolling restart_daemons on A:wikidough
- 14:09 vgutierrez: enabling edge uniques in one server per DC and cluster (cp[1100-1101],cp[2027-2028],cp3074,cp[5017,5025],cp[6001,6009],cp[7001,7009])- T391411
- 14:08 sukhe: running agent on A:wikidough
- 14:08 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-jumbo1018.eqiad.wmnet with reason: host reimage
- 14:04 btullis@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-jumbo1018.eqiad.wmnet with reason: host reimage
- 14:01 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.roll-restart-reboot-wikimedia-dns (exit_code=0) rolling restart_daemons on A:wikidough
- 13:58 reedy@deploy1003: reedy: Continuing with sync
- 13:57 elukey@puppetserver1001: conftool action : set/weight=10; selector: name=ml-serve20.*.codfw.wmnet
- 13:57 elukey@puppetserver1001: conftool action : set/weight=10; selector: name=ml-serve10.*.eqiad.wmnet
- 13:56 reedy@deploy1003: reedy: Backport for Revert^2 "extension-list: Add ConfirmEdit/hCaptcha/extension.json" (T382148 T394814) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 13:54 elukey@puppetserver1001: conftool action : set/weight=1; selector: name=ml-serve1001.eqiad.wmnet
- 13:50 elukey@puppetserver1001: conftool action : set/pooled=yes:weight=1; selector: name=ml-serve1001.eqiad.wmnet,dc=eqiad,cluster=maps,service=inference
- 13:49 btullis@cumin1002: START - Cookbook sre.hosts.reimage for host kafka-jumbo1018.eqiad.wmnet with OS bullseye
- 13:49 btullis@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host kafka-jumbo1018.eqiad.wmnet with OS bullseye
- 13:48 elukey@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host ml-serve1001.eqiad.wmnet
- 13:48 elukey@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host ml-serve1001.eqiad.wmnet
- 13:48 sukhe@cumin1002: START - Cookbook sre.dns.roll-restart rolling restart_daemons on A:dnsbox and not P{dns7001*} and A:dnsbox
- 13:47 sukhe@cumin1002: START - Cookbook sre.dns.roll-restart-reboot-wikimedia-dns rolling restart_daemons on A:wikidough
- 13:43 sukhe: updating dns-root-data on A:wikidough
- 13:42 akosiaris@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-main: apply
- 13:42 akosiaris@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-main: apply
- 13:41 akosiaris@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-main: apply
- 13:41 akosiaris@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-main: apply
- 13:41 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-serve1001.eqiad.wmnet with OS bookworm
- 13:41 sukhe: updating dns-root-data on A:dnsbox
- 13:40 akosiaris@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: apply
- 13:39 akosiaris@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-main: apply
- 13:37 akosiaris: deploy eventgate-main to pickup the CPU change as well as the change in envoy histogram buckets
- 13:37 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dse-k8s-worker1001.eqiad.wmnet
- 13:36 vgutierrez: enabling edge uniques on cp4045 - T391411
- 13:26 reedy@deploy1003: Started scap sync-world: Backport for Revert^2 "extension-list: Add ConfirmEdit/hCaptcha/extension.json" (T382148 T394814)
- 13:25 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host dse-k8s-worker1001.eqiad.wmnet
- 13:24 reedy@deploy1003: Finished scap sync-world: Backport for Stop setting $wgCaptchaClass in extension.json files (T394814), Stop setting $wgCaptchaClass in extension.json files (T394814), Add mediawiki.ForeignApi.core as a dependency (T387720) (duration: 10m 52s)
- 13:24 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve1001.eqiad.wmnet with reason: host reimage
- 13:21 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve1001.eqiad.wmnet with reason: host reimage
- 13:17 reedy@deploy1003: reedy, stran: Continuing with sync
- 13:17 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "import new switches from netbox to hiera now they are status active - cmooney@cumin1003 - T394021"
- 13:16 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "import new switches from netbox to hiera now they are status active - cmooney@cumin1003 - T394021"
- 13:16 reedy@deploy1003: reedy, stran: Backport for Stop setting $wgCaptchaClass in extension.json files (T394814), Stop setting $wgCaptchaClass in extension.json files (T394814), Add mediawiki.ForeignApi.core as a dependency (T387720) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 13:13 reedy@deploy1003: Started scap sync-world: Backport for Stop setting $wgCaptchaClass in extension.json files (T394814), Stop setting $wgCaptchaClass in extension.json files (T394814), Add mediawiki.ForeignApi.core as a dependency (T387720)
- 13:04 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ml-serve1001.eqiad.wmnet with OS bookworm
- 12:50 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host mc-misc2002.codfw.wmnet
- 12:49 topranks: test new core_out bgp policy on asw1-bw27-esams (T394530)
- 12:48 pmiazga: Ran fixStuckGlobalRename.php for T394905
- 12:34 brouberol@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host kafka-jumbo1018.eqiad.wmnet with OS bullseye
- 12:30 btullis@cumin1002: START - Cookbook sre.hosts.reimage for host kafka-jumbo1018.eqiad.wmnet with OS bullseye
- 11:59 marostegui@cumin1002: dbctl commit (dc=all): 'es2036 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P76367 and previous config saved to /var/cache/conftool/dbconfig/20250521-115950-root.json
- 11:56 cmooney@dns2005: END - running authdns-update
- 11:55 cmooney@dns2005: START - running authdns-update
- 11:44 marostegui@cumin1002: dbctl commit (dc=all): 'es2036 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P76366 and previous config saved to /var/cache/conftool/dbconfig/20250521-114444-root.json
- 11:37 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 11:37 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: push IPv6 address changes for codfw expansion link networks - cmooney@cumin1002"
- 11:37 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: push IPv6 address changes for codfw expansion link networks - cmooney@cumin1002"
- 11:35 jmm@cumin2002: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies (exit_code=0) rolling restart_daemons on A:thanos-fe-eqiad
- 11:33 jmm@cumin2002: START - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies rolling restart_daemons on A:thanos-fe-eqiad
- 11:33 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host mc-misc2002.codfw.wmnet
- 11:33 cmooney@cumin1002: START - Cookbook sre.dns.netbox
- 11:29 marostegui@cumin1002: dbctl commit (dc=all): 'es2036 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P76365 and previous config saved to /var/cache/conftool/dbconfig/20250521-112939-root.json
- 11:14 marostegui@cumin1002: dbctl commit (dc=all): 'es2036 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P76364 and previous config saved to /var/cache/conftool/dbconfig/20250521-111433-root.json
- 11:02 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-misc2002.codfw.wmnet
- 11:01 jgiannelos@deploy1003: helmfile [staging] DONE helmfile.d/services/mobileapps: sync
- 11:01 jgiannelos@deploy1003: helmfile [staging] START helmfile.d/services/mobileapps: sync
- 10:59 marostegui@cumin1002: dbctl commit (dc=all): 'es2036 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P76363 and previous config saved to /var/cache/conftool/dbconfig/20250521-105928-root.json
- 10:58 jmm@cumin2002: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies (exit_code=0) rolling restart_daemons on A:thanos-fe-codfw
- 10:56 jmm@cumin2002: START - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies rolling restart_daemons on A:thanos-fe-codfw
- 10:55 jmm@cumin2002: END (PASS) - Cookbook sre.wdqs.restart-nginx-envoy (exit_code=0) rolling restart_daemons on A:wdqs-all
- 10:53 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host mc-misc2002.codfw.wmnet
- 10:51 brouberol@cumin2002: START - Cookbook sre.hosts.reimage for host kafka-jumbo1018.eqiad.wmnet with OS bullseye
- 10:44 marostegui@cumin1002: dbctl commit (dc=all): 'es2036 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P76362 and previous config saved to /var/cache/conftool/dbconfig/20250521-104422-root.json
- 10:43 elukey@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ml-serve1001.eqiad.wmnet with OS bookworm
- 10:42 cmooney@cumin1002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
- 10:41 jmm@cumin2002: START - Cookbook sre.wdqs.restart-nginx-envoy rolling restart_daemons on A:wdqs-all
- 10:40 jmm@cumin2002: END (PASS) - Cookbook sre.wdqs.restart-nginx-envoy (exit_code=0) rolling restart_daemons on A:wcqs-public
- 10:39 cmooney@cumin1002: START - Cookbook sre.dns.netbox
- 10:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-misc2002.codfw.wmnet
- 10:38 jmm@cumin2002: START - Cookbook sre.wdqs.restart-nginx-envoy rolling restart_daemons on A:wcqs-public
- 10:37 moritzm: installing expat security updates
- 10:30 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host mc-misc2002.codfw.wmnet
- 10:29 marostegui@cumin1002: dbctl commit (dc=all): 'es2036 (re)pooling @ 20%: Repooling', diff saved to https://phabricator.wikimedia.org/P76360 and previous config saved to /var/cache/conftool/dbconfig/20250521-102917-root.json
- 10:27 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-misc2002.codfw.wmnet
- 10:26 vgutierrez: enabling edge uniques on cp3066 - T391411
- 10:24 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ml-serve1001.eqiad.wmnet with OS bookworm
- 10:23 elukey@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ml-serve1001.eqiad.wmnet with OS bookworm
- 10:15 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host mc-misc2002.codfw.wmnet
- 10:14 marostegui@cumin1002: dbctl commit (dc=all): 'es2036 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P76358 and previous config saved to /var/cache/conftool/dbconfig/20250521-101412-root.json
- 10:07 jgiannelos@deploy1003: helmfile [staging] DONE helmfile.d/services/mobileapps: sync
- 10:07 jgiannelos@deploy1003: helmfile [staging] START helmfile.d/services/mobileapps: sync
- 10:02 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on es2036.codfw.wmnet with reason: Maintenance
- 10:00 marostegui@cumin1002: dbctl commit (dc=all): 'Depool es2036', diff saved to https://phabricator.wikimedia.org/P76357 and previous config saved to /var/cache/conftool/dbconfig/20250521-100055-marostegui.json
- 10:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-misc2002.codfw.wmnet
- 09:58 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
- 09:57 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
- 09:38 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host mc-misc2002.codfw.wmnet
- 09:34 Emperor: radosgw-admin bucket rm --bucket=gitlab-artifacts --bypass-gc --purge-objects T378922
- 09:32 Emperor: radosgw-admin bucket rm --bucket=gitlab-packages --bypass-gc --purge-objects T378922
- 09:31 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
- 09:31 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
- 09:31 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
- 09:30 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
- 09:26 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ml-serve1001.eqiad.wmnet with OS bookworm
- 09:25 elukey@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ml-serve1001.eqiad.wmnet with OS bookworm
- 09:22 XioNoX: cr2-eqdfw> request vmhost reboot - T364092
- 09:21 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-misc2002.codfw.wmnet
- 09:19 marostegui@cumin1002: dbctl commit (dc=all): 'db1187 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P76356 and previous config saved to /var/cache/conftool/dbconfig/20250521-091914-root.json
- 09:16 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host mc-misc2002.codfw.wmnet
- 09:13 XioNoX: cr2-eqdfw - shutdown transit/ix BGP sessions - T364092
- 09:13 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host ml-serve1001
- 09:13 elukey@cumin1002: START - Cookbook sre.hosts.move-vlan for host ml-serve1001
- 09:13 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ml-serve1001.eqiad.wmnet with OS bookworm
- 09:12 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts thanos-fe[1001-1003].eqiad.wmnet
- 09:12 mvernon@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 09:12 mvernon@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: thanos-fe[1001-1003].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - mvernon@cumin1002"
- 09:12 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ml-serve1001.eqiad.wmnet with OS bookworm
- 09:11 ayounsi@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cr2-eqdfw,cr2-eqdfw IPv6 with reason: router upgrade
- 09:10 brouberol@cumin2002: START - Cookbook sre.hosts.reimage for host kafka-jumbo1018.eqiad.wmnet with OS bullseye
- 09:09 brouberol@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kafka-jumbo1018.eqiad.wmnet with OS bullseye
- 09:08 mvernon@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: thanos-fe[1001-1003].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - mvernon@cumin1002"
- 09:08 ayounsi@cumin1002: DONE (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cr2-eqdfw,cr2-eqdfw IPv6,cr2-eqdfw.mgmt with reason: router upgrade
- 09:04 marostegui@cumin1002: dbctl commit (dc=all): 'db1187 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P76355 and previous config saved to /var/cache/conftool/dbconfig/20250521-090409-root.json
- 09:03 XioNoX: cr2-eqdfw# set protocols bgp graceful-shutdown sender - T364092
- 09:02 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve1001.eqiad.wmnet with reason: host reimage
- 09:02 mvernon@cumin1002: START - Cookbook sre.dns.netbox
- 08:59 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve1001.eqiad.wmnet with reason: host reimage
- 08:54 brouberol@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-jumbo1017.eqiad.wmnet with OS bullseye
- 08:49 marostegui@cumin1002: dbctl commit (dc=all): 'db1187 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P76354 and previous config saved to /var/cache/conftool/dbconfig/20250521-084904-root.json
- 08:48 mvernon@cumin1002: START - Cookbook sre.hosts.decommission for hosts thanos-fe[1001-1003].eqiad.wmnet
- 08:47 mvernon@cumin1002: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies (exit_code=0) rolling restart_daemons on P{thanos-fe100[4-7]*} or P{thanos-fe2*} and (A:thanos-fe or A:thanos-fe-codfw or A:thanos-fe-eqiad)
- 08:43 mvernon@cumin1002: START - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies rolling restart_daemons on P{thanos-fe100[4-7]*} or P{thanos-fe2*} and (A:thanos-fe or A:thanos-fe-codfw or A:thanos-fe-eqiad)
- 08:42 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ml-serve1001.eqiad.wmnet with OS bookworm
- 08:38 brouberol@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-jumbo1017.eqiad.wmnet with reason: host reimage
- 08:35 brouberol@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-jumbo1017.eqiad.wmnet with reason: host reimage
- 08:34 elukey@cumin1002: END (FAIL) - Cookbook sre.k8s.pool-depool-node (exit_code=99) depool for host ml-serve1001.eqiad.wmnet
- 08:34 elukey@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host ml-serve1001.eqiad.wmnet
- 08:33 marostegui@cumin1002: dbctl commit (dc=all): 'db1187 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P76353 and previous config saved to /var/cache/conftool/dbconfig/20250521-083358-root.json
- 08:29 Emperor: disable puppet on thanos-fe1001 and thanos-fe1004 T391352
- 08:26 brouberol@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-jumbo1016.eqiad.wmnet with OS bullseye
- 08:23 aklapper@deploy1003: rebuilt and synchronized wikiversions files: group1 to 1.45.0-wmf.2 refs T392172
- 08:19 brouberol@cumin2002: START - Cookbook sre.hosts.reimage for host kafka-jumbo1018.eqiad.wmnet with OS bullseye
- 08:19 brouberol@cumin2002: START - Cookbook sre.hosts.reimage for host kafka-jumbo1017.eqiad.wmnet with OS bullseye
- 08:18 marostegui@cumin1002: dbctl commit (dc=all): 'db1187 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P76352 and previous config saved to /var/cache/conftool/dbconfig/20250521-081851-root.json
- 08:10 brouberol@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-jumbo1016.eqiad.wmnet with reason: host reimage
- 08:06 brouberol@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-jumbo1016.eqiad.wmnet with reason: host reimage
- 08:05 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-misc2002.codfw.wmnet
- 08:03 marostegui@cumin1002: dbctl commit (dc=all): 'db1187 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P76351 and previous config saved to /var/cache/conftool/dbconfig/20250521-080346-root.json
- 07:56 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudservices2005-dev.codfw.wmnet
- 07:50 brouberol@cumin2002: START - Cookbook sre.hosts.reimage for host kafka-jumbo1016.eqiad.wmnet with OS bullseye
- 07:50 taavi@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudservices2005-dev.codfw.wmnet
- 07:48 marostegui@cumin1002: dbctl commit (dc=all): 'db1187 (re)pooling @ 20%: Repooling', diff saved to https://phabricator.wikimedia.org/P76350 and previous config saved to /var/cache/conftool/dbconfig/20250521-074841-root.json
- 07:44 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host mc-misc2002.codfw.wmnet
- 07:43 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-misc2002.codfw.wmnet
- 07:34 marostegui: Move s5 codfw to SBR T383795
- 07:33 marostegui@cumin1002: dbctl commit (dc=all): 'db1187 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P76349 and previous config saved to /var/cache/conftool/dbconfig/20250521-073336-root.json
- 07:32 marostegui: Install 10.6.22 on db1187 T394623
- 07:32 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1187.eqiad.wmnet with reason: Maintenance
- 07:32 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1187 T394623', diff saved to https://phabricator.wikimedia.org/P76348 and previous config saved to /var/cache/conftool/dbconfig/20250521-073207-marostegui.json
- 07:18 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host mc-misc2002.codfw.wmnet
- 07:09 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-test2002.codfw.wmnet
- 07:05 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-test2002.codfw.wmnet
- 07:01 marostegui@cumin1002: dbctl commit (dc=all): 'Increase weight for db1169', diff saved to https://phabricator.wikimedia.org/P76347 and previous config saved to /var/cache/conftool/dbconfig/20250521-070156-marostegui.json
- 06:58 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1002.eqiad.wmnet
- 06:56 marostegui@cumin1002: dbctl commit (dc=all): 'Increase weight for db1169', diff saved to https://phabricator.wikimedia.org/P76346 and previous config saved to /var/cache/conftool/dbconfig/20250521-065618-marostegui.json
- 06:55 XioNoX: push pfw policies - T394728
- 06:52 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1002.eqiad.wmnet
- 06:44 marostegui@cumin1002: dbctl commit (dc=all): 'Increase weight for db1169', diff saved to https://phabricator.wikimedia.org/P76345 and previous config saved to /var/cache/conftool/dbconfig/20250521-064444-marostegui.json
- 06:03 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 22616
- 05:58 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'configure' for AS: 22616
- 05:31 marostegui@cumin1002: dbctl commit (dc=all): 'Increase weight for db1169', diff saved to https://phabricator.wikimedia.org/P76344 and previous config saved to /var/cache/conftool/dbconfig/20250521-053116-marostegui.json
- 05:22 marostegui@cumin1002: dbctl commit (dc=all): 'Pool db1169 with 10%', diff saved to https://phabricator.wikimedia.org/P76343 and previous config saved to /var/cache/conftool/dbconfig/20250521-052258-marostegui.json
- 05:07 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1169', diff saved to https://phabricator.wikimedia.org/P76342 and previous config saved to /var/cache/conftool/dbconfig/20250521-050730-marostegui.json
- 05:07 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1169.eqiad.wmnet with reason: Maintenance
- 04:56 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1155.eqiad.wmnet with reason: Maintenance
- 04:56 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on an-redacteddb1001.eqiad.wmnet with reason: Maintenance
- 04:56 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb1019.eqiad.wmnet with reason: Maintenance
- 04:56 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb1018.eqiad.wmnet with reason: Maintenance
- 04:55 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb1015.eqiad.wmnet with reason: Maintenance
- 04:55 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb1014.eqiad.wmnet with reason: Maintenance
- 01:08 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudvirt1039.eqiad.wmnet
- 01:08 andrew@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 01:05 andrew@cumin1002: START - Cookbook sre.dns.netbox
- 01:02 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudvirt1038.eqiad.wmnet
- 01:02 andrew@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 01:02 andrew@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudvirt1038.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1002"
- 01:01 andrew@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudvirt1038.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1002"
- 00:59 andrew@cumin1002: START - Cookbook sre.hosts.decommission for hosts cloudvirt1039.eqiad.wmnet
- 00:58 andrew@cumin1002: START - Cookbook sre.dns.netbox
- 00:57 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudvirt1037.eqiad.wmnet
- 00:57 andrew@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 00:57 andrew@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudvirt1037.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1002"
- 00:56 andrew@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudvirt1037.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1002"
- 00:52 andrew@cumin1002: START - Cookbook sre.hosts.decommission for hosts cloudvirt1038.eqiad.wmnet
- 00:52 andrew@cumin1002: START - Cookbook sre.dns.netbox
- 00:51 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudvirt1036.eqiad.wmnet
- 00:51 andrew@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 00:51 andrew@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudvirt1036.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1002"
- 00:50 andrew@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudvirt1036.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1002"
- 00:46 andrew@cumin1002: START - Cookbook sre.hosts.decommission for hosts cloudvirt1037.eqiad.wmnet
- 00:46 andrew@cumin1002: START - Cookbook sre.dns.netbox
- 00:41 andrew@cumin1002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts cloudvirt1034.eqiad.wmnet
- 00:41 andrew@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 00:41 andrew@cumin1002: START - Cookbook sre.hosts.decommission for hosts cloudvirt1036.eqiad.wmnet
- 00:38 andrew@cumin1002: START - Cookbook sre.dns.netbox
- 00:36 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudvirt1033.eqiad.wmnet
- 00:36 andrew@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 00:33 andrew@cumin1002: START - Cookbook sre.dns.netbox
- 00:33 andrew@cumin1002: START - Cookbook sre.hosts.decommission for hosts cloudvirt1034.eqiad.wmnet
- 00:33 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudvirt1035.eqiad.wmnet
- 00:33 andrew@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 00:30 andrew@cumin1002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts cloudvirt1034.eqiad.wmnet
- 00:30 andrew@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 00:30 andrew@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudvirt1034.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1002"
- 00:30 andrew@cumin1002: START - Cookbook sre.dns.netbox
- 00:30 andrew@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudvirt1034.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1002"
- 00:25 andrew@cumin1002: START - Cookbook sre.dns.netbox
- 00:22 andrew@cumin1002: START - Cookbook sre.hosts.decommission for hosts cloudvirt1035.eqiad.wmnet
- 00:21 andrew@cumin1002: START - Cookbook sre.hosts.decommission for hosts cloudvirt1034.eqiad.wmnet
- 00:20 andrew@cumin1002: START - Cookbook sre.hosts.decommission for hosts cloudvirt1033.eqiad.wmnet
- 00:16 rzl@deploy1003: Finished scap sync-world: 1147918 (duration: 03m 27s)
- 00:14 rzl@deploy1003: Started scap sync-world: 1147918
2025-05-20
- 22:47 dzahn@cumin1002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host zuul1003.eqiad.wmnet
- 22:47 dzahn@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host zuul1003.eqiad.wmnet with OS bookworm
- 22:32 dzahn@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on zuul1003.eqiad.wmnet with reason: host reimage
- 22:27 dzahn@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on zuul1003.eqiad.wmnet with reason: host reimage
- 22:15 dzahn@cumin1002: START - Cookbook sre.hosts.reimage for host zuul1003.eqiad.wmnet with OS bookworm
- 22:08 dzahn@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM zuul1003.eqiad.wmnet - dzahn@cumin1002"
- 22:08 dzahn@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM zuul1003.eqiad.wmnet - dzahn@cumin1002"
- 22:07 dzahn@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) zuul1003.eqiad.wmnet on all recursors
- 22:07 dzahn@cumin1002: START - Cookbook sre.dns.wipe-cache zuul1003.eqiad.wmnet on all recursors
- 22:07 dzahn@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 22:07 dzahn@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM zuul1003.eqiad.wmnet - dzahn@cumin1002"
- 22:06 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cirrussearch1057.eqiad.wmnet with OS bullseye
- 22:05 dzahn@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM zuul1003.eqiad.wmnet - dzahn@cumin1002"
- 22:00 dzahn@cumin1002: START - Cookbook sre.dns.netbox
- 22:00 dzahn@cumin1002: START - Cookbook sre.ganeti.makevm for new host zuul1003.eqiad.wmnet
- 21:58 dzahn@cumin1002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host zuul2003.codfw.wmnet
- 21:58 dzahn@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host zuul2003.codfw.wmnet with OS bookworm
- 21:41 dzahn@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on zuul2003.codfw.wmnet with reason: host reimage
- 21:38 dzahn@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on zuul2003.codfw.wmnet with reason: host reimage
- 21:21 dzahn@cumin1002: START - Cookbook sre.hosts.reimage for host zuul2003.codfw.wmnet with OS bookworm
- 21:20 dzahn@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM zuul2003.codfw.wmnet - dzahn@cumin1002"
- 21:20 dzahn@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM zuul2003.codfw.wmnet - dzahn@cumin1002"
- 21:19 dzahn@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) zuul2003.codfw.wmnet on all recursors
- 21:19 dzahn@cumin1002: START - Cookbook sre.dns.wipe-cache zuul2003.codfw.wmnet on all recursors
- 21:19 dzahn@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 21:19 dzahn@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM zuul2003.codfw.wmnet - dzahn@cumin1002"
- 21:19 dzahn@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM zuul2003.codfw.wmnet - dzahn@cumin1002"
- 21:16 dzahn@cumin1002: START - Cookbook sre.dns.netbox
- 21:16 dzahn@cumin1002: START - Cookbook sre.ganeti.makevm for new host zuul2003.codfw.wmnet
- 21:06 jforrester@deploy1003: Finished scap sync-world: Backport for TransformHandler: Return 400 for invalid titles (T394270), Merge remote-tracking branch 'origin/master' into wmf_deploy (T341775 T373017 T393122 T394404), Xml::input, label: Replace usage with Html::input, label (T394025) (duration: 11m 28s)
- 21:05 andrew@cumin1002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts cloudvirt1031.eqiad.wmnet
- 21:05 andrew@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 21:05 andrew@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudvirt1031.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1002"
- 21:05 andrew@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudvirt1031.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1002"
- 21:00 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch1057
- 21:00 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch1057
- 21:00 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch1057.eqiad.wmnet with OS bullseye
- 21:00 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cirrussearch1057.eqiad.wmnet with OS bullseye
- 20:59 jforrester@deploy1003: mszabo, jforrester: Continuing with sync
- 20:57 jforrester@deploy1003: mszabo, jforrester: Backport for TransformHandler: Return 400 for invalid titles (T394270), Merge remote-tracking branch 'origin/master' into wmf_deploy (T341775 T373017 T393122 T394404), Xml::input, label: Replace usage with Html::input, label (T394025) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes c
- 20:55 jforrester@deploy1003: Started scap sync-world: Backport for TransformHandler: Return 400 for invalid titles (T394270), Merge remote-tracking branch 'origin/master' into wmf_deploy (T341775 T373017 T393122 T394404), Xml::input, label: Replace usage with Html::input, label (T394025)
- 20:45 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch1057
- 20:45 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch1057
- 20:45 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch1057.eqiad.wmnet with OS bullseye
- 20:45 andrew@cumin1002: START - Cookbook sre.dns.netbox
- 20:40 andrew@cumin1002: START - Cookbook sre.hosts.decommission for hosts cloudvirt1031.eqiad.wmnet
- 20:40 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudvirt1032.eqiad.wmnet
- 20:40 andrew@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 20:40 andrew@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudvirt1032.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1002"
- 20:40 andrew@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudvirt1032.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1002"
- 20:37 dzahn@cumin1002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host zuul1003.eqiad.wmnet
- 20:37 dzahn@cumin1002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
- 20:37 dzahn@cumin1002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host zuul2002.codfw.wmnet
- 20:37 dzahn@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host zuul2002.codfw.wmnet with OS bookworm
- 20:37 jforrester@deploy1003: Finished scap sync-world: Backport for Add the ReadingLists beta feature to the allow list (T392008), Enable empty search recommendations on beta cluster and testwiki (duration: 13m 12s)
- 20:36 andrew@cumin1002: START - Cookbook sre.dns.netbox
- 20:36 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cirrussearch1057.eqiad.wmnet with OS bullseye
- 20:31 dzahn@cumin1002: START - Cookbook sre.dns.netbox
- 20:31 dzahn@cumin1002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
- 20:30 jforrester@deploy1003: jforrester, bwang: Continuing with sync
- 20:28 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch1057
- 20:28 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch1057
- 20:28 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch1057.eqiad.wmnet with OS bullseye
- 20:26 dzahn@cumin1002: START - Cookbook sre.dns.netbox
- 20:26 dzahn@cumin1002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
- 20:26 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cirrussearch1057.eqiad.wmnet with OS bullseye
- 20:26 jforrester@deploy1003: jforrester, bwang: Backport for Add the ReadingLists beta feature to the allow list (T392008), Enable empty search recommendations on beta cluster and testwiki synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 20:25 andrew@cumin1002: START - Cookbook sre.hosts.decommission for hosts cloudvirt1032.eqiad.wmnet
- 20:24 jforrester@deploy1003: Started scap sync-world: Backport for Add the ReadingLists beta feature to the allow list (T392008), Enable empty search recommendations on beta cluster and testwiki
- 20:23 dzahn@cumin1002: START - Cookbook sre.dns.netbox
- 20:23 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudvirt1031.eqiad.wmnet
- 20:23 andrew@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 20:23 andrew@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudvirt1031.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1002"
- 20:23 andrew@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudvirt1031.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1002"
- 20:21 dzahn@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on zuul2002.codfw.wmnet with reason: host reimage
- 20:21 dzahn@cumin1002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
- 20:19 andrew@cumin1002: START - Cookbook sre.dns.netbox
- 20:18 dzahn@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on zuul2002.codfw.wmnet with reason: host reimage
- 20:17 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch1057
- 20:17 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch1057
- 20:17 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch1057.eqiad.wmnet with OS bullseye
- 20:17 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch1057.eqiad.wmnet on all recursors
- 20:17 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch1057.eqiad.wmnet on all recursors
- 20:15 jforrester@deploy1003: Finished scap sync-world: Backport for Add zh, en, and meta to zh_arbcom import sources (T394505), Enable ReadingList beta feature on test.wikipedia.org (T392008) (duration: 12m 43s)
- 20:15 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic1057 to cirrussearch1057
- 20:14 dzahn@cumin1002: START - Cookbook sre.dns.netbox
- 20:14 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch1057
- 20:13 dzahn@cumin1002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
- 20:13 dzahn@cumin1002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM zuul1003.eqiad.wmnet - dzahn@cumin1002"
- 20:13 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host pc2018.codfw.wmnet with OS bookworm
- 20:12 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch1057
- 20:12 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch1057 on all recursors
- 20:12 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch1057 on all recursors
- 20:12 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 20:12 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1057 to cirrussearch1057 - bking@cumin2002"
- 20:12 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1057 to cirrussearch1057 - bking@cumin2002"
- 20:08 jforrester@deploy1003: zhaofjx, jdlrobson, jforrester: Continuing with sync
- 20:08 andrew@cumin1002: START - Cookbook sre.hosts.decommission for hosts cloudvirt1031.eqiad.wmnet
- 20:06 dzahn@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM zuul1003.eqiad.wmnet - dzahn@cumin1002"
- 20:06 bking@cumin2002: START - Cookbook sre.dns.netbox
- 20:06 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic1057 to cirrussearch1057
- 20:04 jforrester@deploy1003: zhaofjx, jdlrobson, jforrester: Backport for Add zh, en, and meta to zh_arbcom import sources (T394505), Enable ReadingList beta feature on test.wikipedia.org (T392008) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 20:02 jforrester@deploy1003: Started scap sync-world: Backport for Add zh, en, and meta to zh_arbcom import sources (T394505), Enable ReadingList beta feature on test.wikipedia.org (T392008)
- 20:01 dzahn@cumin1002: START - Cookbook sre.hosts.reimage for host zuul2002.codfw.wmnet with OS bookworm
- 20:01 dzahn@cumin1002: START - Cookbook sre.dns.netbox
- 20:01 dzahn@cumin1002: START - Cookbook sre.ganeti.makevm for new host zuul1003.eqiad.wmnet
- 19:52 dzahn@cumin1002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host zuul1002.eqiad.wmnet
- 19:52 dzahn@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host zuul1002.eqiad.wmnet with OS bookworm
- 19:49 dzahn@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM zuul2002.codfw.wmnet - dzahn@cumin1002"
- 19:49 dzahn@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM zuul2002.codfw.wmnet - dzahn@cumin1002"
- 19:49 dzahn@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) zuul2002.codfw.wmnet on all recursors
- 19:49 dzahn@cumin1002: START - Cookbook sre.dns.wipe-cache zuul2002.codfw.wmnet on all recursors
- 19:49 dzahn@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 19:49 dzahn@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM zuul2002.codfw.wmnet - dzahn@cumin1002"
- 19:46 dzahn@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM zuul2002.codfw.wmnet - dzahn@cumin1002"
- 19:40 dzahn@cumin1002: START - Cookbook sre.dns.netbox
- 19:40 dzahn@cumin1002: START - Cookbook sre.ganeti.makevm for new host zuul2002.codfw.wmnet
- 19:40 jhathaway@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on sretest2001.codfw.wmnet with reason: T383173
- 19:36 dzahn@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on zuul1002.eqiad.wmnet with reason: host reimage
- 19:33 dzahn@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on zuul1002.eqiad.wmnet with reason: host reimage
- 19:20 aokoth@cumin1002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host doc1004.eqiad.wmnet
- 19:20 aokoth@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host doc1004.eqiad.wmnet with OS bookworm
- 19:18 dzahn@cumin1002: START - Cookbook sre.hosts.reimage for host zuul1002.eqiad.wmnet with OS bookworm
- 19:17 dzahn@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM zuul1002.eqiad.wmnet - dzahn@cumin1002"
- 19:17 dzahn@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM zuul1002.eqiad.wmnet - dzahn@cumin1002"
- 19:16 dzahn@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) zuul1002.eqiad.wmnet on all recursors
- 19:16 dzahn@cumin1002: START - Cookbook sre.dns.wipe-cache zuul1002.eqiad.wmnet on all recursors
- 19:16 dzahn@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 19:16 dzahn@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM zuul1002.eqiad.wmnet - dzahn@cumin1002"
- 19:15 dzahn@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM zuul1002.eqiad.wmnet - dzahn@cumin1002"
- 19:10 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch1088.eqiad.wmnet with OS bullseye
- 19:05 aokoth@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on doc1004.eqiad.wmnet with reason: host reimage
- 19:03 dzahn@cumin1002: START - Cookbook sre.dns.netbox
- 19:03 dzahn@cumin1002: START - Cookbook sre.ganeti.makevm for new host zuul1002.eqiad.wmnet
- 19:02 aokoth@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on doc1004.eqiad.wmnet with reason: host reimage
- 18:52 aokoth@cumin1002: START - Cookbook sre.hosts.reimage for host doc1004.eqiad.wmnet with OS bookworm
- 18:50 aokoth@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM doc1004.eqiad.wmnet - aokoth@cumin1002"
- 18:50 aokoth@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM doc1004.eqiad.wmnet - aokoth@cumin1002"
- 18:50 aokoth@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) doc1004.eqiad.wmnet on all recursors
- 18:50 aokoth@cumin1002: START - Cookbook sre.dns.wipe-cache doc1004.eqiad.wmnet on all recursors
- 18:50 aokoth@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 18:50 aokoth@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM doc1004.eqiad.wmnet - aokoth@cumin1002"
- 18:49 aokoth@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM doc1004.eqiad.wmnet - aokoth@cumin1002"
- 18:49 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch1088.eqiad.wmnet with reason: host reimage
- 18:47 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host pc2018.codfw.wmnet with OS bookworm
- 18:46 aokoth@cumin1002: START - Cookbook sre.dns.netbox
- 18:46 aokoth@cumin1002: START - Cookbook sre.ganeti.makevm for new host doc1004.eqiad.wmnet
- 18:45 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch1088.eqiad.wmnet with reason: host reimage
- 18:30 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch1088
- 18:30 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch1088
- 18:30 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch1088.eqiad.wmnet with OS bullseye
- 18:29 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic1088 to cirrussearch1088
- 18:29 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch1088
- 18:25 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch1088
- 18:25 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch1088 on all recursors
- 18:25 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch1088 on all recursors
- 18:25 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 18:25 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1088 to cirrussearch1088 - bking@cumin2002"
- 18:21 topranks: repool codfw in dns after core router maintenance T393552
- 18:21 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool site codfw [reason: repool codfw after core router maintenance, T393552]
- 18:21 cmooney@cumin1003: START - Cookbook sre.dns.admin DNS admin: pool site codfw [reason: repool codfw after core router maintenance, T393552]
- 18:17 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1088 to cirrussearch1088 - bking@cumin2002"
- 18:14 dzahn@cumin1002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host zuul1001.eqiad.wmnet
- 18:14 dzahn@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host zuul1001.eqiad.wmnet with OS bookworm
- 18:12 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:aqs-eqiad: Upgrading to Java 11.0.27 - eevans@cumin1002
- 18:09 denisse@deploy1003: Finished deploy [librenms/librenms@f049593]: Upgrade LibreNMS to 25.5.0 - T394750 (duration: 00m 17s)
- 18:09 denisse@deploy1003: Started deploy [librenms/librenms@f049593]: Upgrade LibreNMS to 25.5.0 - T394750
- 17:56 dzahn@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on zuul1001.eqiad.wmnet with reason: host reimage
- 17:52 bking@cumin2002: START - Cookbook sre.dns.netbox
- 17:52 dzahn@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on zuul1001.eqiad.wmnet with reason: host reimage
- 17:52 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic1088 to cirrussearch1088
- 17:50 bking@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on cirrussearch1087.eqiad.wmnet with reason: eqiad is depooled, noisy alerts
- 17:45 topranks: moving links from old to new linecard cr2-codfw slot 1 to slot 0 T393552
- 17:41 dzahn@cumin1002: START - Cookbook sre.hosts.reimage for host zuul1001.eqiad.wmnet with OS bookworm
- 17:33 dzahn@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM zuul1001.eqiad.wmnet - dzahn@cumin1002"
- 17:33 dzahn@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM zuul1001.eqiad.wmnet - dzahn@cumin1002"
- 17:32 dzahn@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) zuul1001.eqiad.wmnet on all recursors
- 17:32 dzahn@cumin1002: START - Cookbook sre.dns.wipe-cache zuul1001.eqiad.wmnet on all recursors
- 17:32 dzahn@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 17:32 dzahn@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM zuul1001.eqiad.wmnet - dzahn@cumin1002"
- 17:32 dzahn@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM zuul1001.eqiad.wmnet - dzahn@cumin1002"
- 17:31 vgutierrez@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp4037.ulsfo.wmnet
- 17:31 vgutierrez: repool cp4037 with edge uniques enabled, stats available on https://grafana.wikimedia.org/goto/fYSIMlaHR?orgId=1 - T391411
- 17:29 dzahn@cumin1002: START - Cookbook sre.dns.netbox
- 17:29 dzahn@cumin1002: START - Cookbook sre.ganeti.makevm for new host zuul1001.eqiad.wmnet
- 17:27 dzahn@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host zuul2001.codfw.wmnet with OS bullseye
- 17:21 topranks: enable FPC 0 (10x100G) card in cr2-codfw (T393552)
- 17:18 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
- 17:18 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/mw-debug: apply
- 17:17 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
- 17:16 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
- 17:13 swfrench@deploy1003: Stopping before sync operations
- 17:13 swfrench@deploy1003: Started scap sync-world: Non-deploy scap run to switch mw-debug/pinkunicorn to PHP 8.1 - T391057
- 17:11 dzahn@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on zuul2001.codfw.wmnet with reason: host reimage
- 17:11 cmooney@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 11 hosts with reason: replace cr2-codfw switch control boards and install new line card
- 17:08 dzahn@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on zuul2001.codfw.wmnet with reason: host reimage
- 16:53 dzahn@cumin1002: START - Cookbook sre.hosts.reimage for host zuul2001.codfw.wmnet with OS bullseye
- 16:46 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching A:aqs-eqiad: Upgrading to Java 11.0.27 - eevans@cumin1002
- 16:43 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:aqs-codfw: Upgrading to Java 11.0.27 - eevans@cumin1002
- 16:06 klausman@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:ml-cache-eqiad: Enable Java security updates - klausman@cumin1002
- 15:58 vgutierrez@puppetserver1001: conftool action : set/pooled=no; selector: name=cp4037.ulsfo.wmnet
- 15:57 vgutierrez: depooling cp4037 before enabling edge uniques - T391411
- 15:54 vgutierrez@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp4038.ulsfo.wmnet
- 15:52 cgoubert@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
- 15:51 cgoubert@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
- 15:49 vgutierrez@puppetserver1001: conftool action : set/pooled=no; selector: name=cp4038.ulsfo.wmnet
- 15:49 klausman@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching A:ml-cache-eqiad: Enable Java security updates - klausman@cumin1002
- 15:21 jmm@cumin2002: END (PASS) - Cookbook sre.o11y.roll-restart-reboot-logstash-collectors (exit_code=0) rolling restart_daemons on A:logstash-collector
- 15:15 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching A:aqs-codfw: Upgrading to Java 11.0.27 - eevans@cumin1002
- 15:13 jmm@cumin2002: START - Cookbook sre.o11y.roll-restart-reboot-logstash-collectors rolling restart_daemons on A:logstash-collector
- 15:09 klausman@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:ml-cache-codfw: Enable Java security updates - klausman@cumin1002
- 15:05 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply
- 15:05 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply
- 14:59 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply
- 14:58 brouberol@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 21 days, 0:00:00 on kafka-jumbo[1016-1018].eqiad.wmnet with reason: Parted config is broken causing the hosts to have no data disk
- 14:56 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply
- 14:54 bking@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on 65 hosts with reason: eqiad is depooled, noisy alerts
- 14:53 topranks: shutting down control board 1 on cr2-codfw (T393552)
- 14:52 topranks: shutting down backup RE1 on cr2-codfw (T393552)
- 14:51 klausman@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching A:ml-cache-codfw: Enable Java security updates - klausman@cumin1002
- 14:50 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch1087.eqiad.wmnet with OS bullseye
- 14:48 moritzm: installing expat security updates
- 14:39 topranks: switching active routing-engine to RE0 on cr2-codfw (this will cause protocol adjacencies to flap) (T364092)
- 14:25 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch1087.eqiad.wmnet with reason: host reimage
- 14:21 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch1087.eqiad.wmnet with reason: host reimage
- 14:21 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch1083.eqiad.wmnet with OS bullseye
- 14:18 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch1082.eqiad.wmnet with OS bullseye
- 14:06 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch1087
- 14:06 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch1087
- 14:06 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch1087.eqiad.wmnet with OS bullseye
- 14:04 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic1087 to cirrussearch1087
- 14:03 topranks: switching active routing-engine to RE1 on cr2-codfw (this will cause protocol adjacencies to flap) (T364092)
- 14:03 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch1087
- 14:02 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch1087
- 14:02 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch1087 on all recursors
- 14:02 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch1087 on all recursors
- 14:02 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 14:02 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1087 to cirrussearch1087 - bking@cumin2002"
- 14:01 phuedx@deploy1003: Finished scap sync-world: Backport for ext-EventStreamConfig: Update product_metrics.web_base stream (T394457) (duration: 14m 14s)
- 14:01 jynus@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2200.codfw.wmnet,db1216.eqiad.wmnet with reason: Move s8 to s3
- 13:59 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch1083.eqiad.wmnet with reason: host reimage
- 13:58 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1087 to cirrussearch1087 - bking@cumin2002"
- 13:56 topranks: rebooting backup routing-engine RE1 on cr2-codfw to install JunOS upgrade (T364092)
- 13:55 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch1082.eqiad.wmnet with reason: host reimage
- 13:54 bking@cumin2002: START - Cookbook sre.dns.netbox
- 13:54 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic1087 to cirrussearch1087
- 13:54 phuedx@deploy1003: phuedx: Continuing with sync
- 13:53 phuedx@deploy1003: phuedx: Backport for ext-EventStreamConfig: Update product_metrics.web_base stream (T394457) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 13:52 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch1083.eqiad.wmnet with reason: host reimage
- 13:52 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch1082.eqiad.wmnet with reason: host reimage
- 13:47 phuedx@deploy1003: Started scap sync-world: Backport for ext-EventStreamConfig: Update product_metrics.web_base stream (T394457)
- 13:40 topranks: disabling bgp groups on cr2-codfw ahead of upgrade/line-card install (T364092)
- 13:38 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch1083
- 13:38 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch1083
- 13:38 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch1083.eqiad.wmnet with OS bullseye
- 13:38 cmooney@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 11 hosts with reason: replace cr2-codfw switch control boards and install new line card
- 13:37 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch1082
- 13:37 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch1082
- 13:37 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch1082.eqiad.wmnet with OS bullseye
- 13:37 hashar@deploy1003: Finished scap sync-world: Backport for IP cap lift request for Leeds University 21 May (T394639), core-Namespaces: Update Malay wiki (mswiki) namespaces (T394603) (duration: 15m 28s)
- 13:32 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic1083 to cirrussearch1083
- 13:32 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch1083
- 13:30 hashar@deploy1003: hashar, bunnypranav, anzx: Continuing with sync
- 13:29 topranks: drain transport circuits landing on cr2-codfw of traffic before router upgrade (T364092)
- 13:28 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch1083
- 13:28 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch1083 on all recursors
- 13:28 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch1083 on all recursors
- 13:28 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 13:28 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1083 to cirrussearch1083 - bking@cumin2002"
- 13:28 hashar@deploy1003: hashar, bunnypranav, anzx: Backport for IP cap lift request for Leeds University 21 May (T394639), core-Namespaces: Update Malay wiki (mswiki) namespaces (T394603) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 13:27 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic1082 to cirrussearch1082
- 13:26 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch1082
- 13:26 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1083 to cirrussearch1083 - bking@cumin2002"
- 13:23 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch1082
- 13:23 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch1082 on all recursors
- 13:22 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch1082 on all recursors
- 13:22 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 13:22 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1082 to cirrussearch1082 - bking@cumin2002"
- 13:22 bking@cumin2002: START - Cookbook sre.dns.netbox
- 13:22 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db1247* gradually with 4 steps - Pooling in after cloning
- 13:21 hashar@deploy1003: Started scap sync-world: Backport for IP cap lift request for Leeds University 21 May (T394639), core-Namespaces: Update Malay wiki (mswiki) namespaces (T394603)
- 13:20 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1082 to cirrussearch1082 - bking@cumin2002"
- 13:20 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic1083 to cirrussearch1083
- 13:17 bking@cumin2002: START - Cookbook sre.dns.netbox
- 13:17 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic1082 to cirrussearch1082
- 13:15 topranks: re-enable graceful switchover on cr1-codfw (T364092)
- 13:13 hashar: Restarted release Jenkins on releases1003
- 13:05 topranks: switching active routing-engine to RE0 on cr1-codfw (this will cause protocol adjacencies to flap) (T364092)
- 13:03 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.roll-restart (exit_code=0) rolling restart_daemons on P{dns7001*} and A:dnsbox
- 13:01 sukhe@cumin1002: START - Cookbook sre.dns.roll-restart rolling restart_daemons on P{dns7001*} and A:dnsbox
- 12:59 sukhe: update dns-root-data to 2024071801~deb12u1 on dns7001
- 12:58 ayounsi@cumin1002: END (FAIL) - Cookbook sre.network.peering (exit_code=99) with action 'configure' for AS: 22616
- 12:57 topranks: rebooting backup routing-engine RE0 on cr1-codfw to install JunOS upgrade (T364092)
- 12:55 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'configure' for AS: 22616
- 12:53 elukey@deploy1003: helmfile [eqiad] DONE helmfile.d/services/kartotherian: sync
- 12:52 cmooney@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on cr2-codfw with reason: upgrade cr1-codfw JunOS
- 12:51 ayounsi@cumin1002: END (ERROR) - Cookbook sre.network.peering (exit_code=97) with action 'configure' for AS: 3856
- 12:51 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'configure' for AS: 3856
- 12:51 elukey@deploy1003: helmfile [eqiad] START helmfile.d/services/kartotherian: sync
- 12:49 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/services/kartotherian: sync
- 12:49 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply
- 12:48 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply
- 12:48 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 42
- 12:48 elukey@deploy1003: helmfile [codfw] START helmfile.d/services/kartotherian: sync
- 12:46 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'configure' for AS: 42
- 12:45 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 3856
- 12:43 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'configure' for AS: 3856
- 12:35 mszabo@deploy1003: Finished scap sync-world: Backport for DeduplicateStyles: Only transform possible style nodes (T394059) (duration: 17m 42s)
- 12:32 topranks: switching active routing-engine to RE1 on cr1-codfw (this will cause protocol adjacencies to flap) (T364092)
- 12:29 moritzm: installing systemd bugfix updates from Bookworm point release
- 12:28 mszabo@deploy1003: mszabo: Continuing with sync
- 12:24 mszabo@deploy1003: mszabo: Backport for DeduplicateStyles: Only transform possible style nodes (T394059) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 12:23 topranks: rebooting backup routing-engine RE1 on cr1-codfw to install JunOS upgrade (T364092)
- 12:17 mszabo@deploy1003: Started scap sync-world: Backport for DeduplicateStyles: Only transform possible style nodes (T394059)
- 12:17 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-misc2002.codfw.wmnet
- 12:12 Amir1: creating existencelinks on all wikis (T394617)
- 12:08 fceratto@cumin1002: START - Cookbook sre.mysql.pool db1247* gradually with 4 steps - Pooling in after cloning
- 12:08 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host mc-misc2002.codfw.wmnet
- 12:05 topranks: disable routing-engine sync / graceful-switchover on cr1-codfw ahead of JunOS upgrade on RE1 T364092
- 11:59 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-test2002.codfw.wmnet
- 11:53 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ganeti-test2002.codfw.wmnet
- 11:49 topranks: apply bgp "graceful shutdown" community on cr1-codfw ahead of JunOS upgrade (T364092)
- 11:47 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1002.eqiad.wmnet
- 11:41 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host sretest1002.eqiad.wmnet
- 11:41 topranks: drain transport circuits landing on cr1-codfw of traffic before router upgrade (T364092)
- 11:39 cmooney@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 10 hosts with reason: upgrade cr1-codfw JunOS
- 11:37 taavi@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 11:37 taavi@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: allocate wiki replica VIPs for x3 - taavi@cumin1002"
- 11:37 taavi@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: allocate wiki replica VIPs for x3 - taavi@cumin1002"
- 11:37 cmooney@cumin1003: DONE (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on 10 hosts with reason: upgrade cr1-codfw JunOS
- 11:33 taavi@cumin1002: START - Cookbook sre.dns.netbox
- 11:33 taavi@cumin1002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
- 11:32 cgoubert@deploy1003: Finished scap sync-world: mwscript-mwcron: Add some logging (duration: 02m 32s)
- 11:30 taavi@cumin1002: START - Cookbook sre.dns.netbox
- 11:29 cgoubert@deploy1003: Started scap sync-world: mwscript-mwcron: Add some logging
- 11:25 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: depool site codfw [reason: being cautious during maintenance on codfw CRs, T393552]
- 11:23 topranks: depool codfw in dns T393552
- 11:22 cmooney@cumin1003: START - Cookbook sre.dns.admin DNS admin: depool site codfw [reason: being cautious during maintenance on codfw CRs, T393552]
- 11:22 moritzm: installing openjdk-11 security updates
- 11:20 ladsgroup@deploy1003: Finished scap sync-world: Backport for OATHAuth: Mark checkuser and suppress as requiring 2FA (T150898 T389727) (duration: 13m 57s)
- 11:17 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 11:16 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 11:13 ladsgroup@deploy1003: ladsgroup, sbassett: Continuing with sync
- 11:12 ladsgroup@deploy1003: ladsgroup, sbassett: Backport for OATHAuth: Mark checkuser and suppress as requiring 2FA (T150898 T389727) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 11:09 btullis@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: sync
- 11:08 moritzm: upgrading cassandra-dev to latest Java 11 security updates
- 11:08 btullis@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: sync
- 11:07 btullis@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: apply
- 11:07 btullis@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: apply
- 11:06 ladsgroup@deploy1003: Started scap sync-world: Backport for OATHAuth: Mark checkuser and suppress as requiring 2FA (T150898 T389727)
- 11:05 btullis@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: sync
- 11:04 btullis@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-analytics-external: sync
- 11:04 btullis@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: apply
- 11:03 btullis@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-analytics-external: apply
- 11:03 btullis@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-analytics-external: sync
- 11:02 btullis@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-analytics-external: sync
- 11:02 btullis@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: sync
- 11:02 ladsgroup@cumin1002: dbctl commit (dc=all): 'Remove db1258 from s8 (T351820)', diff saved to https://phabricator.wikimedia.org/P76334 and previous config saved to /var/cache/conftool/dbconfig/20250520-110214-ladsgroup.json
- 11:01 btullis@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-analytics-external: sync
- 10:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Remove db2167 from x3 (T351820)', diff saved to https://phabricator.wikimedia.org/P76333 and previous config saved to /var/cache/conftool/dbconfig/20250520-105937-ladsgroup.json
- 10:57 btullis@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-analytics-external: apply
- 10:57 btullis@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-analytics-external: apply
- 10:25 taavi@deploy1003: Finished scap sync-world: Backport for Merge branch 'master' into wmf_deploy (duration: 20m 17s)
- 10:25 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/services/kartotherian: sync
- 10:18 taavi@deploy1003: gjg, taavi: Continuing with sync
- 10:14 elukey@deploy1003: helmfile [codfw] START helmfile.d/services/kartotherian: sync
- 10:13 elukey@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'sync'.
- 10:13 elukey@deploy1003: helmfile [eqiad] START helmfile.d/admin 'sync'.
- 10:12 fceratto@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for db1247.eqiad.wmnet
- 10:12 fceratto@cumin1002: START - Cookbook sre.hosts.remove-downtime for db1247.eqiad.wmnet
- 10:12 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'sync'.
- 10:11 taavi@deploy1003: gjg, taavi: Backport for Merge branch 'master' into wmf_deploy synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 10:11 elukey@deploy1003: helmfile [codfw] START helmfile.d/admin 'sync'.
- 10:08 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/services/kartotherian: sync
- 10:04 taavi@deploy1003: Started scap sync-world: Backport for Merge branch 'master' into wmf_deploy
- 09:58 elukey@deploy1003: helmfile [codfw] START helmfile.d/services/kartotherian: sync
- 09:55 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/services/kartotherian: sync
- 09:51 hashar@deploy1003: Finished deploy [gerrit/gerrit@2ecc180]: wm-zuul-status: reset current checks - T394485 (duration: 00m 11s)
- 09:51 hashar@deploy1003: Started deploy [gerrit/gerrit@2ecc180]: wm-zuul-status: reset current checks - T394485
- 09:44 elukey@deploy1003: helmfile [codfw] START helmfile.d/services/kartotherian: sync
- 09:39 cgoubert@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
- 09:39 akosiaris@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
- 09:38 cgoubert@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
- 09:37 akosiaris@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
- 09:34 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
- 09:34 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
- 09:33 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/services/kartotherian: sync
- 09:32 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
- 09:32 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
- 09:31 brouberol@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dse-k8s-worker1011.eqiad.wmnet
- 09:31 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
- 09:31 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply
- 09:30 brouberol@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dse-k8s-worker1010.eqiad.wmnet
- 09:25 brouberol@cumin2002: START - Cookbook sre.hosts.reboot-single for host dse-k8s-worker1011.eqiad.wmnet
- 09:25 brouberol@cumin2002: START - Cookbook sre.hosts.reboot-single for host dse-k8s-worker1010.eqiad.wmnet
- 09:13 elukey@deploy1003: helmfile [codfw] START helmfile.d/services/kartotherian: sync
- 09:02 elukey@deploy1003: helmfile [staging] DONE helmfile.d/services/kartotherian: sync
- 09:02 elukey@deploy1003: helmfile [staging] START helmfile.d/services/kartotherian: sync
- 08:49 XioNoX: restart gnmic in codfw - T388641
- 08:38 XioNoX: restart gnmic in eqsin - T388641
- 08:36 XioNoX: restart gnmic in esams - T388641
- 08:35 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply
- 08:34 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply
- 08:32 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply
- 08:32 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply
- 08:32 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply
- 08:32 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply
- 08:30 aklapper@deploy1003: rebuilt and synchronized wikiversions files: group0 to 1.45.0-wmf.2 refs T392172
- 07:54 fabfur: removing varnishkafka related alerts from prometheus (https://gerrit.wikimedia.org/r/c/operations/alerts/+/1146516) (T393772)
- 07:25 fabfur: disabling varnishkafka (webrequest) on A:cp (https://gerrit.wikimedia.org/r/c/operations/puppet/+/1147783) (T393772)
- 07:23 slyngshede@dns1004: END - running authdns-update
- 07:23 slyngshede@dns1004: START - running authdns-update
- 06:52 marostegui@cumin1002: dbctl commit (dc=all): 'db2236 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P76332 and previous config saved to /var/cache/conftool/dbconfig/20250520-065249-root.json
- 06:37 marostegui@cumin1002: dbctl commit (dc=all): 'db2236 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P76331 and previous config saved to /var/cache/conftool/dbconfig/20250520-063743-root.json
- 06:22 marostegui@cumin1002: dbctl commit (dc=all): 'db2236 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P76330 and previous config saved to /var/cache/conftool/dbconfig/20250520-062237-root.json
- 06:07 marostegui@cumin1002: dbctl commit (dc=all): 'db2236 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P76329 and previous config saved to /var/cache/conftool/dbconfig/20250520-060731-root.json
- 06:03 marostegui@cumin1002: dbctl commit (dc=all): 'db1183 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P76328 and previous config saved to /var/cache/conftool/dbconfig/20250520-060313-root.json
- 05:52 marostegui@cumin1002: dbctl commit (dc=all): 'db2236 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P76327 and previous config saved to /var/cache/conftool/dbconfig/20250520-055225-root.json
- 05:48 marostegui@cumin1002: dbctl commit (dc=all): 'db1183 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P76326 and previous config saved to /var/cache/conftool/dbconfig/20250520-054807-root.json
- 05:37 marostegui@cumin1002: dbctl commit (dc=all): 'db2236 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P76325 and previous config saved to /var/cache/conftool/dbconfig/20250520-053720-root.json
- 05:33 marostegui@cumin1002: dbctl commit (dc=all): 'db1183 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P76324 and previous config saved to /var/cache/conftool/dbconfig/20250520-053302-root.json
- 05:22 marostegui@cumin1002: dbctl commit (dc=all): 'db2236 (re)pooling @ 20%: Repooling', diff saved to https://phabricator.wikimedia.org/P76323 and previous config saved to /var/cache/conftool/dbconfig/20250520-052215-root.json
- 05:17 marostegui@cumin1002: dbctl commit (dc=all): 'db1183 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P76322 and previous config saved to /var/cache/conftool/dbconfig/20250520-051756-root.json
- 05:07 marostegui@cumin1002: dbctl commit (dc=all): 'db2236 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P76321 and previous config saved to /var/cache/conftool/dbconfig/20250520-050710-root.json
- 05:03 marostegui: Install 10.11.13 on db2236 T394653
- 05:03 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2236.codfw.wmnet with reason: Maintenance
- 05:02 marostegui@cumin1002: dbctl commit (dc=all): 'db1183 (re)pooling @ 20%: Repooling', diff saved to https://phabricator.wikimedia.org/P76320 and previous config saved to /var/cache/conftool/dbconfig/20250520-050250-root.json
- 05:00 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2236 T394653', diff saved to https://phabricator.wikimedia.org/P76319 and previous config saved to /var/cache/conftool/dbconfig/20250520-050017-marostegui.json
- 04:54 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on an-redacteddb1001.eqiad.wmnet with reason: Maintenance
- 04:54 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb1018.eqiad.wmnet with reason: Maintenance
- 04:54 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb1019.eqiad.wmnet with reason: Maintenance
- 04:54 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb1015.eqiad.wmnet with reason: Maintenance
- 04:53 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb1014.eqiad.wmnet with reason: Maintenance
- 04:51 marostegui: Stop mariadb on db1155, wiki replicas will show lag on: s2, s4, s6 and s7 T394624
- 04:50 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1155.eqiad.wmnet with reason: Maintenance
- 04:47 marostegui@cumin1002: dbctl commit (dc=all): 'db1183 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P76318 and previous config saved to /var/cache/conftool/dbconfig/20250520-044744-root.json
- 04:01 mwpresync@deploy1003: Pruned MediaWiki: 1.44.0-wmf.27 (duration: 01m 33s)
- 03:53 mwpresync@deploy1003: Finished scap sync-world: testwikis to 1.45.0-wmf.2 refs T392172 (duration: 50m 50s)
- 03:02 mwpresync@deploy1003: Started scap sync-world: testwikis to 1.45.0-wmf.2 refs T392172
- 00:54 rzl@deploy1003: Stopping before sync operations
- 00:54 rzl@deploy1003: Started scap sync-world: 1147901
2025-05-19
- 23:01 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch1081.eqiad.wmnet with OS bullseye
- 22:45 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host thanos-be2007.codfw.wmnet with OS bullseye
- 22:30 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1072.eqiad.wmnet with OS bookworm
- 22:29 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch1081.eqiad.wmnet with reason: host reimage
- 22:28 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch1080.eqiad.wmnet with OS bullseye
- 22:26 bking@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on relforge[1003-1004,1008-1010].eqiad.wmnet with reason: decom in progress
- 22:25 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch1081.eqiad.wmnet with reason: host reimage
- 22:11 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch1081
- 22:11 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch1081
- 22:11 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch1081.eqiad.wmnet with OS bullseye
- 22:08 ryankemper@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99) for hosts relforge[1003-1004].eqiad.wmnet
- 22:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Remove db1255 and db2241 from s8 (T351820)', diff saved to https://phabricator.wikimedia.org/P76317 and previous config saved to /var/cache/conftool/dbconfig/20250519-220432-ladsgroup.json
- 22:03 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch1080.eqiad.wmnet with reason: host reimage
- 22:02 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on A:cp-text_eqiad - <bound method SREBatchRunnerBase._reason of <cookbooks.sre.cdn.roll-upgrade-varnish.RollUpgradeVarnishRunner object at 0x7f1ff9224b80>>
- 22:02 ladsgroup@cumin1002: dbctl commit (dc=all): 'Remove db1209 and db2195 from x3 (T351820)', diff saved to https://phabricator.wikimedia.org/P76316 and previous config saved to /var/cache/conftool/dbconfig/20250519-220201-ladsgroup.json
- 22:01 ryankemper@cumin2002: START - Cookbook sre.hosts.decommission for hosts relforge[1003-1004].eqiad.wmnet
- 22:01 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-video: apply
- 22:00 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch1080.eqiad.wmnet with reason: host reimage
- 22:00 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-video: apply
- 21:58 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on A:cp-upload_eqiad - <bound method SREBatchRunnerBase._reason of <cookbooks.sre.cdn.roll-upgrade-varnish.RollUpgradeVarnishRunner object at 0x7f9083afdf10>>
- 21:58 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1072.eqiad.wmnet with reason: host reimage
- 21:54 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1072.eqiad.wmnet with reason: host reimage
- 21:48 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host thanos-be2006.codfw.wmnet with OS bullseye
- 21:45 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2005.codfw.wmnet with OS bookworm
- 21:43 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply
- 21:43 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-video: apply
- 21:42 sbassett: Deployed security fix for T394396
- 21:36 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1072.eqiad.wmnet with OS bookworm
- 21:35 andrew@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1072.eqiad.wmnet with OS bookworm
- 21:27 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic1081 to cirrussearch1081
- 21:26 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch1081
- 21:25 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch1081
- 21:25 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch1081 on all recursors
- 21:25 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch1081 on all recursors
- 21:25 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 21:25 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1081 to cirrussearch1081 - bking@cumin2002"
- 21:25 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1081 to cirrussearch1081 - bking@cumin2002"
- 21:21 bking@cumin2002: START - Cookbook sre.dns.netbox
- 21:21 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic1081 to cirrussearch1081
- 21:20 sbassett: Deployed security fixes for T394692, T394693 and T394700
- 21:07 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch1080
- 21:07 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch1080
- 21:07 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch1080.eqiad.wmnet with OS bullseye
- 21:06 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1072.eqiad.wmnet with OS bookworm
- 21:06 andrew@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudvirt1072.eqiad.wmnet with OS bookworm
- 21:05 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic1080 to cirrussearch1080
- 21:04 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch1080
- 21:03 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch1080
- 21:03 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch1080 on all recursors
- 21:03 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch1080 on all recursors
- 21:03 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 21:03 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1080 to cirrussearch1080 - bking@cumin2002"
- 21:02 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1080 to cirrussearch1080 - bking@cumin2002"
- 20:58 tgr: late UTC deploys done
- 20:57 tgr@deploy1003: Finished scap sync-world: Backport for [noop] Set $wgCentralAuthRestrictSharedDomain (T391270) (duration: 13m 24s)
- 20:56 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch1085.eqiad.wmnet with OS bullseye
- 20:55 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch1086.eqiad.wmnet with OS bullseye
- 20:50 tgr@deploy1003: tgr: Continuing with sync
- 20:49 tgr@deploy1003: tgr: Backport for [noop] Set $wgCentralAuthRestrictSharedDomain (T391270) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 20:48 bking@cumin2002: START - Cookbook sre.dns.netbox
- 20:48 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic1080 to cirrussearch1080
- 20:44 tgr@deploy1003: Started scap sync-world: Backport for [noop] Set $wgCentralAuthRestrictSharedDomain (T391270)
- 20:44 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cirrussearch1059.eqiad.wmnet with OS bullseye
- 20:36 dzahn@deploy1003: Finished scap sync-world: Backport for Design Research survey: Increase coverage (T394315) (duration: 13m 33s)
- 20:32 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch1059
- 20:32 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch1059
- 20:32 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch1059.eqiad.wmnet with OS bullseye
- 20:31 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch1085.eqiad.wmnet with reason: host reimage
- 20:29 dzahn@deploy1003: dani, dzahn: Continuing with sync
- 20:28 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch1086.eqiad.wmnet with reason: host reimage
- 20:27 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic1059 to cirrussearch1059
- 20:27 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch1085.eqiad.wmnet with reason: host reimage
- 20:27 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch1059
- 20:27 dzahn@deploy1003: dani, dzahn: Backport for Design Research survey: Increase coverage (T394315) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 20:26 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch1059
- 20:25 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch1059 on all recursors
- 20:25 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch1059 on all recursors
- 20:25 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 20:25 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1059 to cirrussearch1059 - bking@cumin2002"
- 20:25 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1059 to cirrussearch1059 - bking@cumin2002"
- 20:25 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2005.codfw.wmnet with OS bookworm
- 20:25 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch1086.eqiad.wmnet with reason: host reimage
- 20:24 dzahn@cumin1002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host zuul2001.codfw.wmnet
- 20:24 dzahn@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host zuul2001.codfw.wmnet with OS bookworm
- 20:23 denisse: Downgrade rsyslog, rsyslog-kafka, and rsyslog-openssl to `8.2302.0-1+deb12u1_amd64` - T383309
- 20:22 dzahn@deploy1003: Started scap sync-world: Backport for Design Research survey: Increase coverage (T394315)
- 20:21 bking@cumin2002: START - Cookbook sre.dns.netbox
- 20:21 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic1059 to cirrussearch1059
- 20:20 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cirrussearch1058.eqiad.wmnet with OS bullseye
- 20:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 20:10 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch1086
- 20:09 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch1086
- 20:09 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch1086.eqiad.wmnet with OS bullseye
- 20:07 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch1085
- 20:07 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch1085
- 20:07 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch1085.eqiad.wmnet with OS bullseye
- 19:45 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host sretest2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 19:44 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sretest2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 19:43 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1072.eqiad.wmnet with OS bookworm
- 19:33 dzahn@cumin1002: START - Cookbook sre.hosts.reimage for host zuul2001.codfw.wmnet with OS bookworm
- 19:29 andrew@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1072.eqiad.wmnet with OS bookworm
- 19:19 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1075.eqiad.wmnet with OS bookworm
- 19:18 Ammar: Ran fixStuckGlobalRename.php for T394699
- 19:16 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1073.eqiad.wmnet with OS bookworm
- 19:15 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1076.eqiad.wmnet with OS bookworm
- 19:12 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host sretest2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 19:08 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1072.eqiad.wmnet with OS bookworm
- 19:01 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on A:cp-upload_eqiad - <bound method SREBatchRunnerBase._reason of <cookbooks.sre.cdn.roll-upgrade-varnish.RollUpgradeVarnishRunner object at 0x7f9083afdf10>>
- 19:01 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on A:cp-text_eqiad - <bound method SREBatchRunnerBase._reason of <cookbooks.sre.cdn.roll-upgrade-varnish.RollUpgradeVarnishRunner object at 0x7f1ff9224b80>>
- 19:00 andrew@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1072.eqiad.wmnet with OS bookworm
- 19:00 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on A:cp-text_codfw - <bound method SREBatchRunnerBase._reason of <cookbooks.sre.cdn.roll-upgrade-varnish.RollUpgradeVarnishRunner object at 0x7fb0aba54f70>>
- 18:58 dzahn@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM zuul2001.codfw.wmnet - dzahn@cumin1002"
- 18:58 dzahn@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM zuul2001.codfw.wmnet - dzahn@cumin1002"
- 18:58 dzahn@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) zuul2001.codfw.wmnet on all recursors
- 18:58 dzahn@cumin1002: START - Cookbook sre.dns.wipe-cache zuul2001.codfw.wmnet on all recursors
- 18:58 dzahn@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 18:58 dzahn@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM zuul2001.codfw.wmnet - dzahn@cumin1002"
- 18:58 dzahn@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM zuul2001.codfw.wmnet - dzahn@cumin1002"
- 18:56 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on A:cp-upload_codfw - <bound method SREBatchRunnerBase._reason of <cookbooks.sre.cdn.roll-upgrade-varnish.RollUpgradeVarnishRunner object at 0x7ff9872e8580>>
- 18:54 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1075.eqiad.wmnet with reason: host reimage
- 18:50 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1073.eqiad.wmnet with reason: host reimage
- 18:46 dzahn@cumin1002: START - Cookbook sre.dns.netbox
- 18:46 dzahn@cumin1002: START - Cookbook sre.ganeti.makevm for new host zuul2001.codfw.wmnet
- 18:46 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1076.eqiad.wmnet with reason: host reimage
- 18:43 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1075.eqiad.wmnet with reason: host reimage
- 18:43 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1073.eqiad.wmnet with reason: host reimage
- 18:43 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1076.eqiad.wmnet with reason: host reimage
- 18:34 aqu@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
- 18:33 aqu@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
- 18:33 aqu@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
- 18:33 aqu@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
- 18:27 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1073.eqiad.wmnet with OS bookworm
- 18:27 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1075.eqiad.wmnet with OS bookworm
- 18:27 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1076.eqiad.wmnet with OS bookworm
- 18:24 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1072.eqiad.wmnet with OS bookworm
- 18:15 brouberol@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dse-k8s-worker1011.eqiad.wmnet with OS bookworm
- 17:58 brouberol@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dse-k8s-worker1011.eqiad.wmnet with reason: host reimage
- 17:55 brouberol@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dse-k8s-worker1011.eqiad.wmnet with reason: host reimage
- 17:48 wfan: payments-wiki upgraded from 01de91b7 to 7b484587
- 17:40 brouberol@cumin2002: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1011.eqiad.wmnet with OS bookworm
- 17:39 dzahn@deploy1003: Finished scap sync-world: Backport for remove throttling config for Istanbul Hackathon (T382309) (duration: 11m 34s)
- 17:36 brouberol@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-worker1011.eqiad.wmnet with OS bookworm
- 17:33 dzahn@deploy1003: dzahn: Continuing with sync
- 17:32 dzahn@deploy1003: dzahn: Backport for remove throttling config for Istanbul Hackathon (T382309) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 17:28 dzahn@deploy1003: Started scap sync-world: Backport for remove throttling config for Istanbul Hackathon (T382309)
- 17:24 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch1058
- 17:24 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch1058
- 17:24 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch1058.eqiad.wmnet with OS bullseye
- 17:23 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic1058 to cirrussearch1058
- 17:22 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch1058
- 17:21 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
- 17:21 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
- 17:21 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch1058
- 17:20 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch1058 on all recursors
- 17:20 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch1058 on all recursors
- 17:20 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 17:20 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1058 to cirrussearch1058 - bking@cumin2002"
- 17:19 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1058 to cirrussearch1058 - bking@cumin2002"
- 17:16 bking@cumin2002: START - Cookbook sre.dns.netbox
- 17:16 andrew@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 17:16 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic1058 to cirrussearch1058
- 17:13 andrew@cumin1002: START - Cookbook sre.dns.netbox
- 17:05 dwisehaupt@dns1004: END - running authdns-update
- 17:04 dwisehaupt@dns1004: START - running authdns-update
- 16:58 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db1179.eqiad.wmnet onto db1183.eqiad.wmnet
- 16:58 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db1179 gradually with 4 steps - Pool db1179.eqiad.wmnet in after cloning
- 16:54 xcollazo@deploy1003: Finished deploy [airflow-dags/analytics@d07b52d]: Deploy latest Airflow DAGs for the main instance. T392494. (duration: 00m 36s)
- 16:54 bvibber@deploy1003: Finished scap sync-world: Backport for Render Data:.chart page reviews in user language (T392725) (duration: 11m 54s)
- 16:53 xcollazo@deploy1003: Started deploy [airflow-dags/analytics@d07b52d]: Deploy latest Airflow DAGs for the main instance. T392494.
- 16:47 bvibber@deploy1003: bvibber: Continuing with sync
- 16:46 bvibber@deploy1003: bvibber: Backport for Render Data:.chart page reviews in user language (T392725) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 16:42 bvibber@deploy1003: Started scap sync-world: Backport for Render Data:.chart page reviews in user language (T392725)
- 16:39 andrew@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 16:39 andrew@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: correcting cloudvirt1072-1076 - andrew@cumin1002"
- 16:39 andrew@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: correcting cloudvirt1072-1076 - andrew@cumin1002"
- 16:35 andrew@cumin1002: START - Cookbook sre.dns.netbox
- 16:35 andrew@cumin1002: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
- 16:35 andrew@cumin1002: START - Cookbook sre.dns.netbox
- 16:12 marostegui@cumin1002: START - Cookbook sre.mysql.pool db1179 gradually with 4 steps - Pool db1179.eqiad.wmnet in after cloning
- 16:03 bking@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on cirrussearch2110.codfw.wmnet with reason: firmware update
- 15:58 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on A:cp-upload_codfw - <bound method SREBatchRunnerBase._reason of <cookbooks.sre.cdn.roll-upgrade-varnish.RollUpgradeVarnishRunner object at 0x7ff9872e8580>>
- 15:58 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on A:cp-text_codfw - <bound method SREBatchRunnerBase._reason of <cookbooks.sre.cdn.roll-upgrade-varnish.RollUpgradeVarnishRunner object at 0x7fb0aba54f70>>
- 15:53 bking@cumin2002: conftool action : set/pooled=yes:weight=10; selector: name=cirrussearch1074.eqiad.wmnet|cirrussearch1075.eqiad.wmnet|cirrussearch1076.eqiad.wmnet|cirrussearch1077.eqiad.wmnet|cirrussearch1078.eqiad.wmnet|cirrussearch1079.eqiad.wmnet|cirrussearch1113.eqiad.wmnet|cirrussearch1114.eqiad.wmnet|cirrussearch1115.eqiad.wmnet|cirrussearch1116.eqiad.wmnet|cirrussearch1117.eqiad.wmnet
- 15:43 fabfur: uploading lua5.3-maxminddb deb package to apt repo (currently unused) (T394504)
- 15:37 andrew@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 15:37 andrew@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding cloudvirt1069-1076 - andrew@cumin1002"
- 15:37 andrew@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding cloudvirt1069-1076 - andrew@cumin1002"
- 15:36 dancy@deploy1003: Installation of scap version "4.169.1" completed for 2 hosts
- 15:34 dancy@deploy1003: Installing scap version "4.169.1" for 2 host(s)
- 15:33 andrew@cumin1002: START - Cookbook sre.dns.netbox
- 15:33 andrew@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudvirt1076
- 15:33 andrew@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt1076
- 15:33 andrew@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudvirt1075
- 15:33 andrew@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt1075
- 15:33 andrew@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudvirt1074
- 15:33 andrew@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt1074
- 15:33 andrew@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudvirt1073
- 15:33 andrew@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt1073
- 15:33 andrew@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudvirt1072
- 15:32 andrew@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt1072
- 15:32 andrew@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudvirt1071
- 15:32 andrew@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt1071
- 15:32 andrew@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudvirt1070
- 15:32 andrew@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt1070
- 15:32 andrew@cumin1002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host cloudvirt1070
- 15:31 andrew@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt1070
- 15:31 andrew@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudvirt1069
- 15:31 andrew@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt1069
- 15:31 andrew@cumin1002: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
- 15:30 andrew@cumin1002: START - Cookbook sre.dns.netbox
- 15:24 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['pc2018']
- 15:20 taavi@cumin1002: END (PASS) - Cookbook sre.wikireplicas.update-views (exit_code=0)
- 15:19 moritzm: installing systemd bugfix updates from Bookworm point release
- 15:18 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['pc2018']
- 15:18 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host pc2018.codfw.wmnet with OS bookworm
- 15:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['sretest2005']
- 15:17 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['sretest2005']
- 15:17 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['sretest2003']
- 15:17 taavi@cumin1002: START - Cookbook sre.wikireplicas.update-views
- 15:17 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['sretest2003']
- 15:16 andrew@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 15:16 andrew@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding cloudvirt1068 to cloud-private vlan - andrew@cumin1002"
- 15:16 andrew@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding cloudvirt1068 to cloud-private vlan - andrew@cumin1002"
- 15:16 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['thanos-be2006']
- 15:16 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['thanos-be2006']
- 15:16 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['thanos-be2006']
- 15:16 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['thanos-be2006']
- 15:15 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 15:14 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 15:13 andrew@cumin1002: START - Cookbook sre.dns.netbox
- 15:12 andrew@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudvirt1068
- 15:12 andrew@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt1068
- 15:07 dancy@deploy1003: Finished scap sync-world: Updating images for T394389 (duration: 12m 55s)
- 14:59 dancy@deploy1003: dancy: Updating images for T394389 synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 14:54 dancy@deploy1003: Started scap sync-world: Updating images for T394389
- 14:49 cgoubert@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
- 14:48 cgoubert@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
- 14:16 bking@cumin2002: conftool action : set/pooled=no; selector: name=cirrussearch2091.codfw.wmnet
- 14:05 brouberol@cumin2002: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1011.eqiad.wmnet with OS bookworm
- 14:04 kamila@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
- 14:03 kamila@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
- 14:03 kamila@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-cron: apply
- 14:03 kamila@deploy1003: helmfile [codfw] START helmfile.d/services/mw-cron: apply
- 14:03 brouberol@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from snapshot1014 to dse-k8s-worker1011
- 14:02 brouberol@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dse-k8s-worker1011
- 14:01 brouberol@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host dse-k8s-worker1011
- 14:01 brouberol@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) dse-k8s-worker1011 on all recursors
- 14:01 brouberol@cumin2002: START - Cookbook sre.dns.wipe-cache dse-k8s-worker1011 on all recursors
- 14:01 brouberol@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 14:01 brouberol@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming snapshot1014 to dse-k8s-worker1011 - brouberol@cumin2002"
- 14:01 brouberol@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming snapshot1014 to dse-k8s-worker1011 - brouberol@cumin2002"
- 13:57 brouberol@cumin2002: START - Cookbook sre.dns.netbox
- 13:56 brouberol@cumin2002: START - Cookbook sre.hosts.rename from snapshot1014 to dse-k8s-worker1011
- 13:45 dcausse: closing the UTC afternoon backport window
- 13:42 dcausse@deploy1003: Finished scap sync-world: Backport for Make weighted tags no longer be WMF-specific (T393872) (duration: 13m 28s)
- 13:37 brouberol@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dse-k8s-worker1010.eqiad.wmnet with OS bookworm
- 13:35 dcausse@deploy1003: dcausse: Continuing with sync
- 13:32 dcausse@deploy1003: dcausse: Backport for Make weighted tags no longer be WMF-specific (T393872) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 13:30 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 13:28 dcausse@deploy1003: Started scap sync-world: Backport for Make weighted tags no longer be WMF-specific (T393872)
- 13:27 bking@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on 65 hosts with reason: eqiad is depooled, noisy alerts
- 13:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Remove db1211 from s8, move db2162 from s8 to x3 (T351820)', diff saved to https://phabricator.wikimedia.org/P76306 and previous config saved to /var/cache/conftool/dbconfig/20250519-132610-ladsgroup.json
- 13:23 dcausse@deploy1003: Finished scap sync-world: Backport for Growth: Remove unused PHP config settings (T388787) (duration: 14m 53s)
- 13:22 marostegui@cumin1002: dbctl commit (dc=all): 'db1169 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P76305 and previous config saved to /var/cache/conftool/dbconfig/20250519-132218-root.json
- 13:20 brouberol@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dse-k8s-worker1010.eqiad.wmnet with reason: host reimage
- 13:16 brouberol@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dse-k8s-worker1010.eqiad.wmnet with reason: host reimage
- 13:16 dcausse@deploy1003: dcausse, cyndywikime: Continuing with sync
- 13:12 ladsgroup@cumin1002: dbctl commit (dc=all): 'Remove db1203 and db2162 from x3 (T351820)', diff saved to https://phabricator.wikimedia.org/P76304 and previous config saved to /var/cache/conftool/dbconfig/20250519-131254-ladsgroup.json
- 13:12 dcausse@deploy1003: dcausse, cyndywikime: Backport for Growth: Remove unused PHP config settings (T388787) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 13:08 dcausse@deploy1003: Started scap sync-world: Backport for Growth: Remove unused PHP config settings (T388787)
- 13:07 marostegui@cumin1002: dbctl commit (dc=all): 'db1169 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P76303 and previous config saved to /var/cache/conftool/dbconfig/20250519-130713-root.json
- 12:55 brouberol@cumin2002: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1010.eqiad.wmnet with OS bookworm
- 12:53 brouberol@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from snapshot1017 to dse-k8s-worker1010
- 12:52 brouberol@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dse-k8s-worker1010
- 12:52 marostegui@cumin1002: dbctl commit (dc=all): 'db1169 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P76302 and previous config saved to /var/cache/conftool/dbconfig/20250519-125208-root.json
- 12:51 brouberol@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host dse-k8s-worker1010
- 12:51 brouberol@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) dse-k8s-worker1010 on all recursors
- 12:51 brouberol@cumin2002: START - Cookbook sre.dns.wipe-cache dse-k8s-worker1010 on all recursors
- 12:51 brouberol@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 12:51 brouberol@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming snapshot1017 to dse-k8s-worker1010 - brouberol@cumin2002"
- 12:51 brouberol@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming snapshot1017 to dse-k8s-worker1010 - brouberol@cumin2002"
- 12:49 marostegui@cumin1002: START - Cookbook sre.mysql.clone of db1179.eqiad.wmnet onto db1183.eqiad.wmnet
- 12:48 brouberol@cumin2002: START - Cookbook sre.dns.netbox
- 12:47 brouberol@cumin2002: START - Cookbook sre.hosts.rename from snapshot1017 to dse-k8s-worker1010
- 12:43 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1179.eqiad.wmnet with reason: Maintenance
- 12:43 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1179 T394661', diff saved to https://phabricator.wikimedia.org/P76301 and previous config saved to /var/cache/conftool/dbconfig/20250519-124302-marostegui.json
- 12:42 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1183.eqiad.wmnet with reason: Maintenance
- 12:38 kamila@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
- 12:38 kamila@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
- 12:37 marostegui@cumin1002: dbctl commit (dc=all): 'db1169 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P76300 and previous config saved to /var/cache/conftool/dbconfig/20250519-123702-root.json
- 12:30 mvernon@cumin1002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.cd
- 12:30 mvernon@cumin1002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.cd
- 12:21 marostegui@cumin1002: dbctl commit (dc=all): 'db1169 (re)pooling @ 20%: Repooling', diff saved to https://phabricator.wikimedia.org/P76299 and previous config saved to /var/cache/conftool/dbconfig/20250519-122157-root.json
- 12:18 cgoubert@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
- 12:18 cgoubert@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
- 12:16 mvernon@cumin1002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.cd
- 12:15 mvernon@cumin1002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.cd
- 12:06 marostegui@cumin1002: dbctl commit (dc=all): 'db1169 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P76298 and previous config saved to /var/cache/conftool/dbconfig/20250519-120651-root.json
- 12:05 isaranto@deploy1003: Finished scap sync-world: Backport for Create dblist for ores extension (T391103) (duration: 12m 27s)
- 11:58 isaranto@deploy1003: isaranto, jsn: Continuing with sync
- 11:56 isaranto@deploy1003: isaranto, jsn: Backport for Create dblist for ores extension (T391103) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 11:56 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
- 11:56 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
- 11:56 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
- 11:55 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mobileapps: apply
- 11:55 jgiannelos@deploy1003: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
- 11:55 jgiannelos@deploy1003: helmfile [staging] START helmfile.d/services/mobileapps: apply
- 11:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Remove db1167 and db2152 from x3 (T351820)', diff saved to https://phabricator.wikimedia.org/P76297 and previous config saved to /var/cache/conftool/dbconfig/20250519-115411-ladsgroup.json
- 11:52 isaranto@deploy1003: Started scap sync-world: Backport for Create dblist for ores extension (T391103)
- 11:51 marostegui@cumin1002: dbctl commit (dc=all): 'db1169 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P76296 and previous config saved to /var/cache/conftool/dbconfig/20250519-115146-root.json
- 11:36 marostegui@cumin1002: dbctl commit (dc=all): 'db1169 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P76295 and previous config saved to /var/cache/conftool/dbconfig/20250519-113639-root.json
- 11:28 cgoubert@deploy1003: Finished scap sync-world: 1147709: mediawiki: Add startingDeadlineSeconds to CronJobs - T394423 (duration: 02m 16s)
- 11:26 cgoubert@deploy1003: cgoubert: 1147709: mediawiki: Add startingDeadlineSeconds to CronJobs - T394423 synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 11:26 cgoubert@deploy1003: Started scap sync-world: 1147709: mediawiki: Add startingDeadlineSeconds to CronJobs - T394423
- 11:25 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1169.eqiad.wmnet with reason: Maintenance
- 11:23 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1169 T394653', diff saved to https://phabricator.wikimedia.org/P76294 and previous config saved to /var/cache/conftool/dbconfig/20250519-112356-marostegui.json
- 10:58 cgoubert@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 10:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Remove db1226 and db2163 from x3 (T351820)', diff saved to https://phabricator.wikimedia.org/P76293 and previous config saved to /var/cache/conftool/dbconfig/20250519-105013-ladsgroup.json
- 10:32 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
- 10:31 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
- 10:30 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
- 10:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Remove db1178 and db2165 from x3 (T351820)', diff saved to https://phabricator.wikimedia.org/P76292 and previous config saved to /var/cache/conftool/dbconfig/20250519-102615-ladsgroup.json
- 10:21 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mobileapps: apply
- 10:18 jynus: testing paging status with db1204
- 10:14 marostegui: Move eqiad s5 replicas (except sanitarium master and backup sources) to SBR dbmaint T383795
- 10:13 jgiannelos@deploy1003: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
- 10:12 jgiannelos@deploy1003: helmfile [staging] START helmfile.d/services/mobileapps: apply
- 10:10 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
- 10:09 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
- 10:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Remove db1172 and db2164 from x3 (T351820)', diff saved to https://phabricator.wikimedia.org/P76291 and previous config saved to /var/cache/conftool/dbconfig/20250519-100000-ladsgroup.json
- 09:54 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-platform-eng: apply
- 09:53 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-platform-eng: apply
- 09:17 moritzm: installing net-tools security updates
- 09:16 jynus: downtime of elastic alerts T394640
- 09:01 taavi@deploy1003: Finished scap sync-world: Backport for Do not show thumbnails or descriptions on Wikitech search (duration: 25m 50s)
- 08:58 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudnet2005-dev.codfw.wmnet
- 08:55 joal@deploy1003: Finished deploy [airflow-dags/analytics@536dc9e]: Add new artifact to Airflow cache (after git pull ...) (duration: 00m 38s)
- 08:55 joal@deploy1003: Started deploy [airflow-dags/analytics@536dc9e]: Add new artifact to Airflow cache (after git pull ...)
- 08:54 joal@deploy1003: Finished deploy [airflow-dags/analytics@4ebb376]: Add new artifact to Airflow cache (duration: 00m 07s)
- 08:54 joal@deploy1003: Started deploy [airflow-dags/analytics@4ebb376]: Add new artifact to Airflow cache
- 08:52 taavi@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudnet2005-dev.codfw.wmnet
- 08:52 kcvelaga@deploy1003: Finished deploy [airflow-dags/analytics_product@472cc1c]: T393559 (duration: 01m 14s)
- 08:52 taavi@deploy1003: taavi: Continuing with sync
- 08:51 kcvelaga@deploy1003: Started deploy [airflow-dags/analytics_product@472cc1c]: T393559
- 08:51 taavi@deploy1003: taavi: Backport for Do not show thumbnails or descriptions on Wikitech search synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 08:36 taavi@deploy1003: Started scap sync-world: Backport for Do not show thumbnails or descriptions on Wikitech search
- 08:28 marostegui: Install 10.6.22 on db1176 and db2230 testing hosts T394623
- 08:28 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2230.codfw.wmnet with reason: Maintenance
- 08:28 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1176.eqiad.wmnet with reason: Maintenance
- 08:18 jynus: testing paging status with db1204
- 07:11 moritzm: updated bootimage for Bookworm to 12.11 T394489
- 06:47 jmm@dns1004: END - running authdns-update
- 06:47 jmm@dns1004: START - running authdns-update
- 06:34 marostegui@cumin1002: dbctl commit (dc=all): 'db1210 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P76287 and previous config saved to /var/cache/conftool/dbconfig/20250519-063400-root.json
- 06:29 moritzm: installing Java 21 security updates
- 06:18 marostegui@cumin1002: dbctl commit (dc=all): 'db1210 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P76286 and previous config saved to /var/cache/conftool/dbconfig/20250519-061855-root.json
- 06:17 moritzm: installing openjdk-8 security updates
- 06:09 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1183 T394507', diff saved to https://phabricator.wikimedia.org/P76285 and previous config saved to /var/cache/conftool/dbconfig/20250519-060946-marostegui.json
- 06:03 marostegui@cumin1002: dbctl commit (dc=all): 'db1210 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P76284 and previous config saved to /var/cache/conftool/dbconfig/20250519-060349-root.json
- 05:48 marostegui@cumin1002: dbctl commit (dc=all): 'db1210 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P76283 and previous config saved to /var/cache/conftool/dbconfig/20250519-054844-root.json
- 05:42 moritzm: uploaded openjdk-8 8u452-ga-1~deb12u1 to component/jdk8 for bookworm-wikimedia
- 05:33 marostegui@cumin1002: dbctl commit (dc=all): 'db1210 (re)pooling @ 20%: Repooling', diff saved to https://phabricator.wikimedia.org/P76282 and previous config saved to /var/cache/conftool/dbconfig/20250519-053338-root.json
- 05:18 marostegui@cumin1002: dbctl commit (dc=all): 'db1210 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P76281 and previous config saved to /var/cache/conftool/dbconfig/20250519-051832-root.json
- 05:12 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1210.eqiad.wmnet with reason: Maintenance
- 05:12 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1210 T394508', diff saved to https://phabricator.wikimedia.org/P76280 and previous config saved to /var/cache/conftool/dbconfig/20250519-051224-marostegui.json
- 04:59 marostegui: Deploy schema change in x1 eqiad (with replication) dbmaint T394509
2025-05-18
- 12:50 sbassett: Ran scap remove-patch for T392976
2025-05-17
- 17:33 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host pc2018.codfw.wmnet with OS bookworm
- 17:33 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcontrol2010-dev.codfw.wmnet with OS bookworm
- 17:33 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['pc2018']
- 17:33 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['sretest2004']
- 17:32 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['sretest2004']
- 17:32 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['pc2018']
- 17:25 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 17:25 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pc2018.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 17:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 17:11 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudcontrol2010-dev']
- 17:08 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['es2048']
- 17:08 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['es2047']
- 17:08 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['es2048']
- 17:08 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['es2047']
- 17:07 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host es2048.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 17:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host es2047.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 16:59 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcontrol2010-dev']
- 16:59 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host sretest2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 16:59 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcontrol2010-dev.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 16:58 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host pc2018.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 16:57 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2003.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 16:57 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host thanos-be2009.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 16:53 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host es2048.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 16:51 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host thanos-be2008.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host es2047.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 16:48 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host cloudcontrol2010-dev.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 16:47 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host sretest2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 16:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host thanos-be2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 16:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host thanos-be2007.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 16:32 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host sretest2003.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 16:22 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host thanos-be2009.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 15:58 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host apus-be2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 15:55 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host thanos-be2008.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 15:47 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host thanos-be2007.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 15:33 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host thanos-be2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 15:25 cgoubert@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply
- 15:25 cgoubert@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply
- 15:24 cgoubert@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply
- 15:24 cgoubert@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply
- 15:20 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host apus-be2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 15:18 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host sretest2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 15:17 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host sretest2004
- 15:17 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host sretest2004
- 15:17 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host pc2018
- 15:16 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host pc2018
- 15:16 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host es2048
- 15:16 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host es2048
- 15:16 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host es2047
- 15:16 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host es2047
- 15:16 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcontrol2010-dev
- 15:16 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cloudcontrol2010-dev
- 15:16 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host sretest2005
- 15:16 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host sretest2005
- 07:58 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 07:57 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
2025-05-16
- 23:57 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['sretest2003']
- 23:57 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['sretest2003']
- 23:57 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['sretest2003']
- 23:57 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['sretest2003']
- 23:56 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['thanos-be2006']
- 23:56 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['thanos-be2006']
- 23:52 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['apus-be2004']
- 23:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['apus-be2004']
- 23:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host thanos-be2007.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 23:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host thanos-be2009.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 23:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host thanos-be2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 23:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2003.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 23:27 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be2008.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 23:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host apus-be2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 23:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host thanos-be2008.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 23:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host thanos-be2008.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 23:15 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host sretest2003.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 23:15 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host thanos-be2009.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 23:14 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host thanos-be2008.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 23:14 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host thanos-be2007.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 23:13 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host thanos-be2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 23:13 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host apus-be2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 23:12 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host sretest2003
- 23:12 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host thanos-be2009
- 23:12 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host thanos-be2008
- 23:12 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host thanos-be2007
- 23:12 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host apus-be2004
- 23:12 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host thanos-be2006
- 23:12 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host sretest2003
- 23:12 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host thanos-be2009
- 23:12 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host thanos-be2008
- 23:12 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host thanos-be2007
- 23:12 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host thanos-be2006
- 23:12 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host apus-be2004
- 23:09 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 23:09 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding es2047 to codfw - jhancock@cumin2002"
- 23:09 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding es2047 to codfw - jhancock@cumin2002"
- 23:06 jhancock@cumin2002: START - Cookbook sre.dns.netbox
- 23:01 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 23:01 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding thanos-be2006 to codfw - jhancock@cumin2002"
- 23:01 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding thanos-be2006 to codfw - jhancock@cumin2002"
- 22:58 jhancock@cumin2002: START - Cookbook sre.dns.netbox
- 21:16 sbassett@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
- 21:16 sbassett@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply
- 21:16 sbassett@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
- 21:16 sbassett@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply
- 21:16 sbassett@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
- 21:00 sbassett@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply
- 20:52 sbassett@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
- 20:42 sbassett@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply
- 20:41 sbassett@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply
- 20:31 sbassett@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply
- 20:26 herron: titan100[12] systemctl restart thanos-query
- 19:44 cstone: civicrm upgraded from 2ae29ec9 to 5b155eaa
- 19:06 robh@cumin1002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts an-coord1004.eqiad.wmnet
- 19:06 robh@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts an-coord1004.eqiad.wmnet
- 19:05 robh@cumin1002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts an-coord1004.eqiad.wmnet
- 19:05 robh@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts an-coord1004.eqiad.wmnet
- 19:04 robh@cumin1002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts an-coord1004.eqiad.wmnet
- 19:04 robh@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts an-coord1004.eqiad.wmnet
- 18:46 robh@cumin1002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts an-coord1004.eqiad.wmnet
- 18:34 robh@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts an-coord1004.eqiad.wmnet
- 18:34 robh@cumin2002: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts an-coord1004.eqiad.wmnet
- 18:26 robh@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts an-coord1004.eqiad.wmnet
- 18:25 robh@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts an-coord1004.eqiad.wmnet
- 18:25 robh@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts an-coord1004.eqiad.wmnet
- 18:25 robh@cumin2002: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts an-coord1004.eqiad.wmnet
- 18:13 robh@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts an-coord1004.eqiad.wmnet
- 18:12 robh@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts an-coord1004.eqiad.wmnet
- 18:12 robh@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts an-coord1004.eqiad.wmnet
- 18:10 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db1247.eqiad.wmnet with reason: To be set up in a few days
- 18:00 robh@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts an-coord1004.eqiad.wmnet
- 17:59 robh@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts an-coord1004.eqiad.wmnet
- 17:57 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 17:57 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 17:49 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db1238.eqiad.wmnet onto db1247.eqiad.wmnet
- 17:49 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db1238 gradually with 4 steps - Pool db1238.eqiad.wmnet in after cloning
- 17:03 fceratto@cumin1002: START - Cookbook sre.mysql.pool db1238 gradually with 4 steps - Pool db1238.eqiad.wmnet in after cloning
- 16:03 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1177.eqiad.wmnet with OS bullseye
- 15:41 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1177.eqiad.wmnet with reason: host reimage
- 15:38 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 15:38 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1177.eqiad.wmnet with reason: host reimage
- 15:19 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1177.eqiad.wmnet with OS bullseye
- 15:19 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1177.eqiad.wmnet with OS bullseye
- 15:16 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 15:10 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 14:44 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-platform-eng: apply
- 14:44 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-platform-eng: apply
- 14:22 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host apus-be1004.eqiad.wmnet with OS bookworm
- 14:22 mvernon@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - mvernon@cumin1002"
- 14:22 mvernon@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - mvernon@cumin1002"
- 14:11 root@cumin1002: DONE (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for ms-backup1002.eqiad.wmnet: Renew puppet certificate - root@cumin1002
- 14:09 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1177.eqiad.wmnet with OS bullseye
- 14:08 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1177.eqiad.wmnet with OS bullseye
- 13:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Remove db2166 and db1177 from x3 (T351820)', diff saved to https://phabricator.wikimedia.org/P76270 and previous config saved to /var/cache/conftool/dbconfig/20250516-135438-ladsgroup.json
- 13:52 fceratto@cumin1002: START - Cookbook sre.mysql.clone of db1238.eqiad.wmnet onto db1247.eqiad.wmnet
- 13:50 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db1188 gradually with 4 steps - Pooling back in
- 13:50 fceratto@cumin1002: END (FAIL) - Cookbook sre.mysql.clone (exit_code=99) of db1238.eqiad.wmnet onto db1247.eqiad.wmnet
- 13:47 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) db1238 - Depool db1238.eqiad.wmnet to then clone it to db1247.eqiad.wmnet - fceratto@cumin1002
- 13:47 fceratto@cumin1002: START - Cookbook sre.mysql.depool db1238 - Depool db1238.eqiad.wmnet to then clone it to db1247.eqiad.wmnet - fceratto@cumin1002
- 13:47 fceratto@cumin1002: START - Cookbook sre.mysql.clone of db1238.eqiad.wmnet onto db1247.eqiad.wmnet
- 13:40 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1177.eqiad.wmnet with OS bullseye
- 13:35 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1177.eqiad.wmnet with OS bullseye
- 13:21 hashar@deploy1003: Finished deploy [gerrit/gerrit@fcb893c]: wm-zuul-status: do not popup when navigating changes - T394485 (duration: 00m 12s)
- 13:21 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on apus-be1004.eqiad.wmnet with reason: host reimage
- 13:21 hashar@deploy1003: Started deploy [gerrit/gerrit@fcb893c]: wm-zuul-status: do not popup when navigating changes - T394485
- 13:17 mvernon@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on apus-be1004.eqiad.wmnet with reason: host reimage
- 13:13 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1177.eqiad.wmnet with OS bullseye
- 13:05 fceratto@cumin1002: START - Cookbook sre.mysql.pool db1188 gradually with 4 steps - Pooling back in
- 13:03 joal@deploy1003: Finished deploy [airflow-dags/analytics@4ebb376]: Fix gobblin artifacts (after pulling code...) (duration: 01m 01s)
- 13:02 joal@deploy1003: Started deploy [airflow-dags/analytics@4ebb376]: Fix gobblin artifacts (after pulling code...)
- 13:02 joal@deploy1003: Finished deploy [airflow-dags/analytics_test@4ebb376]: Fix gobblin artifacts (duration: 00m 16s)
- 13:01 joal@deploy1003: Started deploy [airflow-dags/analytics_test@4ebb376]: Fix gobblin artifacts
- 13:00 joal@deploy1003: Finished deploy [airflow-dags/analytics@4351188]: Fix gobblin artifacts (duration: 00m 07s)
- 13:00 joal@deploy1003: Started deploy [airflow-dags/analytics@4351188]: Fix gobblin artifacts
- 12:52 mvernon@cumin1002: START - Cookbook sre.hosts.reimage for host apus-be1004.eqiad.wmnet with OS bookworm
- 12:46 mvernon@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host apus-be1004.eqiad.wmnet with OS bookworm
- 12:43 aqu@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
- 12:42 aqu@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
- 12:35 fceratto@cumin1002: END (ERROR) - Cookbook sre.mysql.pool (exit_code=97) db1188 gradually with 4 steps - Pooling back in
- 12:33 fceratto@cumin1002: START - Cookbook sre.mysql.pool db1188 gradually with 4 steps - Pooling back in
- 12:32 mvernon@cumin1002: START - Cookbook sre.hosts.reimage for host apus-be1004.eqiad.wmnet with OS bookworm
- 12:28 mvernon@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host apus-be1004.eqiad.wmnet with OS bookworm
- 12:20 mvernon@cumin1002: START - Cookbook sre.hosts.reimage for host apus-be1004.eqiad.wmnet with OS bookworm
- 12:01 kamila@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
- 12:01 kamila@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
- 11:59 fceratto@cumin1002: END (FAIL) - Cookbook sre.mysql.clone (exit_code=99) of db1188.eqiad.wmnet onto db1246.eqiad.wmnet
- 11:42 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) db1188 - Depool db1188.eqiad.wmnet to then clone it to db1246.eqiad.wmnet - fceratto@cumin1002
- 11:42 fceratto@cumin1002: START - Cookbook sre.mysql.depool db1188 - Depool db1188.eqiad.wmnet to then clone it to db1246.eqiad.wmnet - fceratto@cumin1002
- 11:42 fceratto@cumin1002: START - Cookbook sre.mysql.clone of db1188.eqiad.wmnet onto db1246.eqiad.wmnet
- 11:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Remove db2242 from x3, remove db2154 from s8 (T351820)', diff saved to https://phabricator.wikimedia.org/P76262 and previous config saved to /var/cache/conftool/dbconfig/20250516-112345-ladsgroup.json
- 11:19 ladsgroup@cumin1002: dbctl commit (dc=all): 'Remove db1214 from x3, remove db1257 from s8 (T351820)', diff saved to https://phabricator.wikimedia.org/P76261 and previous config saved to /var/cache/conftool/dbconfig/20250516-111952-ladsgroup.json
- 10:44 joal@deploy1003: Finished deploy [airflow-dags/analytics@4351188]: Deploying analytics with artifact-cache warming using main folder (duration: 00m 49s)
- 10:43 joal@deploy1003: Started deploy [airflow-dags/analytics@4351188]: Deploying analytics with artifact-cache warming using main folder
- 10:28 joal@deploy1003: Finished deploy [airflow-dags/main@4351188]: Deploying main instead of analytics subfolder (duration: 01m 51s)
- 10:26 joal@deploy1003: Started deploy [airflow-dags/main@4351188]: Deploying main instead of analytics subfolder
- 10:22 jynus: upgrading db1239 MariaDB server T394487
- 10:16 jynus@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1239.eqiad.wmnet,ms-backup1002.eqiad.wmnet with reason: Upgrade and test
- 09:51 joal@deploy1003: Finished deploy [airflow-dags/analytics_test@4351188]: Fix slf4j artifact sync (duration: 00m 12s)
- 09:51 joal@deploy1003: Started deploy [airflow-dags/analytics_test@4351188]: Fix slf4j artifact sync
- 09:49 btullis@deploy1003: Finished deploy [airflow-dags/analytics_test@c2d660e]: Test (duration: 24m 55s)
- 09:27 cgoubert@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
- 09:27 cgoubert@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
- 09:24 btullis@deploy1003: Started deploy [airflow-dags/analytics_test@c2d660e]: Test
- 09:19 aqu@deploy1003: Finished deploy [airflow-dags/analytics_test@c2d660e]: Deploying artifacts for analytics_test manually (duration: 21m 38s)
- 08:58 aqu@deploy1003: Started deploy [airflow-dags/analytics_test@c2d660e]: Deploying artifacts for analytics_test manually
- 08:39 aqu@deploy1003: Finished deploy [airflow-dags/analytics_test@0b9e2aa]: Deploying artifacts for analytics_test manually (duration: 00m 51s)
- 08:38 aqu@deploy1003: Started deploy [airflow-dags/analytics_test@0b9e2aa]: Deploying artifacts for analytics_test manually
- 08:28 cgoubert@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 08:27 cgoubert@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 08:26 moritzm: uploaded httpbb 0.0.5-1+deb12u1 to apt.wikimedia.org T393711 T389380
- 08:14 marostegui@cumin1002: dbctl commit (dc=all): 'es2046 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P76260 and previous config saved to /var/cache/conftool/dbconfig/20250516-081428-root.json
- 08:08 jmm@cumin2002: DONE (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging ODimitrijevic out of all services on: 1426 hosts
- 08:07 marostegui@cumin1002: dbctl commit (dc=all): 'es1044 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P76259 and previous config saved to /var/cache/conftool/dbconfig/20250516-080752-root.json
- 08:07 jmm@cumin2002: DONE (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging ODimitrijevic out of all services on: 945 hosts
- 07:59 marostegui@cumin1002: dbctl commit (dc=all): 'es2046 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P76258 and previous config saved to /var/cache/conftool/dbconfig/20250516-075923-root.json
- 07:52 marostegui@cumin1002: dbctl commit (dc=all): 'es1044 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P76257 and previous config saved to /var/cache/conftool/dbconfig/20250516-075246-root.json
- 07:50 jmm@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on build2002.codfw.wmnet with reason: busy JDK build
- 07:44 marostegui@cumin1002: dbctl commit (dc=all): 'es2046 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P76256 and previous config saved to /var/cache/conftool/dbconfig/20250516-074417-root.json
- 07:37 marostegui@cumin1002: dbctl commit (dc=all): 'es1044 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P76255 and previous config saved to /var/cache/conftool/dbconfig/20250516-073741-root.json
- 07:29 marostegui@cumin1002: dbctl commit (dc=all): 'es2046 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P76253 and previous config saved to /var/cache/conftool/dbconfig/20250516-072911-root.json
- 07:22 marostegui@cumin1002: dbctl commit (dc=all): 'es1044 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P76251 and previous config saved to /var/cache/conftool/dbconfig/20250516-072235-root.json
- 07:20 stevemunene@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1177.eqiad.wmnet with OS bullseye
- 07:14 marostegui@cumin1002: dbctl commit (dc=all): 'es2046 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P76250 and previous config saved to /var/cache/conftool/dbconfig/20250516-071406-root.json
- 07:07 marostegui@cumin1002: dbctl commit (dc=all): 'es1044 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P76249 and previous config saved to /var/cache/conftool/dbconfig/20250516-070730-root.json
- 06:59 marostegui@cumin1002: dbctl commit (dc=all): 'es2046 (re)pooling @ 30%: Repooling', diff saved to https://phabricator.wikimedia.org/P76248 and previous config saved to /var/cache/conftool/dbconfig/20250516-065901-root.json
- 06:52 marostegui@cumin1002: dbctl commit (dc=all): 'es1044 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P76247 and previous config saved to /var/cache/conftool/dbconfig/20250516-065224-root.json
- 06:43 marostegui@cumin1002: dbctl commit (dc=all): 'es2046 (re)pooling @ 20%: Repooling', diff saved to https://phabricator.wikimedia.org/P76246 and previous config saved to /var/cache/conftool/dbconfig/20250516-064356-root.json
- 06:41 marostegui@cumin1002: dbctl commit (dc=all): 'es2044 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P76245 and previous config saved to /var/cache/conftool/dbconfig/20250516-064153-root.json
- 06:37 marostegui@cumin1002: dbctl commit (dc=all): 'es1044 (re)pooling @ 20%: Repooling', diff saved to https://phabricator.wikimedia.org/P76244 and previous config saved to /var/cache/conftool/dbconfig/20250516-063719-root.json
- 06:37 marostegui@dns1006: END - running authdns-update
- 06:36 marostegui@dns1006: START - running authdns-update
- 06:30 marostegui@cumin1002: dbctl commit (dc=all): 'es1046 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P76243 and previous config saved to /var/cache/conftool/dbconfig/20250516-063009-root.json
- 06:28 marostegui@cumin1002: dbctl commit (dc=all): 'es2046 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P76242 and previous config saved to /var/cache/conftool/dbconfig/20250516-062851-root.json
- 06:27 moritzm: uploaded openjdk-21 21.0.7+6-1~deb12u1 to component/jdk21 for bookworm (latest Java 21 security release)
- 06:26 marostegui@cumin1002: dbctl commit (dc=all): 'es2044 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P76241 and previous config saved to /var/cache/conftool/dbconfig/20250516-062648-root.json
- 06:22 marostegui@cumin1002: dbctl commit (dc=all): 'es1044 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P76240 and previous config saved to /var/cache/conftool/dbconfig/20250516-062213-root.json
- 06:18 moritzm: installing Java 21 security updates on idp-test
- 06:17 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on es2046.codfw.wmnet,es1044.eqiad.wmnet with reason: Maintenance
- 06:16 marostegui@cumin1002: dbctl commit (dc=all): 'Depool es2046 es1044 T391921', diff saved to https://phabricator.wikimedia.org/P76239 and previous config saved to /var/cache/conftool/dbconfig/20250516-061649-marostegui.json
- 06:15 marostegui@cumin1002: dbctl commit (dc=all): 'es1046 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P76238 and previous config saved to /var/cache/conftool/dbconfig/20250516-061503-root.json
- 06:11 marostegui@cumin1002: dbctl commit (dc=all): 'es2044 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P76237 and previous config saved to /var/cache/conftool/dbconfig/20250516-061142-root.json
- 06:06 marostegui@cumin1002: dbctl commit (dc=all): 'Promote es1045 and es2045 to es5 masters T391921', diff saved to https://phabricator.wikimedia.org/P76236 and previous config saved to /var/cache/conftool/dbconfig/20250516-060652-marostegui.json
- 06:02 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-product: apply
- 06:02 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-product: apply
- 05:59 marostegui@cumin1002: dbctl commit (dc=all): 'es1046 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P76235 and previous config saved to /var/cache/conftool/dbconfig/20250516-055958-root.json
- 05:56 marostegui@cumin1002: dbctl commit (dc=all): 'es2044 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P76234 and previous config saved to /var/cache/conftool/dbconfig/20250516-055637-root.json
- 05:53 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic1086 to cirrussearch1086
- 05:53 ryankemper@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch1086
- 05:51 ryankemper@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch1086
- 05:51 ryankemper@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch1086 on all recursors
- 05:51 ryankemper@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch1086 on all recursors
- 05:51 ryankemper@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 05:51 ryankemper@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1086 to cirrussearch1086 - ryankemper@cumin2002"
- 05:51 ryankemper@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1086 to cirrussearch1086 - ryankemper@cumin2002"
- 05:50 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic1085 to cirrussearch1085
- 05:50 ryankemper@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch1085
- 05:44 marostegui@cumin1002: dbctl commit (dc=all): 'es1046 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P76233 and previous config saved to /var/cache/conftool/dbconfig/20250516-054452-root.json
- 05:41 marostegui@cumin1002: dbctl commit (dc=all): 'es2044 (re)pooling @ 20%: Repooling', diff saved to https://phabricator.wikimedia.org/P76232 and previous config saved to /var/cache/conftool/dbconfig/20250516-054131-root.json
- 05:35 ryankemper@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch1085
- 05:35 ryankemper@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch1085 on all recursors
- 05:35 ryankemper@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch1085 on all recursors
- 05:35 ryankemper@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 05:35 ryankemper@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1085 to cirrussearch1085 - ryankemper@cumin2002"
- 05:33 ryankemper@cumin2002: START - Cookbook sre.dns.netbox
- 05:33 ryankemper@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1085 to cirrussearch1085 - ryankemper@cumin2002"
- 05:29 marostegui@cumin1002: dbctl commit (dc=all): 'es1046 (re)pooling @ 20%: Repooling', diff saved to https://phabricator.wikimedia.org/P76231 and previous config saved to /var/cache/conftool/dbconfig/20250516-052947-root.json
- 05:29 ryankemper@cumin2002: START - Cookbook sre.hosts.rename from elastic1086 to cirrussearch1086
- 05:28 ryankemper@cumin2002: START - Cookbook sre.dns.netbox
- 05:28 ryankemper@cumin2002: START - Cookbook sre.hosts.rename from elastic1085 to cirrussearch1085
- 05:27 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch1079.eqiad.wmnet with OS bullseye
- 05:26 marostegui@cumin1002: dbctl commit (dc=all): 'es2044 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P76230 and previous config saved to /var/cache/conftool/dbconfig/20250516-052625-root.json
- 05:25 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch1078.eqiad.wmnet with OS bullseye
- 05:14 marostegui@cumin1002: dbctl commit (dc=all): 'es1046 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P76229 and previous config saved to /var/cache/conftool/dbconfig/20250516-051442-root.json
- 05:07 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch1079.eqiad.wmnet with reason: host reimage
- 05:07 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on es2044.codfw.wmnet,es1046.eqiad.wmnet with reason: Maintenance
- 05:07 marostegui@cumin1002: dbctl commit (dc=all): 'Depool es1046 es2044 T391921', diff saved to https://phabricator.wikimedia.org/P76228 and previous config saved to /var/cache/conftool/dbconfig/20250516-050707-marostegui.json
- 05:04 ryankemper@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch1079.eqiad.wmnet with reason: host reimage
- 05:01 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch1078.eqiad.wmnet with reason: host reimage
- 05:01 cgoubert@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
- 05:01 cgoubert@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
- 04:56 ryankemper@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch1078.eqiad.wmnet with reason: host reimage
- 04:49 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch1079
- 04:49 ryankemper@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch1079
- 04:49 ryankemper@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch1079.eqiad.wmnet with OS bullseye
- 04:42 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch1078
- 04:42 ryankemper@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch1078
- 04:42 ryankemper@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch1078.eqiad.wmnet with OS bullseye
- 04:31 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic1079 to cirrussearch1079
- 04:31 ryankemper@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch1079
- 04:29 ryankemper@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch1079
- 04:29 ryankemper@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch1079 on all recursors
- 04:29 ryankemper@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch1079 on all recursors
- 04:29 ryankemper@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 04:29 ryankemper@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1079 to cirrussearch1079 - ryankemper@cumin2002"
- 04:26 ryankemper@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1079 to cirrussearch1079 - ryankemper@cumin2002"
- 04:11 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic1078 to cirrussearch1078
- 04:10 ryankemper@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch1078
- 04:10 ryankemper@cumin2002: START - Cookbook sre.dns.netbox
- 04:10 ryankemper@cumin2002: START - Cookbook sre.hosts.rename from elastic1079 to cirrussearch1079
- 04:02 ryankemper@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch1078
- 04:02 ryankemper@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch1078 on all recursors
- 04:02 ryankemper@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch1078 on all recursors
- 04:02 ryankemper@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 04:02 ryankemper@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1078 to cirrussearch1078 - ryankemper@cumin2002"
- 03:58 ryankemper@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1078 to cirrussearch1078 - ryankemper@cumin2002"
- 03:49 ryankemper@cumin2002: START - Cookbook sre.dns.netbox
- 03:49 ryankemper@cumin2002: START - Cookbook sre.hosts.rename from elastic1078 to cirrussearch1078
- 03:27 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch1077.eqiad.wmnet with OS bullseye
- 03:20 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch1076.eqiad.wmnet with OS bullseye
- 03:01 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch1077.eqiad.wmnet with reason: host reimage
- 02:58 ryankemper@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch1077.eqiad.wmnet with reason: host reimage
- 02:55 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch1076.eqiad.wmnet with reason: host reimage
- 02:51 ryankemper@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch1076.eqiad.wmnet with reason: host reimage
- 02:44 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch1077
- 02:44 ryankemper@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch1077
- 02:43 ryankemper@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch1077.eqiad.wmnet with OS bullseye
- 02:37 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch1076
- 02:37 ryankemper@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch1076
- 02:37 ryankemper@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch1076.eqiad.wmnet with OS bullseye
- 02:34 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic1077 to cirrussearch1077
- 02:34 ryankemper@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch1077
- 02:32 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic1076 to cirrussearch1076
- 02:32 ryankemper@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch1076
- 02:32 ryankemper@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch1077
- 02:32 ryankemper@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch1077 on all recursors
- 02:32 ryankemper@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch1077 on all recursors
- 02:31 ryankemper@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 02:30 ryankemper@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch1076
- 02:30 ryankemper@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch1076 on all recursors
- 02:30 ryankemper@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch1076 on all recursors
- 02:30 ryankemper@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 02:30 ryankemper@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1076 to cirrussearch1076 - ryankemper@cumin2002"
- 02:30 ryankemper@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1076 to cirrussearch1076 - ryankemper@cumin2002"
- 02:29 ryankemper@cumin2002: START - Cookbook sre.dns.netbox
- 02:23 ryankemper@cumin2002: START - Cookbook sre.dns.netbox
- 02:23 ryankemper@cumin2002: START - Cookbook sre.hosts.rename from elastic1077 to cirrussearch1077
- 02:23 ryankemper@cumin2002: START - Cookbook sre.hosts.rename from elastic1076 to cirrussearch1076
- 01:27 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cassandra-dev2002.codfw.wmnet with OS bullseye
- 01:16 brett: Restarting tomcat10 on idp1004
- 01:06 brett@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp3066.*
- 01:04 brett@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp3074.*
- 00:49 brett@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp5031.*
- 00:48 brett@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp5031
- 00:18 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cassandra-dev2002.codfw.wmnet with reason: host reimage
- 00:15 eevans@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cassandra-dev2002.codfw.wmnet with reason: host reimage
2025-05-15
- 23:59 eevans@cumin1002: START - Cookbook sre.hosts.reimage for host cassandra-dev2002.codfw.wmnet with OS bullseye
- 22:27 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on P{cp503[1-2].eqsin.wmnet} and A:cp - <bound method SREBatchRunnerBase._reason of <cookbooks.sre.cdn.roll-upgrade-varnish.RollUpgradeVarnishRunner object at 0x7f818c5f7df0>>
- 22:27 cjming@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic: apply
- 22:23 cjming@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic: apply
- 22:11 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on A:cp-text_eqsin - <bound method SREBatchRunnerBase._reason of <cookbooks.sre.cdn.roll-upgrade-varnish.RollUpgradeVarnishRunner object at 0x7f87783bdac0>>
- 22:00 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cassandra-dev2002.codfw.wmnet with OS bullseye
- 21:55 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on P{cp503[1-2].eqsin.wmnet} and A:cp - <bound method SREBatchRunnerBase._reason of <cookbooks.sre.cdn.roll-upgrade-varnish.RollUpgradeVarnishRunner object at 0x7f818c5f7df0>>
- 21:40 brett@cumin2002: END (FAIL) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=1) rolling upgrade of Varnish on A:cp-upload_eqsin - <bound method SREBatchRunnerBase._reason of <cookbooks.sre.cdn.roll-upgrade-varnish.RollUpgradeVarnishRunner object at 0x7fc4014eef10>>
- 21:35 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cassandra-dev2002.codfw.wmnet with reason: host reimage
- 21:34 dancy@deploy1003: Installation of scap version "4.169.0" completed for 2 hosts
- 21:32 dancy@deploy1003: Installing scap version "4.169.0" for 2 host(s)
- 21:31 eevans@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cassandra-dev2002.codfw.wmnet with reason: host reimage
- 21:20 cjming@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply
- 21:19 jdrewniak@deploy1003: Finished scap sync-world: Backport for styles: Set override also to former value of `line-height-small` token (T389900 T394305) (duration: 18m 45s)
- 21:18 cjming@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply
- 21:16 eevans@cumin1002: START - Cookbook sre.hosts.reimage for host cassandra-dev2002.codfw.wmnet with OS bullseye
- 21:12 jdrewniak@deploy1003: jdrewniak: Continuing with sync
- 21:06 jdrewniak@deploy1003: jdrewniak: Backport for styles: Set override also to former value of `line-height-small` token (T389900 T394305) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 21:00 jdrewniak@deploy1003: Started scap sync-world: Backport for styles: Set override also to former value of `line-height-small` token (T389900 T394305)
- 20:53 thcipriani@deploy1003: Finished scap sync-world: Backport for frwiki: Enable the NewUserMessage extension (T382199) (duration: 14m 44s)
- 20:47 thcipriani@deploy1003: thcipriani, wpld: Continuing with sync
- 20:44 thcipriani@deploy1003: thcipriani, wpld: Backport for frwiki: Enable the NewUserMessage extension (T382199) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 20:38 thcipriani@deploy1003: Started scap sync-world: Backport for frwiki: Enable the NewUserMessage extension (T382199)
- 20:35 thcipriani@deploy1003: Finished scap sync-world: Backport for Design Research participant recruitment survey on eswiki: Deploy (T394315) (duration: 13m 46s)
- 20:33 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cassandra-dev2002.codfw.wmnet with OS bullseye
- 20:28 thcipriani@deploy1003: thcipriani, dani: Continuing with sync
- 20:27 thcipriani@deploy1003: thcipriani, dani: Backport for Design Research participant recruitment survey on eswiki: Deploy (T394315) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 20:21 thcipriani@deploy1003: Started scap sync-world: Backport for Design Research participant recruitment survey on eswiki: Deploy (T394315)
- 20:17 bvibber@deploy1003: Finished scap sync-world: Backport for Enable Chart extension on phase 2 wikis (T393518) (duration: 13m 15s)
- 20:13 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cassandra-dev2002.codfw.wmnet with reason: host reimage
- 20:10 bvibber@deploy1003: bvibber: Continuing with sync
- 20:09 bvibber@deploy1003: bvibber: Backport for Enable Chart extension on phase 2 wikis (T393518) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 20:09 eevans@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cassandra-dev2002.codfw.wmnet with reason: host reimage
- 20:04 bvibber@deploy1003: Started scap sync-world: Backport for Enable Chart extension on phase 2 wikis (T393518)
- 19:54 eevans@cumin1002: START - Cookbook sre.hosts.reimage for host cassandra-dev2002.codfw.wmnet with OS bullseye
- 19:08 dancy@deploy1003: Installation of scap version "4.168.1" completed for 2 hosts
- 19:06 dancy@deploy1003: Installing scap version "4.168.1" for 2 host(s)
- 18:55 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on A:cp-upload_eqsin - <bound method SREBatchRunnerBase._reason of <cookbooks.sre.cdn.roll-upgrade-varnish.RollUpgradeVarnishRunner object at 0x7fc4014eef10>>
- 18:55 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on A:cp-text_eqsin - <bound method SREBatchRunnerBase._reason of <cookbooks.sre.cdn.roll-upgrade-varnish.RollUpgradeVarnishRunner object at 0x7f87783bdac0>>
- 18:53 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on A:cp-text_esams - <bound method SREBatchRunnerBase._reason of <cookbooks.sre.cdn.roll-upgrade-varnish.RollUpgradeVarnishRunner object at 0x7fd386623c70>>
- 18:49 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on A:cp-upload_esams - <bound method SREBatchRunnerBase._reason of <cookbooks.sre.cdn.roll-upgrade-varnish.RollUpgradeVarnishRunner object at 0x7f58099c1b50>>
- 18:41 tchin@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: apply
- 18:40 tchin@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: apply
- 18:36 tchin@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: apply
- 18:36 tchin@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-analytics-external: apply
- 18:35 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-analytics-external: apply
- 18:34 tchin@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-analytics-external: apply
- 17:46 herron@cumin1002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-logging-codfw
- 17:24 herron@cumin1002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-logging-codfw
- 17:23 topranks: add remaining bgp peerings from codfw row A-D switches to new spines in rows E/F T394021
- 17:12 bd808@deploy1003: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
- 17:12 bd808@deploy1003: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
- 17:12 bd808@deploy1003: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
- 17:11 bd808@deploy1003: helmfile [codfw] START helmfile.d/services/developer-portal: apply
- 17:11 bd808@deploy1003: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
- 17:10 bd808@deploy1003: helmfile [staging] START helmfile.d/services/developer-portal: apply
- 17:10 bd808@deploy1003: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
- 17:09 bd808@deploy1003: helmfile [staging] START helmfile.d/services/developer-portal: apply
- 17:04 herron@cumin1002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-logging-eqiad
- 16:40 herron@cumin1002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-logging-eqiad
- 16:36 sbassett: helmfile [staging] HALTED helmfile.d/services/miscweb: apply
- 16:35 topranks: add bgp peerings from codfw row A-D switches to new spines in rows E/F T394021
- 16:27 sbassett@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply
- 16:27 sbassett@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply
- 16:17 sbassett@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply
- 16:16 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp3073.esams.wmnet
- 16:16 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp3081.esams.wmnet
- 16:14 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp3081.esams.wmnet
- 16:14 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp3073.esams.wmnet
- 16:13 logmsgbot: mszabo Deployed security patch for T394393
- 16:07 logmsgbot: mszabo Deployed security patch for T394393
- 16:07 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.roll-restart-reboot-wikimedia-dns (exit_code=0) rolling restart_daemons on A:wikidough
- 15:56 mszabo: Starting patch deployment for T394393
- 15:55 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp3073.esams.wmnet
- 15:55 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp3081.esams.wmnet
- 15:50 dancy@deploy1003: Installation of scap version "4.168.0" completed for 2 hosts
- 15:49 sukhe@cumin1002: START - Cookbook sre.dns.roll-restart-reboot-wikimedia-dns rolling restart_daemons on A:wikidough
- 15:48 dancy@deploy1003: Installing scap version "4.168.0" for 2 host(s)
- 15:45 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on A:cp-upload_esams - <bound method SREBatchRunnerBase._reason of <cookbooks.sre.cdn.roll-upgrade-varnish.RollUpgradeVarnishRunner object at 0x7f58099c1b50>>
- 15:45 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on A:cp-text_esams - <bound method SREBatchRunnerBase._reason of <cookbooks.sre.cdn.roll-upgrade-varnish.RollUpgradeVarnishRunner object at 0x7fd386623c70>>
- 15:40 fabfur: reenabling puppet on A:cp (T393927)
- 15:32 brett@cumin2002: END (FAIL) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=1) rolling upgrade of Varnish on A:cp-upload_esams - <bound method SREBatchRunnerBase._reason of <cookbooks.sre.cdn.roll-upgrade-varnish.RollUpgradeVarnishRunner object at 0x7f1e600881c0>>
- 15:32 brett@cumin2002: END (FAIL) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=1) rolling upgrade of Varnish on A:cp-text_esams - <bound method SREBatchRunnerBase._reason of <cookbooks.sre.cdn.roll-upgrade-varnish.RollUpgradeVarnishRunner object at 0x7f10f2f03a00>>
- 15:31 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on A:cp-upload_esams - <bound method SREBatchRunnerBase._reason of <cookbooks.sre.cdn.roll-upgrade-varnish.RollUpgradeVarnishRunner object at 0x7f1e600881c0>>
- 15:31 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on A:cp-text_esams - <bound method SREBatchRunnerBase._reason of <cookbooks.sre.cdn.roll-upgrade-varnish.RollUpgradeVarnishRunner object at 0x7f10f2f03a00>>
- 15:21 jnuche@deploy1003: Finished scap sync-world: Backport for Revert "Make weighted tags no longer be WMF-specific" (duration: 11m 45s)
- 15:15 jnuche@deploy1003: dcausse, jnuche: Continuing with sync
- 15:14 jnuche@deploy1003: dcausse, jnuche: Backport for Revert "Make weighted tags no longer be WMF-specific" synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 15:10 jnuche@deploy1003: Started scap sync-world: Backport for Revert "Make weighted tags no longer be WMF-specific"
- 15:02 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp3081.esams.wmnet
- 15:02 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp3073.esams.wmnet
- 15:02 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1071.eqiad.wmnet with OS bookworm
- 15:02 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1002"
- 14:58 fabfur: disable puppet on A:cp to apply https://gerrit.wikimedia.org/r/c/operations/puppet/+/1144620 (T393927)
- 14:58 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1002"
- 14:58 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1073.eqiad.wmnet with OS bookworm
- 14:58 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1002"
- 14:54 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1002"
- 14:40 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1071.eqiad.wmnet with reason: host reimage
- 14:37 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1073.eqiad.wmnet with reason: host reimage
- 14:37 vriley@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1071.eqiad.wmnet with reason: host reimage
- 14:33 vriley@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1073.eqiad.wmnet with reason: host reimage
- 14:22 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1071.eqiad.wmnet with OS bookworm
- 14:21 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudvirt1071.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 14:18 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1073.eqiad.wmnet with OS bookworm
- 14:17 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudvirt1073.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 14:13 sukhe: finished running lowering of dyna/upload TTL to 240s: T394312
- 14:13 sukhe@dns1004: END - running authdns-update
- 14:12 sukhe@dns1004: START - running authdns-update
- 14:09 vriley@cumin1002: START - Cookbook sre.hosts.provision for host cloudvirt1071.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 14:08 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1071.eqiad.wmnet with OS bookworm
- 14:05 vriley@cumin1002: START - Cookbook sre.hosts.provision for host cloudvirt1073.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 14:04 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1073.eqiad.wmnet with OS bookworm
- 14:04 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1074.eqiad.wmnet with OS bookworm
- 14:04 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1002"
- 14:03 pfischer@deploy2002: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 14:01 pfischer@deploy2002: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
- 14:01 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1002"
- 13:58 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1072.eqiad.wmnet with OS bookworm
- 13:58 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1002"
- 13:58 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1002"
- 13:57 moritzm: installing openjdk-8 security updates
- 13:46 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1073.eqiad.wmnet with OS bookworm
- 13:45 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1073.eqiad.wmnet with OS bookworm
- 13:43 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1074.eqiad.wmnet with reason: host reimage
- 13:40 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1072.eqiad.wmnet with reason: host reimage
- 13:38 Lucas_WMDE: UTC afternoon backport+config window done
- 13:37 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for [Growth] eswiki: Bump mentorship to 70% of users (T392869) (duration: 20m 39s)
- 13:36 vriley@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1074.eqiad.wmnet with reason: host reimage
- 13:36 vriley@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1072.eqiad.wmnet with reason: host reimage
- 13:36 aokoth@dns1004: END - running authdns-update
- 13:34 aokoth@dns1004: START - running authdns-update
- 13:30 lucaswerkmeister-wmde@deploy1003: urbanecm, lucaswerkmeister-wmde: Continuing with sync
- 13:26 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1073.eqiad.wmnet with OS bookworm
- 13:22 lucaswerkmeister-wmde@deploy1003: urbanecm, lucaswerkmeister-wmde: Backport for [Growth] eswiki: Bump mentorship to 70% of users (T392869) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 13:22 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1071.eqiad.wmnet with OS bookworm
- 13:20 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1074.eqiad.wmnet with OS bookworm
- 13:19 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1072.eqiad.wmnet with OS bookworm
- 13:16 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for [Growth] eswiki: Bump mentorship to 70% of users (T392869)
- 13:14 stevemunene@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1177.eqiad.wmnet with OS bullseye
- 12:56 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 12:55 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: generic update - jhancock@cumin2002"
- 12:55 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: generic update - jhancock@cumin2002"
- 12:53 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host apus-be1004.eqiad.wmnet with OS bookworm
- 12:46 jhancock@cumin2002: START - Cookbook sre.dns.netbox
- 12:25 cgoubert@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 12:22 cgoubert@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 12:19 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 12:19 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 12:16 fabfur@cumin1002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Revert last template change - fabfur@cumin1002"
- 12:16 fabfur@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Revert last template change - fabfur@cumin1002
- 12:16 fabfur@cumin1002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Revert last template change - fabfur@cumin1002
- 12:16 fabfur@cumin1002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Revert last template change - fabfur@cumin1002"
- 12:15 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1177.eqiad.wmnet with OS bullseye
- 12:10 dreamyjazz@deploy1003: Finished scap sync-world: Backport for CustomBlockedDomainStorage::validateDomain: Undo hard-deprecation whilst prod callers exist (T394267) (duration: 13m 30s)
- 12:10 cgoubert@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 12:03 dreamyjazz@deploy1003: dreamyjazz: Continuing with sync
- 12:03 dreamyjazz@deploy1003: dreamyjazz: Backport for CustomBlockedDomainStorage::validateDomain: Undo hard-deprecation whilst prod callers exist (T394267) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 12:03 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply
- 12:03 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply
- 12:02 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply
- 12:02 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply
- 11:57 dreamyjazz@deploy1003: Started scap sync-world: Backport for CustomBlockedDomainStorage::validateDomain: Undo hard-deprecation whilst prod callers exist (T394267)
- 11:45 sukhe: removing downtime on A:ncredir
- 11:44 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 14 hosts
- 11:44 sukhe@cumin1002: START - Cookbook sre.hosts.remove-downtime for 14 hosts
- 11:43 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1177.eqiad.wmnet with OS bullseye
- 11:40 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host apus-be1004.eqiad.wmnet with OS bookworm
- 11:31 gkyziridis@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
- 11:28 sukhe@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on 14 hosts with reason: monitoring alerts
- 11:21 sukhe: sudo cumin -b1 -s10 "A:wikidough" "run-puppet-agent": T370821
- 11:12 stevemunene@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1156.eqiad.wmnet
- 11:10 fabfur@cumin1002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Minor template modification - fabfur@cumin1002"
- 11:10 fabfur@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Minor template modification - fabfur@cumin1002
- 11:10 fabfur@cumin1002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Minor template modification - fabfur@cumin1002
- 11:09 fabfur@cumin1002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Minor template modification - fabfur@cumin1002"
- 11:05 stevemunene@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-worker1156.eqiad.wmnet
- 11:04 stevemunene@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-worker1177.eqiad.wmnet with OS bullseye
- 10:53 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply
- 10:53 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply
- 10:49 mvernon@cumin1002: END (FAIL) - Cookbook sre.swift.check-dbs (exit_code=99) Checking container DBs of wikipedia-commons-local-public.cde
- 10:49 mvernon@cumin1002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.cde
- 10:49 mvernon@cumin1002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.cd
- 10:48 mvernon@cumin1002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.cd
- 10:38 mvernon@cumin1002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.cd
- 10:37 mvernon@cumin1002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.cd
- 10:36 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-mcrouter: apply
- 10:36 mvernon@cumin1002: END (FAIL) - Cookbook sre.swift.check-dbs (exit_code=99) Checking container DBs of wikipedia-commons-local-public.cd
- 10:36 mvernon@cumin1002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.cd
- 10:34 mvernon@cumin1002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.f2
- 10:34 mvernon@cumin1002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.f2
- 10:32 mvernon@cumin1002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.ad
- 10:32 mvernon@cumin1002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.ad
- 10:29 mvernon@cumin1002: END (FAIL) - Cookbook sre.swift.check-dbs (exit_code=99) Checking container DBs of wikipedia-commons-local-public.ad
- 10:29 mvernon@cumin1002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.ad
- 10:23 stevemunene@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1177.eqiad.wmnet with OS bullseye
- 10:21 effie: mw-mcrouter minor update, memcached errors are expected
- 10:20 mvernon@cumin1002: END (FAIL) - Cookbook sre.swift.check-dbs (exit_code=99) Checking container DBs of wikipedia-commons-local-public.ad
- 10:20 mvernon@cumin1002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.ad
- 10:19 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/mw-mcrouter: apply
- 10:15 mvernon@cumin1002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of wikipedia-commons-local-public.ad
- 10:15 mvernon@cumin1002: START - Cookbook sre.swift.check-dbs Checking container DBs of wikipedia-commons-local-public.ad
- 10:08 Emperor: depool thanos-fe100[1-3] prior to decom T391352
- 10:07 jnuche@deploy1003: rebuilt and synchronized wikiversions files: group2 to 1.45.0-wmf.1 refs T392171
- 09:51 marostegui@cumin1002: dbctl commit (dc=all): 'db1187 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P76218 and previous config saved to /var/cache/conftool/dbconfig/20250515-095108-root.json
- 09:45 dreamyjazz@deploy1003: Finished scap sync-world: Backport for FlaggablePageView: don't call getId() on null (T394381) (duration: 16m 00s)
- 09:44 isaranto@deploy1003: helmfile [ml-serve-eqiad] 'sync' command on namespace 'ores-legacy' for release 'main' .
- 09:44 isaranto@deploy1003: helmfile [ml-serve-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
- 09:39 isaranto@deploy1003: helmfile [ml-staging-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
- 09:37 dreamyjazz@deploy1003: dreamyjazz, zabe: Continuing with sync
- 09:36 dreamyjazz@deploy1003: dreamyjazz, zabe: Backport for FlaggablePageView: don't call getId() on null (T394381) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 09:36 marostegui@cumin1002: dbctl commit (dc=all): 'db1187 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P76217 and previous config saved to /var/cache/conftool/dbconfig/20250515-093602-root.json
- 09:30 stevemunene@cumin1002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host an-worker1177.eqiad.wmnet
- 09:29 dreamyjazz@deploy1003: Started scap sync-world: Backport for FlaggablePageView: don't call getId() on null (T394381)
- 09:27 mvernon@cumin1002: conftool action : set/pooled=yes; selector: name=thanos-fe1007.eqiad.wmnet
- 09:27 mvernon@cumin1002: conftool action : set/pooled=yes; selector: name=thanos-fe1006.eqiad.wmnet
- 09:27 marostegui@cumin1002: dbctl commit (dc=all): 'es2043 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P76216 and previous config saved to /var/cache/conftool/dbconfig/20250515-092721-root.json
- 09:27 mvernon@cumin1002: conftool action : set/pooled=yes; selector: name=thanos-fe1005.eqiad.wmnet
- 09:27 mvernon@cumin1002: conftool action : set/weight=100; selector: name=thanos-fe1007.eqiad.wmnet
- 09:27 mvernon@cumin1002: conftool action : set/weight=100; selector: name=thanos-fe1006.eqiad.wmnet
- 09:27 mvernon@cumin1002: conftool action : set/weight=100; selector: name=thanos-fe1005.eqiad.wmnet
- 09:23 marostegui@cumin1002: dbctl commit (dc=all): 'es1041 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P76215 and previous config saved to /var/cache/conftool/dbconfig/20250515-092314-root.json
- 09:23 mvernon@cumin1002: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies (exit_code=0) rolling restart_daemons on A:thanos-fe
- 09:22 jnuche@deploy1003: rebuilt and synchronized wikiversions files: group1 to 1.45.0-wmf.1 refs T392171
- 09:20 marostegui@cumin1002: dbctl commit (dc=all): 'db1187 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P76214 and previous config saved to /var/cache/conftool/dbconfig/20250515-092056-root.json
- 09:17 mvernon@cumin1002: START - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies rolling restart_daemons on A:thanos-fe
- 09:12 marostegui@cumin1002: dbctl commit (dc=all): 'es2043 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P76213 and previous config saved to /var/cache/conftool/dbconfig/20250515-091216-root.json
- 09:08 marostegui@cumin1002: dbctl commit (dc=all): 'es1041 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P76212 and previous config saved to /var/cache/conftool/dbconfig/20250515-090808-root.json
- 09:07 Emperor: reboot thanos-fe100[5-7] prior to bringing into service T391352
- 09:05 marostegui@cumin1002: dbctl commit (dc=all): 'db1187 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P76211 and previous config saved to /var/cache/conftool/dbconfig/20250515-090551-root.json
- 08:57 marostegui@cumin1002: dbctl commit (dc=all): 'es2043 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P76210 and previous config saved to /var/cache/conftool/dbconfig/20250515-085710-root.json
- 08:53 marostegui@cumin1002: dbctl commit (dc=all): 'es1041 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P76209 and previous config saved to /var/cache/conftool/dbconfig/20250515-085303-root.json
- 08:52 marostegui@cumin1002: dbctl commit (dc=all): 'Repool pc7 T394260', diff saved to https://phabricator.wikimedia.org/P76208 and previous config saved to /var/cache/conftool/dbconfig/20250515-085256-marostegui.json
- 08:50 marostegui@cumin1002: dbctl commit (dc=all): 'db1187 (re)pooling @ 20%: Repooling', diff saved to https://phabricator.wikimedia.org/P76207 and previous config saved to /var/cache/conftool/dbconfig/20250515-085045-root.json
- 08:50 dhinus: wikitech-static: rm -rf /srv/mediawiki/images/wikitech/archive/* (T338520)
- 08:49 stevemunene@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-worker1177.eqiad.wmnet
- 08:42 marostegui@cumin1002: dbctl commit (dc=all): 'es2043 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P76206 and previous config saved to /var/cache/conftool/dbconfig/20250515-084204-root.json
- 08:37 marostegui@cumin1002: dbctl commit (dc=all): 'es1041 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P76205 and previous config saved to /var/cache/conftool/dbconfig/20250515-083744-root.json
- 08:35 marostegui@cumin1002: dbctl commit (dc=all): 'db1187 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P76204 and previous config saved to /var/cache/conftool/dbconfig/20250515-083540-root.json
- 08:26 marostegui@cumin1002: dbctl commit (dc=all): 'es2043 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P76203 and previous config saved to /var/cache/conftool/dbconfig/20250515-082659-root.json
- 08:23 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1187 for testing T264016', diff saved to https://phabricator.wikimedia.org/P76202 and previous config saved to /var/cache/conftool/dbconfig/20250515-082333-marostegui.json
- 08:22 marostegui@cumin1002: dbctl commit (dc=all): 'es1041 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P76201 and previous config saved to /var/cache/conftool/dbconfig/20250515-082238-root.json
- 08:20 marostegui@cumin1002: dbctl commit (dc=all): 'es2045 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P76200 and previous config saved to /var/cache/conftool/dbconfig/20250515-082002-root.json
- 08:17 jnuche@deploy1003: rebuilt and synchronized wikiversions files: group2 to 1.45.0-wmf.1 refs T392171
- 08:15 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/spark-history: apply
- 08:14 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/spark-history: apply
- 08:12 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/spark-history: apply
- 08:12 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/spark-history: apply
- 08:11 marostegui@cumin1002: dbctl commit (dc=all): 'es2043 (re)pooling @ 30%: Repooling', diff saved to https://phabricator.wikimedia.org/P76198 and previous config saved to /var/cache/conftool/dbconfig/20250515-081153-root.json
- 08:11 marostegui@cumin1002: dbctl commit (dc=all): 'es1045 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P76197 and previous config saved to /var/cache/conftool/dbconfig/20250515-081141-root.json
- 08:07 marostegui@cumin1002: dbctl commit (dc=all): 'es1041 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P76196 and previous config saved to /var/cache/conftool/dbconfig/20250515-080733-root.json
- 08:04 marostegui@cumin1002: dbctl commit (dc=all): 'es2045 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P76195 and previous config saved to /var/cache/conftool/dbconfig/20250515-080456-root.json
- 07:56 marostegui@cumin1002: dbctl commit (dc=all): 'es2043 (re)pooling @ 20%: Repooling', diff saved to https://phabricator.wikimedia.org/P76194 and previous config saved to /var/cache/conftool/dbconfig/20250515-075648-root.json
- 07:56 marostegui@cumin1002: dbctl commit (dc=all): 'es1045 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P76193 and previous config saved to /var/cache/conftool/dbconfig/20250515-075636-root.json
- 07:52 marostegui@cumin1002: dbctl commit (dc=all): 'es1041 (re)pooling @ 20%: Repooling', diff saved to https://phabricator.wikimedia.org/P76192 and previous config saved to /var/cache/conftool/dbconfig/20250515-075228-root.json
- 07:49 marostegui@cumin1002: dbctl commit (dc=all): 'es2045 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P76191 and previous config saved to /var/cache/conftool/dbconfig/20250515-074950-root.json
- 07:41 marostegui@cumin1002: dbctl commit (dc=all): 'es2043 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P76190 and previous config saved to /var/cache/conftool/dbconfig/20250515-074142-root.json
- 07:41 marostegui@cumin1002: dbctl commit (dc=all): 'es1045 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P76189 and previous config saved to /var/cache/conftool/dbconfig/20250515-074131-root.json
- 07:40 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/spark-history: apply
- 07:38 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/spark-history: apply
- 07:37 marostegui@cumin1002: dbctl commit (dc=all): 'es1041 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P76188 and previous config saved to /var/cache/conftool/dbconfig/20250515-073723-root.json
- 07:36 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/superset: apply
- 07:35 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/superset: apply
- 07:34 marostegui@cumin1002: dbctl commit (dc=all): 'es2045 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P76187 and previous config saved to /var/cache/conftool/dbconfig/20250515-073445-root.json
- 07:34 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/superset-next: apply
- 07:33 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/superset-next: apply
- 07:31 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on es2043.codfw.wmnet,es1041.eqiad.wmnet with reason: Maintenance
- 07:30 marostegui@cumin1002: dbctl commit (dc=all): 'Depool es1041 es2043 T391921', diff saved to https://phabricator.wikimedia.org/P76186 and previous config saved to /var/cache/conftool/dbconfig/20250515-073033-marostegui.json
- 07:26 marostegui@cumin1002: dbctl commit (dc=all): 'es1045 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P76185 and previous config saved to /var/cache/conftool/dbconfig/20250515-072625-root.json
- 07:19 marostegui@cumin1002: dbctl commit (dc=all): 'es2045 (re)pooling @ 20%: Repooling', diff saved to https://phabricator.wikimedia.org/P76184 and previous config saved to /var/cache/conftool/dbconfig/20250515-071939-root.json
- 07:18 moritzm: installing nginx security updates
- 07:11 marostegui@cumin1002: dbctl commit (dc=all): 'es1045 (re)pooling @ 20%: Repooling', diff saved to https://phabricator.wikimedia.org/P76183 and previous config saved to /var/cache/conftool/dbconfig/20250515-071119-root.json
- 07:07 marostegui@cumin1002: dbctl commit (dc=all): 'es2041 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P76182 and previous config saved to /var/cache/conftool/dbconfig/20250515-070706-root.json
- 07:06 marostegui@cumin1002: dbctl commit (dc=all): 'es1043 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P76181 and previous config saved to /var/cache/conftool/dbconfig/20250515-070653-root.json
- 07:06 godog: add 70G to arclamp /srv
- 07:04 marostegui@cumin1002: dbctl commit (dc=all): 'es2045 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P76180 and previous config saved to /var/cache/conftool/dbconfig/20250515-070433-root.json
- 06:56 marostegui@cumin1002: dbctl commit (dc=all): 'es1045 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P76179 and previous config saved to /var/cache/conftool/dbconfig/20250515-065613-root.json
- 06:52 marostegui@cumin1002: dbctl commit (dc=all): 'es2041 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P76178 and previous config saved to /var/cache/conftool/dbconfig/20250515-065200-root.json
- 06:51 marostegui@cumin1002: dbctl commit (dc=all): 'es1043 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P76177 and previous config saved to /var/cache/conftool/dbconfig/20250515-065147-root.json
- 06:51 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on es2045.codfw.wmnet,es1045.eqiad.wmnet with reason: Maintenance
- 06:50 marostegui@cumin1002: dbctl commit (dc=all): 'Depool es1045 es2045 T391921', diff saved to https://phabricator.wikimedia.org/P76176 and previous config saved to /var/cache/conftool/dbconfig/20250515-065039-marostegui.json
- 06:49 kart_: Updated cxserver to 2025-05-14-005542-production (T394008, T392499)
- 06:46 kartik@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
- 06:46 kartik@deploy1003: helmfile [eqiad] START helmfile.d/services/cxserver: apply
- 06:43 kartik@deploy1003: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
- 06:43 kartik@deploy1003: helmfile [codfw] START helmfile.d/services/cxserver: apply
- 06:38 kartik@deploy1003: helmfile [staging] DONE helmfile.d/services/cxserver: apply
- 06:38 kartik@deploy1003: helmfile [staging] START helmfile.d/services/cxserver: apply
- 06:36 marostegui@cumin1002: dbctl commit (dc=all): 'es2041 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P76175 and previous config saved to /var/cache/conftool/dbconfig/20250515-063655-root.json
- 06:36 marostegui@cumin1002: dbctl commit (dc=all): 'es1043 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P76174 and previous config saved to /var/cache/conftool/dbconfig/20250515-063641-root.json
- 06:21 marostegui@cumin1002: dbctl commit (dc=all): 'es2041 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P76173 and previous config saved to /var/cache/conftool/dbconfig/20250515-062149-root.json
- 06:21 marostegui@cumin1002: dbctl commit (dc=all): 'es1043 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P76172 and previous config saved to /var/cache/conftool/dbconfig/20250515-062135-root.json
- 06:06 marostegui@cumin1002: dbctl commit (dc=all): 'es2041 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P76171 and previous config saved to /var/cache/conftool/dbconfig/20250515-060643-root.json
- 06:06 marostegui@cumin1002: dbctl commit (dc=all): 'es1043 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P76170 and previous config saved to /var/cache/conftool/dbconfig/20250515-060629-root.json
- 05:51 marostegui@cumin1002: dbctl commit (dc=all): 'es2041 (re)pooling @ 30%: Repooling', diff saved to https://phabricator.wikimedia.org/P76169 and previous config saved to /var/cache/conftool/dbconfig/20250515-055137-root.json
- 05:51 marostegui@cumin1002: dbctl commit (dc=all): 'es1043 (re)pooling @ 30%: Repooling', diff saved to https://phabricator.wikimedia.org/P76168 and previous config saved to /var/cache/conftool/dbconfig/20250515-055124-root.json
- 05:43 marostegui@dns1006: END - running authdns-update
- 05:41 marostegui@dns1006: START - running authdns-update
- 05:40 marostegui@cumin1002: dbctl commit (dc=all): 'Promote es1042 and es2042 to es4 masters T391921', diff saved to https://phabricator.wikimedia.org/P76167 and previous config saved to /var/cache/conftool/dbconfig/20250515-053958-marostegui.json
- 05:36 marostegui@cumin1002: dbctl commit (dc=all): 'es2041 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P76166 and previous config saved to /var/cache/conftool/dbconfig/20250515-053631-root.json
- 05:36 marostegui@cumin1002: dbctl commit (dc=all): 'es1043 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P76165 and previous config saved to /var/cache/conftool/dbconfig/20250515-053618-root.json
- 05:21 marostegui@cumin1002: dbctl commit (dc=all): 'es2041 (re)pooling @ 20%: Repooling', diff saved to https://phabricator.wikimedia.org/P76164 and previous config saved to /var/cache/conftool/dbconfig/20250515-052126-root.json
- 05:21 marostegui@cumin1002: dbctl commit (dc=all): 'es1043 (re)pooling @ 20%: Repooling', diff saved to https://phabricator.wikimedia.org/P76163 and previous config saved to /var/cache/conftool/dbconfig/20250515-052113-root.json
- 05:10 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on pc2017.codfw.wmnet,pc1017.eqiad.wmnet with reason: Maintenance
- 05:08 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on pc2017.codfw.wmnet with reason: Maintenance
- 05:08 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on pc1017.eqiad.wmnet with reason: Maintenance
- 05:07 marostegui@cumin1002: dbctl commit (dc=all): 'Depool pc7 T394260', diff saved to https://phabricator.wikimedia.org/P76162 and previous config saved to /var/cache/conftool/dbconfig/20250515-050724-marostegui.json
- 05:06 marostegui@cumin1002: dbctl commit (dc=all): 'es2041 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P76161 and previous config saved to /var/cache/conftool/dbconfig/20250515-050620-root.json
- 05:06 marostegui@cumin1002: dbctl commit (dc=all): 'es1043 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P76160 and previous config saved to /var/cache/conftool/dbconfig/20250515-050607-root.json
- 04:56 marostegui@cumin1002: dbctl commit (dc=all): 'Depool es1043 es2041 T391921', diff saved to https://phabricator.wikimedia.org/P76159 and previous config saved to /var/cache/conftool/dbconfig/20250515-045658-marostegui.json
- 04:56 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on es2041.codfw.wmnet,es1043.eqiad.wmnet with reason: Maintenance
- 04:56 ladsgroup@cumin1002: dbctl commit (dc=all): 'Remove db1192 from x3 (T351820)', diff saved to https://phabricator.wikimedia.org/P76158 and previous config saved to /var/cache/conftool/dbconfig/20250515-045631-ladsgroup.json
- 04:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Remove db1256 from s8 (T351820)', diff saved to https://phabricator.wikimedia.org/P76157 and previous config saved to /var/cache/conftool/dbconfig/20250515-045345-ladsgroup.json
- 03:35 eileen: civicrm upgraded from a8b7c589 to 5c45f41b
- 01:08 cwhite: clear up some space on arclamp2001 to allow arclamp_compress_logs to complete
2025-05-14
- 22:51 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on A:cp-text_drmrs
- 22:46 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on A:cp-upload_drmrs
- 22:33 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch1075.eqiad.wmnet with OS bullseye
- 22:26 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1073.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 22:23 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudvirt1072.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 22:21 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1074.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 22:20 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudvirt1071.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 22:14 vriley@cumin1002: START - Cookbook sre.hosts.provision for host cloudvirt1074.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 22:13 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1074.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 22:11 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch1074.eqiad.wmnet with OS bullseye
- 22:10 jhathaway@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on ms-be2088.codfw.wmnet with reason: T381919
- 22:09 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch1075.eqiad.wmnet with reason: host reimage
- 22:09 vriley@cumin1002: START - Cookbook sre.hosts.provision for host cloudvirt1073.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 22:08 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1073.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 22:06 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch1075.eqiad.wmnet with reason: host reimage
- 22:05 vriley@cumin1002: START - Cookbook sre.hosts.provision for host cloudvirt1072.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 22:04 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudvirt1072.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 22:03 vriley@cumin1002: START - Cookbook sre.hosts.provision for host cloudvirt1071.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 22:02 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudvirt1071.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 21:51 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch1075
- 21:51 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch1075
- 21:51 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch1075.eqiad.wmnet with OS bullseye
- 21:50 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic1075 to cirrussearch1075
- 21:50 vriley@cumin1002: START - Cookbook sre.hosts.provision for host cloudvirt1074.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 21:50 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch1075
- 21:49 vriley@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudvirt1074
- 21:49 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch1074.eqiad.wmnet with reason: host reimage
- 21:49 vriley@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt1074
- 21:49 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch1075
- 21:49 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch1075 on all recursors
- 21:48 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch1075 on all recursors
- 21:48 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 21:48 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1075 to cirrussearch1075 - bking@cumin2002"
- 21:48 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host apus-be1004.eqiad.wmnet with OS bookworm
- 21:47 jhathaway@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on ms-be2088.codfw.wmnet with reason: T381919
- 21:47 vriley@cumin1002: START - Cookbook sre.hosts.provision for host cloudvirt1073.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 21:47 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1075 to cirrussearch1075 - bking@cumin2002"
- 21:46 vriley@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudvirt1073
- 21:46 vriley@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt1073
- 21:45 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch1074.eqiad.wmnet with reason: host reimage
- 21:43 vriley@cumin1002: START - Cookbook sre.hosts.provision for host cloudvirt1072.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 21:43 vriley@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudvirt1072
- 21:43 vriley@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt1072
- 21:43 bking@cumin2002: START - Cookbook sre.dns.netbox
- 21:42 bking@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
- 21:40 bking@cumin2002: START - Cookbook sre.dns.netbox
- 21:40 vriley@cumin1002: START - Cookbook sre.hosts.provision for host cloudvirt1071.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 21:40 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic1075 to cirrussearch1075
- 21:39 vriley@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudvirt1071
- 21:39 vriley@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt1071
- 21:37 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1069.eqiad.wmnet with OS bookworm
- 21:37 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1002"
- 21:34 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1002"
- 21:31 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch1074
- 21:31 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch1074
- 21:31 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch1074.eqiad.wmnet with OS bullseye
- 21:28 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1070.eqiad.wmnet with OS bookworm
- 21:28 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1002"
- 21:27 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1002"
- 21:22 brennen: end of UTC late backport & config window (and spiderpig party)
- 21:22 brennen@deploy1003: Finished scap sync-world: Backport for Design Research participant recruitment survey on eswiki: Pre-deploy (T394315) (duration: 16m 06s)
- 21:17 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1069.eqiad.wmnet with reason: host reimage
- 21:15 brennen@deploy1003: brennen, dani: Continuing with sync
- 21:14 vriley@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1069.eqiad.wmnet with reason: host reimage
- 21:13 brennen@deploy1003: brennen, dani: Backport for Design Research participant recruitment survey on eswiki: Pre-deploy (T394315) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 21:10 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1070.eqiad.wmnet with reason: host reimage
- 21:08 sukhe@puppetserver1001: conftool action : set/pooled=false; selector: dnsdisc=search
- 21:07 vriley@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1070.eqiad.wmnet with reason: host reimage
- 21:07 sukhe@puppetserver1001: conftool action : set/pooled=false; selector: dnsdisc=search-omega
- 21:07 sukhe@puppetserver1001: conftool action : set/pooled=false; selector: dnsdisc=search-psi
- 21:06 brennen@deploy1003: Started scap sync-world: Backport for Design Research participant recruitment survey on eswiki: Pre-deploy (T394315)
- 21:06 sukhe@puppetserver1001: conftool action : set/pooled=false; selector: dnsdisc=seach-psi
- 21:04 jgleeson: civicrm upgraded from 4607c099 to a8b7c589
- 21:01 cscott@deploy1003: Finished scap sync-world: Backport for Remove ParserMigration configuration that matches defaults (duration: 13m 10s)
- 21:00 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic1074 to cirrussearch1074
- 20:59 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch1074
- 20:59 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1069.eqiad.wmnet with OS bookworm
- 20:59 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudvirt1069.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 20:58 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch1074
- 20:58 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch1074 on all recursors
- 20:58 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch1074 on all recursors
- 20:58 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 20:58 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1074 to cirrussearch1074 - bking@cumin2002"
- 20:58 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1074 to cirrussearch1074 - bking@cumin2002"
- 20:55 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cassandra-dev2001.codfw.wmnet with OS bullseye
- 20:54 cscott@deploy1003: cscott: Continuing with sync
- 20:54 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 16 hosts
- 20:54 sukhe@cumin1002: START - Cookbook sre.hosts.remove-downtime for 16 hosts
- 20:53 sukhe: gdnsd reload issues should be fixed
- 20:53 sukhe@dns1004: END - running authdns-update
- 20:52 bking@cumin2002: START - Cookbook sre.dns.netbox
- 20:52 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1070.eqiad.wmnet with OS bookworm
- 20:52 cscott@deploy1003: cscott: Backport for Remove ParserMigration configuration that matches defaults synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 20:52 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic1074 to cirrussearch1074
- 20:52 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudvirt1070.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 20:52 sukhe@dns1004: START - running authdns-update
- 20:48 cscott@deploy1003: Started scap sync-world: Backport for Remove ParserMigration configuration that matches defaults
- 20:47 vriley@cumin1002: START - Cookbook sre.hosts.provision for host cloudvirt1069.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 20:41 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1069.eqiad.wmnet with OS bookworm
- 20:41 sukhe@dns1004: START - running authdns-update
- 20:40 vriley@cumin1002: START - Cookbook sre.hosts.provision for host cloudvirt1070.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 20:40 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1075.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
- 20:40 vriley@cumin1002: START - Cookbook sre.hosts.provision for host cloudvirt1075.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
- 20:36 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1070.eqiad.wmnet with OS bookworm
- 20:36 sukhe@dns1004: START - running authdns-update
- 20:32 jdrewniak@deploy1003: Finished scap sync-world: Backport for Add ArticleSummaries to beta cluster (T392520), Expand dark mode access for anons (May 2025 deployments) (T393386), Nearby should show file namespace on Commons (T52133) (duration: 12m 30s)
- 20:26 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cassandra-dev2001.codfw.wmnet with reason: host reimage
- 20:26 bking@dns1004: START - running authdns-update
- 20:25 jdrewniak@deploy1003: jdlrobson, jdrewniak: Continuing with sync
- 20:25 jdrewniak@deploy1003: jdlrobson, jdrewniak: Backport for Add ArticleSummaries to beta cluster (T392520), Expand dark mode access for anons (May 2025 deployments) (T393386), Nearby should show file namespace on Commons (T52133) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 20:24 bking@cumin2002: conftool action : set/pooled=true; selector: dnsdisc=search
- 20:24 bking@cumin2002: conftool action : set/pooled=true; selector: dnsdisc=search-omega
- 20:24 bking@cumin2002: conftool action : set/pooled=true; selector: dnsdisc=search-psi
- 20:23 eevans@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cassandra-dev2001.codfw.wmnet with reason: host reimage
- 20:20 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 20:20 jdrewniak@deploy1003: Started scap sync-world: Backport for Add ArticleSummaries to beta cluster (T392520), Expand dark mode access for anons (May 2025 deployments) (T393386), Nearby should show file namespace on Commons (T52133)
- 20:19 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host apus-be1004.eqiad.wmnet with OS bookworm
- 20:17 bking@cumin2002: START - Cookbook sre.dns.netbox
- 20:08 eevans@cumin1002: START - Cookbook sre.hosts.reimage for host cassandra-dev2001.codfw.wmnet with OS bullseye
- 20:07 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1002"
- 19:50 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1068.eqiad.wmnet with reason: host reimage
- 19:50 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1069.eqiad.wmnet with OS bookworm
- 19:50 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudvirt1069.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 19:49 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cirrussearch1056.eqiad.wmnet with OS bullseye
- 19:48 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1070.eqiad.wmnet with OS bookworm
- 19:47 vriley@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1068.eqiad.wmnet with reason: host reimage
- 19:40 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on A:cp-upload_drmrs
- 19:40 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on A:cp-text_drmrs
- 19:36 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1070.eqiad.wmnet with OS bookworm
- 19:32 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1068.eqiad.wmnet with OS bookworm
- 19:31 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudvirt1068.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 19:29 vriley@cumin1002: START - Cookbook sre.hosts.provision for host cloudvirt1069.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 19:28 tchin@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: apply
- 19:27 tchin@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: apply
- 19:26 tchin@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: apply
- 19:25 tchin@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-analytics-external: apply
- 19:24 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1069.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 19:21 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on A:cp-text_ulsfo
- 19:21 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cirrussearch1055.eqiad.wmnet with OS bullseye
- 19:19 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-analytics-external: apply
- 19:19 vriley@cumin1002: START - Cookbook sre.hosts.provision for host cloudvirt1068.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 19:19 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1070.eqiad.wmnet with OS bookworm
- 19:17 tchin@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-analytics-external: apply
- 19:17 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on A:cp-upload_ulsfo
- 19:16 jhuneidi@deploy1003: Finished scap sync-world: Backport for Stats: Add temporary deprecation for addLabel() normalization (T394053) (duration: 15m 24s)
- 19:16 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1068.eqiad.wmnet with OS bookworm
- 19:16 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch1055.eqiad.wmnet with OS bullseye
- 19:15 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cirrussearch1055.eqiad.wmnet with OS bullseye
- 19:13 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch1056
- 19:13 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch1056
- 19:13 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch1056.eqiad.wmnet with OS bullseye
- 19:10 jhuneidi@deploy1003: jhuneidi, krinkle: Continuing with sync
- 19:08 jhuneidi@deploy1003: jhuneidi, krinkle: Backport for Stats: Add temporary deprecation for addLabel() normalization (T394053) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 19:05 vriley@cumin1002: START - Cookbook sre.hosts.provision for host cloudvirt1069.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 19:05 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudvirt1070.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 19:04 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic1056 to cirrussearch1056
- 19:03 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch1056
- 19:02 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1069.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 19:02 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch1056
- 19:02 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch1056 on all recursors
- 19:02 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch1056 on all recursors
- 19:02 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 19:02 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1056 to cirrussearch1056 - bking@cumin2002"
- 19:01 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1056 to cirrussearch1056 - bking@cumin2002"
- 19:01 jhuneidi@deploy1003: Started scap sync-world: Backport for Stats: Add temporary deprecation for addLabel() normalization (T394053)
- 18:56 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1068.eqiad.wmnet with OS bookworm
- 18:56 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch1055
- 18:56 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch1055
- 18:56 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch1055.eqiad.wmnet with OS bullseye
- 18:56 bking@cumin2002: START - Cookbook sre.dns.netbox
- 18:56 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic1056 to cirrussearch1056
- 18:55 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic1055 to cirrussearch1055
- 18:54 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch1055
- 18:53 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch1055
- 18:53 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch1055 on all recursors
- 18:53 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch1055 on all recursors
- 18:53 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 18:53 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1055 to cirrussearch1055 - bking@cumin2002"
- 18:52 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1055 to cirrussearch1055 - bking@cumin2002"
- 18:47 bking@cumin2002: START - Cookbook sre.dns.netbox
- 18:47 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic1055 to cirrussearch1055
- 18:39 vriley@cumin1002: START - Cookbook sre.hosts.provision for host cloudvirt1069.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 18:38 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1069.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 18:37 vriley@cumin1002: START - Cookbook sre.hosts.provision for host cloudvirt1070.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 18:29 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1070.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 18:27 vriley@cumin1002: START - Cookbook sre.hosts.provision for host cloudvirt1070.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 18:26 vriley@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudvirt1070
- 18:26 vriley@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt1070
- 18:25 vriley@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudvirt1070
- 18:25 vriley@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt1070
- 18:15 vriley@cumin1002: START - Cookbook sre.hosts.provision for host cloudvirt1069.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 18:15 vriley@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudvirt1069
- 18:15 vriley@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt1069
- 18:02 aokoth@cumin1002: END (PASS) - Cookbook sre.vrts.upgrade (exit_code=0) on VRTS host vrts1003.eqiad.wmnet
- 18:00 aokoth@cumin1002: START - Cookbook sre.vrts.upgrade on VRTS host vrts1003.eqiad.wmnet
- 17:48 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1068.eqiad.wmnet with OS bookworm
- 17:35 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
- 17:35 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
- 17:30 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply
- 17:30 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply
- 17:12 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
- 17:12 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
- 16:56 sukhe: updating nameservers for wiki.gives in Markmonitor to set up delegation: T379318
- 16:43 brett@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp2029.codfw.wmnet
- 16:19 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on A:cp-upload_ulsfo
- 16:19 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on A:cp-text_ulsfo
- 16:16 cgoubert@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 16:11 jhathaway@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on ms-be2088.codfw.wmnet with reason: T381919
- 16:00 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host apus-be1004.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 15:58 jclark@cumin1002: START - Cookbook sre.hosts.provision for host apus-be1004.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 15:47 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/postgresql-airflow-dev: apply
- 15:47 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/postgresql-airflow-dev: apply
- 15:42 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host apus-be1004.eqiad.wmnet with OS bookworm
- 15:25 fabfur: removing varnishkafka from magru (T393772)
- 15:17 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
- 15:17 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
- 15:13 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
- 15:12 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
- 15:11 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
- 15:11 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
- 15:10 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
- 15:10 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply
- 15:09 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
- 15:09 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply
- 15:08 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
- 15:07 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
- 15:03 dancy@deploy1003: Installation of scap version "4.167.0" completed for 2 hosts
- 15:01 dancy@deploy1003: Installing scap version "4.167.0" for 2 host(s)
- 14:55 dreamyjazz@deploy1003: Finished scap sync-world: Backport for Set $wgMediaModerationPhotoDNASubscriptionKey as empty in readme.php (T394299) (duration: 11m 20s)
- 14:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Remove db2181 from x3 (T351820)', diff saved to https://phabricator.wikimedia.org/P76153 and previous config saved to /var/cache/conftool/dbconfig/20250514-145336-ladsgroup.json
- 14:48 dreamyjazz@deploy1003: dreamyjazz: Continuing with sync
- 14:48 dreamyjazz@deploy1003: dreamyjazz: Backport for Set $wgMediaModerationPhotoDNASubscriptionKey as empty in readme.php (T394299) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 14:43 dreamyjazz@deploy1003: Started scap sync-world: Backport for Set $wgMediaModerationPhotoDNASubscriptionKey as empty in readme.php (T394299)
- 14:37 moritzm: installing glib2.0 security updates
- 14:37 klausman@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-lab1001.eqiad.wmnet
- 14:31 klausman@cumin1002: START - Cookbook sre.hosts.reboot-single for host ml-lab1001.eqiad.wmnet
- 14:30 cgoubert@deploy1003: Finished scap sync-world: Deploy mediawiki: upgrade to mesh.configuration 1.13 - T391333 (duration: 12m 33s)
- 14:18 cgoubert@deploy1003: Started scap sync-world: Deploy mediawiki: upgrade to mesh.configuration 1.13 - T391333
- 14:16 moritzm: uploaded openjdk-8 8u452-ga-1~deb11u1 to component/jdk8 for bullseye-wikimedia
- 14:16 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
- 14:16 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
- 14:16 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
- 14:15 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
- 14:15 marostegui@cumin1002: dbctl commit (dc=all): 'db1258 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P76152 and previous config saved to /var/cache/conftool/dbconfig/20250514-141532-root.json
- 14:14 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
- 14:14 klausman@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-lab1002.eqiad.wmnet
- 14:14 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply
- 14:11 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
- 14:10 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
- 14:10 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
- 14:09 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
- 14:09 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
- 14:08 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply
- 14:08 klausman@cumin1002: START - Cookbook sre.hosts.reboot-single for host ml-lab1002.eqiad.wmnet
- 14:07 klausman@cumin1002: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host ml-lab1002.eqiad.wmnet
- 14:07 klausman@cumin1002: START - Cookbook sre.hosts.reboot-single for host ml-lab1002.eqiad.wmnet
- 14:07 klausman@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-lab1001.eqiad.wmnet
- 14:01 klausman@cumin1002: START - Cookbook sre.hosts.reboot-single for host ml-lab1001.eqiad.wmnet
- 14:00 marostegui@cumin1002: dbctl commit (dc=all): 'db1258 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P76151 and previous config saved to /var/cache/conftool/dbconfig/20250514-140027-root.json
- 13:51 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-main: apply
- 13:50 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-main: apply
- 13:45 marostegui@cumin1002: dbctl commit (dc=all): 'db1258 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P76150 and previous config saved to /var/cache/conftool/dbconfig/20250514-134521-root.json
- 13:40 stevemunene@cumin1002: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker1156.eqiad.wmnet
- 13:38 stevemunene@cumin1002: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker1156.eqiad.wmnet
- 13:36 aqu@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-main: apply
- 13:36 aqu@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-main: apply
- 13:35 aqu@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-main: apply
- 13:35 aqu@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-main: apply
- 13:35 aqu@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-main: apply
- 13:35 aqu@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-main: apply
- 13:34 aqu@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-main: apply
- 13:34 aqu@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-main: apply
- 13:33 aqu@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-main: apply
- 13:33 aqu@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-main: apply
- 13:32 aqu@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-main: apply
- 13:32 aqu@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-main: apply
- 13:30 marostegui@cumin1002: dbctl commit (dc=all): 'db1258 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P76149 and previous config saved to /var/cache/conftool/dbconfig/20250514-133016-root.json
- 13:20 Lucas_WMDE: UTC afternoon backport+config window done
- 13:19 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for manage-dblist: Rename to manage-dblist.php (T392819) (duration: 12m 48s)
- 13:15 marostegui@cumin1002: dbctl commit (dc=all): 'db1258 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P76148 and previous config saved to /var/cache/conftool/dbconfig/20250514-131510-root.json
- 13:13 aqu@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-main: apply
- 13:13 aqu@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-main: apply
- 13:12 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde: Continuing with sync
- 13:11 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde: Backport for manage-dblist: Rename to manage-dblist.php (T392819) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 13:07 godog: correction, restart grafana-server on grafana1002
- 13:06 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for manage-dblist: Rename to manage-dblist.php (T392819)
- 13:05 godog: reboot grafana1002 - hard down
- 13:01 stevemunene@cumin1002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host an-worker1068.eqiad.wmnet
- 13:00 marostegui@cumin1002: dbctl commit (dc=all): 'db1258 (re)pooling @ 30%: Repooling', diff saved to https://phabricator.wikimedia.org/P76146 and previous config saved to /var/cache/conftool/dbconfig/20250514-130004-root.json
- 12:55 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 12:54 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
- 12:44 marostegui@cumin1002: dbctl commit (dc=all): 'db1258 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P76145 and previous config saved to /var/cache/conftool/dbconfig/20250514-124458-root.json
- 12:29 marostegui@cumin1002: dbctl commit (dc=all): 'db1258 (re)pooling @ 20%: Repooling', diff saved to https://phabricator.wikimedia.org/P76144 and previous config saved to /var/cache/conftool/dbconfig/20250514-122952-root.json
- 12:28 joal@deploy1003: Finished deploy [analytics/refinery@9d620d0] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@9d620d06] (duration: 00m 46s)
- 12:28 joal@deploy1003: Started deploy [analytics/refinery@9d620d0] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@9d620d06]
- 12:27 joal@deploy1003: Finished deploy [analytics/refinery@9d620d0] (thin): Analytics webrequest migration THIN [analytics/refinery@9d620d06] (duration: 01m 35s)
- 12:26 joal@deploy1003: Started deploy [analytics/refinery@9d620d0] (thin): Analytics webrequest migration THIN [analytics/refinery@9d620d06]
- 12:25 joal@deploy1003: Finished deploy [analytics/refinery@9d620d0]: Regular analytics weekly train [analytics/refinery@9d620d06] (duration: 02m 17s)
- 12:23 joal@deploy1003: Started deploy [analytics/refinery@9d620d0]: Regular analytics weekly train [analytics/refinery@9d620d06]
- 12:14 marostegui@cumin1002: dbctl commit (dc=all): 'db1258 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P76143 and previous config saved to /var/cache/conftool/dbconfig/20250514-121446-root.json
- 11:47 ladsgroup@cumin1002: dbctl commit (dc=all): 'Remove db2243 from s8 (T351820)', diff saved to https://phabricator.wikimedia.org/P76142 and previous config saved to /var/cache/conftool/dbconfig/20250514-114724-ladsgroup.json
- 11:47 moritzm: installing librabbitmq security updates
- 11:41 ladsgroup@deploy1003: Finished scap sync-world: Backport for Move production term store traffic to x3 (T351820) (duration: 20m 48s)
- 11:41 stevemunene@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-worker1068.eqiad.wmnet
- 11:38 jmm@cumin2002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-test-eqiad
- 11:35 ladsgroup@deploy1003: ladsgroup: Continuing with sync
- 11:27 ladsgroup@deploy1003: ladsgroup: Backport for Move production term store traffic to x3 (T351820) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 11:21 ladsgroup@deploy1003: Started scap sync-world: Backport for Move production term store traffic to x3 (T351820)
- 11:18 kcvelaga@deploy1003: Finished deploy [airflow-dags/analytics_product@22aa307]: T393561 (duration: 01m 10s)
- 11:17 kcvelaga@deploy1003: Started deploy [airflow-dags/analytics_product@22aa307]: T393561
- 11:15 jmm@cumin2002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-test-eqiad
- 11:15 mvolz@deploy1003: helmfile [eqiad] DONE helmfile.d/services/citoid: apply
- 11:14 mvolz@deploy1003: helmfile [eqiad] START helmfile.d/services/citoid: apply
- 11:13 mvolz@deploy1003: helmfile [codfw] DONE helmfile.d/services/citoid: apply
- 11:12 mvolz@deploy1003: helmfile [codfw] START helmfile.d/services/citoid: apply
- 11:12 btullis@cumin1002: END (PASS) - Cookbook sre.ceph.roll-restart-reboot-server (exit_code=0) rolling restart_daemons on A:cephosd
- 11:10 mvolz@deploy1003: helmfile [staging] DONE helmfile.d/services/citoid: apply
- 11:10 mvolz@deploy1003: helmfile [staging] START helmfile.d/services/citoid: apply
- 11:03 dcausse@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
- 11:01 dcausse@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
- 10:50 dcausse@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
- 10:49 dcausse@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
- 10:47 btullis@cumin1002: START - Cookbook sre.ceph.roll-restart-reboot-server rolling restart_daemons on A:cephosd
- 10:44 dcausse@deploy1003: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
- 10:43 dcausse@deploy1003: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
- 10:41 dcausse@deploy1003: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
- 10:41 dcausse@deploy1003: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
- 10:40 dcausse@deploy1003: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
- 10:39 dcausse@deploy1003: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
- 10:28 jnuche@deploy1003: rebuilt and synchronized wikiversions files: group1 to 1.45.0-wmf.1 refs T392171
- 10:12 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for Also merge fields if stemming settings empty on one side (T394274) (duration: 15m 53s)
- 10:10 marostegui@cumin1002: dbctl commit (dc=all): 'db1257 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P76138 and previous config saved to /var/cache/conftool/dbconfig/20250514-101057-root.json
- 10:05 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde: Continuing with sync
- 10:03 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde: Backport for Also merge fields if stemming settings empty on one side (T394274) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 09:58 marostegui@dns1006: END - running authdns-update
- 09:57 marostegui@dns1006: START - running authdns-update
- 09:56 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for Also merge fields if stemming settings empty on one side (T394274)
- 09:55 marostegui@cumin1002: dbctl commit (dc=all): 'es2042 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P76137 and previous config saved to /var/cache/conftool/dbconfig/20250514-095553-root.json
- 09:55 marostegui@cumin1002: dbctl commit (dc=all): 'db1257 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P76136 and previous config saved to /var/cache/conftool/dbconfig/20250514-095552-root.json
- 09:50 marostegui@cumin1002: dbctl commit (dc=all): 'Add x3 codfw T390530', diff saved to https://phabricator.wikimedia.org/P76135 and previous config saved to /var/cache/conftool/dbconfig/20250514-095031-marostegui.json
- 09:50 ayounsi@cumin1002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool site esams [reason: esams routers upgrade finished, T364092]
- 09:49 ayounsi@cumin1002: START - Cookbook sre.dns.admin DNS admin: pool site esams [reason: esams routers upgrade finished, T364092]
- 09:40 marostegui@cumin1002: dbctl commit (dc=all): 'es2042 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P76133 and previous config saved to /var/cache/conftool/dbconfig/20250514-094048-root.json
- 09:40 marostegui@cumin1002: dbctl commit (dc=all): 'db1257 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P76132 and previous config saved to /var/cache/conftool/dbconfig/20250514-094047-root.json
- 09:40 marostegui@cumin1002: dbctl commit (dc=all): 'Add x3 eqiad T390530', diff saved to https://phabricator.wikimedia.org/P76131 and previous config saved to /var/cache/conftool/dbconfig/20250514-094038-marostegui.json
- 09:38 XioNoX: repool cr1-esams - T364092
- 09:35 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
- 09:34 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
- 09:28 XioNoX: cr1-esams> request chassis routing-engine master switch - T364092
- 09:25 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
- 09:25 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' .
- 09:25 moritzm: retry full planet import for Bookworm maps master (the one yesterday failed due to a bug now fixed) T381565
- 09:21 XioNoX: re1.cr1-esams> request vmhost reboot re0 - T364092
- 09:21 marostegui@cumin1002: dbctl commit (dc=all): 'es1042 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P76129 and previous config saved to /var/cache/conftool/dbconfig/20250514-092126-root.json
- 09:12 akosiaris@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: sync
- 09:12 akosiaris@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: sync
- 09:12 akosiaris@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: sync
- 09:12 akosiaris@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: sync
- 09:12 akosiaris@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: sync
- 09:12 marostegui@cumin1002: dbctl commit (dc=all): 'db1257 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P76128 and previous config saved to /var/cache/conftool/dbconfig/20250514-091200-root.json
- 09:12 akosiaris@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: sync
- 09:11 akosiaris@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: sync
- 09:11 akosiaris@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: sync
- 09:11 akosiaris@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: sync
- 09:11 akosiaris@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: sync
- 09:11 akosiaris@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: sync
- 09:11 marostegui@cumin1002: dbctl commit (dc=all): 'es2042 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P76127 and previous config saved to /var/cache/conftool/dbconfig/20250514-091100-root.json
- 09:10 akosiaris@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: sync
- 09:10 akosiaris@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: sync
- 09:10 akosiaris@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: sync
- 09:10 akosiaris@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: sync
- 09:10 akosiaris@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: sync
- 09:10 akosiaris@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: sync
- 09:10 akosiaris@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: sync
- 09:06 marostegui@cumin1002: dbctl commit (dc=all): 'es1042 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P76126 and previous config saved to /var/cache/conftool/dbconfig/20250514-090621-root.json
- 09:05 XioNoX: cr1-esams> request chassis routing-engine master switch - T364092
- 08:58 XioNoX: cr1-esams request vmhost reboot re1 - T364092
- 08:56 marostegui@cumin1002: dbctl commit (dc=all): 'db1257 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P76125 and previous config saved to /var/cache/conftool/dbconfig/20250514-085655-root.json
- 08:55 marostegui@cumin1002: dbctl commit (dc=all): 'es2042 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P76124 and previous config saved to /var/cache/conftool/dbconfig/20250514-085555-root.json
- 08:53 marostegui: Mark db2241 as x3 master in zarcillo T390530
- 08:51 marostegui@cumin1002: dbctl commit (dc=all): 'es1042 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P76123 and previous config saved to /var/cache/conftool/dbconfig/20250514-085115-root.json
- 08:46 XioNoX: cr1-esams - Install image on backup RE - T364092
- 08:44 XioNoX: cr1-esams - disable transit/IX BGP sessions - T364092
- 08:43 ayounsi@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on cr1-esams,cr1-esams IPv6 with reason: cr1-esams upgrade
- 08:43 ayounsi@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on re0.cr1-esams.mgmt with reason: cr1-esams upgrade
- 08:41 marostegui@cumin1002: dbctl commit (dc=all): 'db1257 (re)pooling @ 30%: Repooling', diff saved to https://phabricator.wikimedia.org/P76122 and previous config saved to /var/cache/conftool/dbconfig/20250514-084149-root.json
- 08:41 ayounsi@cumin1002: DONE (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 1:00:00 on cr1-esams,cr1-esams IPv6,cr1-esams.mgmt with reason: cr1-esams upgrade
- 08:40 marostegui@cumin1002: dbctl commit (dc=all): 'es2042 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P76121 and previous config saved to /var/cache/conftool/dbconfig/20250514-084049-root.json
- 08:39 XioNoX: cr1-esams# set protocols bgp graceful-shutdown sender - T364092
- 08:36 marostegui@cumin1002: dbctl commit (dc=all): 'es1042 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P76120 and previous config saved to /var/cache/conftool/dbconfig/20250514-083609-root.json
- 08:32 marostegui@cumin1002: dbctl commit (dc=all): 'es2034 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P76119 and previous config saved to /var/cache/conftool/dbconfig/20250514-083233-root.json
- 08:31 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db1257.eqiad.wmnet onto db1258.eqiad.wmnet
- 08:31 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db1257 gradually with 4 steps - Pool db1257.eqiad.wmnet in after cloning
- 08:30 marostegui@dns1006: END - running authdns-update
- 08:30 jnuche@deploy1003: rebuilt and synchronized wikiversions files: group0 to 1.45.0-wmf.1 refs T392171
- 08:29 marostegui@dns1006: START - running authdns-update
- 08:28 marostegui@cumin1002: dbctl commit (dc=all): 'es1034 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P76117 and previous config saved to /var/cache/conftool/dbconfig/20250514-082815-root.json
- 08:26 marostegui@cumin1002: dbctl commit (dc=all): 'db1257 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P76116 and previous config saved to /var/cache/conftool/dbconfig/20250514-082644-root.json
- 08:25 marostegui@cumin1002: dbctl commit (dc=all): 'es2042 (re)pooling @ 30%: Repooling', diff saved to https://phabricator.wikimedia.org/P76115 and previous config saved to /var/cache/conftool/dbconfig/20250514-082543-root.json
- 08:21 marostegui@cumin1002: dbctl commit (dc=all): 'es1042 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P76114 and previous config saved to /var/cache/conftool/dbconfig/20250514-082102-root.json
- 08:17 marostegui@cumin1002: dbctl commit (dc=all): 'es2034 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P76113 and previous config saved to /var/cache/conftool/dbconfig/20250514-081728-root.json
- 08:13 XioNoX: cr2-esams> request vmhost reboot - T364092
- 08:13 marostegui@cumin1002: dbctl commit (dc=all): 'es1034 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P76111 and previous config saved to /var/cache/conftool/dbconfig/20250514-081311-root.json
- 08:11 marostegui@cumin1002: dbctl commit (dc=all): 'db1257 (re)pooling @ 20%: Repooling', diff saved to https://phabricator.wikimedia.org/P76110 and previous config saved to /var/cache/conftool/dbconfig/20250514-081139-root.json
- 08:10 marostegui@cumin1002: dbctl commit (dc=all): 'es2042 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P76109 and previous config saved to /var/cache/conftool/dbconfig/20250514-081037-root.json
- 08:07 oblivian@cumin2002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "T393381 - oblivian@cumin2002"
- 08:07 oblivian@cumin2002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: T393381 - oblivian@cumin2002
- 08:06 oblivian@cumin2002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: T393381 - oblivian@cumin2002
- 08:06 oblivian@cumin2002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "T393381 - oblivian@cumin2002"
- 08:05 marostegui@cumin1002: dbctl commit (dc=all): 'es1042 (re)pooling @ 30%: Repooling', diff saved to https://phabricator.wikimedia.org/P76108 and previous config saved to /var/cache/conftool/dbconfig/20250514-080557-root.json
- 08:02 marostegui@cumin1002: dbctl commit (dc=all): 'es2034 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P76107 and previous config saved to /var/cache/conftool/dbconfig/20250514-080222-root.json
- 08:01 marostegui@cumin1002: START - Cookbook sre.mysql.pool db1257 gradually with 4 steps - Pool db1257.eqiad.wmnet in after cloning
- 07:59 XioNoX: cr2-esams> request vmhost software add /var/tmp/junos-vmhost-install-mx-x86-64-23.4R2-S3.9.tgz - T364092
- 07:58 XioNoX: cr2-esams - disable transit/IX BGP sessions - T364092
- 07:58 marostegui@cumin1002: dbctl commit (dc=all): 'es1034 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P76105 and previous config saved to /var/cache/conftool/dbconfig/20250514-075805-root.json
- 07:56 marostegui@cumin1002: dbctl commit (dc=all): 'db1257 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P76104 and previous config saved to /var/cache/conftool/dbconfig/20250514-075633-root.json
- 07:55 marostegui@cumin1002: dbctl commit (dc=all): 'es2042 (re)pooling @ 20%: Repooling', diff saved to https://phabricator.wikimedia.org/P76102 and previous config saved to /var/cache/conftool/dbconfig/20250514-075532-root.json
- 07:52 marostegui@cumin1002: dbctl commit (dc=all): 'Add db1258 to dbctl T393989', diff saved to https://phabricator.wikimedia.org/P76101 and previous config saved to /var/cache/conftool/dbconfig/20250514-075254-marostegui.json
- 07:50 marostegui@cumin1002: dbctl commit (dc=all): 'es1042 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P76099 and previous config saved to /var/cache/conftool/dbconfig/20250514-075052-root.json
- 07:47 marostegui@cumin1002: dbctl commit (dc=all): 'es2034 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P76097 and previous config saved to /var/cache/conftool/dbconfig/20250514-074717-root.json
- 07:44 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1257.eqiad.wmnet with reason: Maintenance
- 07:44 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1256.eqiad.wmnet with reason: Maintenance
- 07:43 XioNoX: cr2-esams# set protocols bgp graceful-shutdown sender - T364092
- 07:43 marostegui@cumin1002: dbctl commit (dc=all): 'es1034 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P76095 and previous config saved to /var/cache/conftool/dbconfig/20250514-074300-root.json
- 07:43 ayounsi@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on cr2-esams,cr2-esams IPv6,cr2-esams.mgmt with reason: cr2-esams upgrade
- 07:41 marostegui@cumin1002: dbctl commit (dc=all): 'db1257 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P76094 and previous config saved to /var/cache/conftool/dbconfig/20250514-074128-root.json
- 07:40 marostegui@cumin1002: dbctl commit (dc=all): 'es2042 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P76093 and previous config saved to /var/cache/conftool/dbconfig/20250514-074027-root.json
- 07:36 ayounsi@cumin1002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: depool site esams [reason: esams routers upgrade, T364092]
- 07:36 ayounsi@cumin1002: START - Cookbook sre.dns.admin DNS admin: depool site esams [reason: esams routers upgrade, T364092]
- 07:35 marostegui@cumin1002: dbctl commit (dc=all): 'es1042 (re)pooling @ 20%: Repooling', diff saved to https://phabricator.wikimedia.org/P76092 and previous config saved to /var/cache/conftool/dbconfig/20250514-073547-root.json
- 07:34 moritzm: installing glibc security updates
- 07:32 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on es2042.codfw.wmnet with reason: Maintenance
- 07:32 marostegui@cumin1002: dbctl commit (dc=all): 'es2034 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P76091 and previous config saved to /var/cache/conftool/dbconfig/20250514-073211-root.json
- 07:31 kostajh: UTC morning deploys done
- 07:27 marostegui@cumin1002: dbctl commit (dc=all): 'es1034 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P76090 and previous config saved to /var/cache/conftool/dbconfig/20250514-072755-root.json
- 07:26 kharlan@deploy1003: Finished scap sync-world: Backport for Use anonymous user when creating named account from temp account (T393628) (duration: 19m 51s)
- 07:26 marostegui@cumin1002: dbctl commit (dc=all): 'db1257 (re)pooling @ 4%: Repooling', diff saved to https://phabricator.wikimedia.org/P76089 and previous config saved to /var/cache/conftool/dbconfig/20250514-072622-root.json
- 07:20 marostegui@cumin1002: dbctl commit (dc=all): 'es1042 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P76088 and previous config saved to /var/cache/conftool/dbconfig/20250514-072042-root.json
- 07:20 kharlan@deploy1003: kharlan: Continuing with sync
- 07:17 marostegui@cumin1002: dbctl commit (dc=all): 'es2034 (re)pooling @ 30%: Repooling', diff saved to https://phabricator.wikimedia.org/P76087 and previous config saved to /var/cache/conftool/dbconfig/20250514-071706-root.json
- 07:12 marostegui@cumin1002: dbctl commit (dc=all): 'es1034 (re)pooling @ 30%: Repooling', diff saved to https://phabricator.wikimedia.org/P76086 and previous config saved to /var/cache/conftool/dbconfig/20250514-071250-root.json
- 07:12 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on es2042.codfw.wmnet,es1042.eqiad.wmnet with reason: Maintenance
- 07:12 marostegui@cumin1002: dbctl commit (dc=all): 'Depool es1042 es2042 T391921', diff saved to https://phabricator.wikimedia.org/P76085 and previous config saved to /var/cache/conftool/dbconfig/20250514-071159-marostegui.json
- 07:11 kharlan@deploy1003: kharlan: Backport for Use anonymous user when creating named account from temp account (T393628) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 07:11 marostegui@cumin1002: dbctl commit (dc=all): 'db1257 (re)pooling @ 3%: Repooling', diff saved to https://phabricator.wikimedia.org/P76084 and previous config saved to /var/cache/conftool/dbconfig/20250514-071117-root.json
- 07:06 kharlan@deploy1003: Started scap sync-world: Backport for Use anonymous user when creating named account from temp account (T393628)
- 07:02 marostegui@cumin1002: dbctl commit (dc=all): 'es2034 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P76083 and previous config saved to /var/cache/conftool/dbconfig/20250514-070200-root.json
- 06:57 marostegui@cumin1002: dbctl commit (dc=all): 'es1034 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P76082 and previous config saved to /var/cache/conftool/dbconfig/20250514-065744-root.json
- 06:56 marostegui@cumin1002: dbctl commit (dc=all): 'db1257 (re)pooling @ 2%: Repooling', diff saved to https://phabricator.wikimedia.org/P76081 and previous config saved to /var/cache/conftool/dbconfig/20250514-065611-root.json
- 06:46 marostegui@cumin1002: dbctl commit (dc=all): 'es2034 (re)pooling @ 20%: Repooling', diff saved to https://phabricator.wikimedia.org/P76080 and previous config saved to /var/cache/conftool/dbconfig/20250514-064654-root.json
- 06:42 marostegui@cumin1002: dbctl commit (dc=all): 'es1034 (re)pooling @ 20%: Repooling', diff saved to https://phabricator.wikimedia.org/P76079 and previous config saved to /var/cache/conftool/dbconfig/20250514-064238-root.json
- 06:41 marostegui@cumin1002: dbctl commit (dc=all): 'db1257 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P76078 and previous config saved to /var/cache/conftool/dbconfig/20250514-064106-root.json
- 06:31 marostegui@cumin1002: dbctl commit (dc=all): 'es2034 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P76077 and previous config saved to /var/cache/conftool/dbconfig/20250514-063149-root.json
- 06:27 marostegui@cumin1002: dbctl commit (dc=all): 'es1034 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P76076 and previous config saved to /var/cache/conftool/dbconfig/20250514-062733-root.json
- 06:27 marostegui: es3 migrated to MariaDB 10.11 T391921
- 06:17 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on es2034.codfw.wmnet,es1034.eqiad.wmnet with reason: Maintenance
- 06:17 marostegui@cumin1002: dbctl commit (dc=all): 'Depool es1034 es2034 T391921', diff saved to https://phabricator.wikimedia.org/P76075 and previous config saved to /var/cache/conftool/dbconfig/20250514-061721-marostegui.json
- 06:16 marostegui@cumin1002: dbctl commit (dc=all): 'Promote es1031 and es2029 to es3 masters T391921', diff saved to https://phabricator.wikimedia.org/P76074 and previous config saved to /var/cache/conftool/dbconfig/20250514-061650-marostegui.json
- 06:11 marostegui: Drop query killers from parsercache T387740
- 05:49 marostegui: Mark db1255 as x3 master in zarcillo T390530
- 05:36 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) db1257 - Depool db1257.eqiad.wmnet to then clone it to db1258.eqiad.wmnet - marostegui@cumin1002
- 05:36 marostegui@cumin1002: START - Cookbook sre.mysql.depool db1257 - Depool db1257.eqiad.wmnet to then clone it to db1258.eqiad.wmnet - marostegui@cumin1002
- 05:36 marostegui@cumin1002: START - Cookbook sre.mysql.clone of db1257.eqiad.wmnet onto db1258.eqiad.wmnet
- 01:43 eileen: config revision changed from c4cda34a to 5c4b83ad
- 01:38 eileen: civicrm upgraded from 18deba4c to 4607c099
- 01:13 eileen: civicrm upgraded from 40d488b8 to 18deba4c
- 00:53 eileen: config revision changed from ddf64519 to c4cda34a
2025-05-13
- 23:30 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudvirt1068.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 23:12 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host apus-be1004.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 22:49 jclark@cumin1002: START - Cookbook sre.hosts.provision for host apus-be1004.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 22:45 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on A:cp-upload_magru
- 22:45 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on A:cp-text_magru
- 22:28 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host apus-be1004.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 22:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1075.eqiad.wmnet with OS bookworm
- 22:27 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 22:13 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host thanos-fe1007.eqiad.wmnet with OS bullseye
- 22:13 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1002"
- 22:10 jclark@cumin1002: START - Cookbook sre.hosts.provision for host apus-be1004.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 22:09 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host apus-be1004.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 22:07 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1002"
- 21:49 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-fe1007.eqiad.wmnet with reason: host reimage
- 21:49 vriley@cumin1002: START - Cookbook sre.hosts.provision for host cloudvirt1068.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 21:48 vriley@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudvirt1068
- 21:48 vriley@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt1068
- 21:47 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 21:47 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt cloudvirt1068 - vriley@cumin1002"
- 21:47 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt cloudvirt1068 - vriley@cumin1002"
- 21:47 ejegg: civicrm upgraded from 852c6ee6 to 40d488b8
- 21:45 vriley@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-fe1007.eqiad.wmnet with reason: host reimage
- 21:44 vriley@cumin1002: START - Cookbook sre.dns.netbox
- 21:43 jclark@cumin1002: START - Cookbook sre.hosts.provision for host apus-be1004.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 21:43 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 21:43 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for apus-be1004 - jclark@cumin1002"
- 21:43 sfaci@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic: apply
- 21:42 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for apus-be1004 - jclark@cumin1002"
- 21:42 sfaci@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic: apply
- 21:39 jclark@cumin1002: START - Cookbook sre.dns.netbox
- 21:33 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 21:29 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host thanos-fe1007.eqiad.wmnet with OS bullseye
- 21:16 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1075.eqiad.wmnet with reason: host reimage
- 21:14 jforrester@deploy1003: Finished scap sync-world: Backport for Register our magic vars, so the parser knows to ask us what their values are (T345477), Register our magic vars, so the parser knows to ask us what their values are (T345477) (duration: 13m 13s)
- 21:12 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1075.eqiad.wmnet with reason: host reimage
- 21:07 jforrester@deploy1003: jforrester: Continuing with sync
- 21:07 jforrester@deploy1003: jforrester: Backport for Register our magic vars, so the parser knows to ask us what their values are (T345477), Register our magic vars, so the parser knows to ask us what their values are (T345477) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 21:01 bvibber@deploy1003: helmfile [codfw] DONE helmfile.d/services/chart-renderer: apply
- 21:01 bvibber@deploy1003: helmfile [codfw] START helmfile.d/services/chart-renderer: apply
- 21:00 jforrester@deploy1003: Started scap sync-world: Backport for Register our magic vars, so the parser knows to ask us what their values are (T345477), Register our magic vars, so the parser knows to ask us what their values are (T345477)
- 21:00 bvibber@deploy1003: helmfile [eqiad] DONE helmfile.d/services/chart-renderer: apply
- 21:00 bvibber@deploy1003: helmfile [eqiad] START helmfile.d/services/chart-renderer: apply
- 20:59 bvibber@deploy1003: helmfile [staging] DONE helmfile.d/services/chart-renderer: apply
- 20:58 bvibber@deploy1003: helmfile [staging] START helmfile.d/services/chart-renderer: apply
- 20:56 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host cloudvirt1075.eqiad.wmnet with OS bookworm
- 20:54 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['cloudvirt1075']
- 20:53 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1075']
- 20:53 jforrester@deploy1003: Finished scap sync-world: Backport for Remove web_ab_test_enrollment schema (T386247) (duration: 13m 36s)
- 20:46 jforrester@deploy1003: bwang, jforrester: Continuing with sync
- 20:46 jforrester@deploy1003: bwang, jforrester: Backport for Remove web_ab_test_enrollment schema (T386247) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 20:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudvirt1075.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 20:40 jforrester@deploy1003: Started scap sync-world: Backport for Remove web_ab_test_enrollment schema (T386247)
- 20:38 jforrester@deploy1003: Finished scap sync-world: Backport for Stream registration for article summaries (T389097 T387406) (duration: 13m 12s)
- 20:31 jforrester@deploy1003: ksarabia, jforrester: Continuing with sync
- 20:31 jforrester@deploy1003: ksarabia, jforrester: Backport for Stream registration for article summaries (T389097 T387406) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 20:29 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host thanos-fe1007.eqiad.wmnet with OS bullseye
- 20:27 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host thanos-fe1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 20:25 vriley@cumin1002: START - Cookbook sre.hosts.provision for host thanos-fe1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 20:24 jforrester@deploy1003: Started scap sync-world: Backport for Stream registration for article summaries (T389097 T387406)
- 20:24 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1076.eqiad.wmnet with OS bookworm
- 20:24 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 20:23 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-fe1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 20:22 jforrester@deploy1003: Finished scap sync-world: Backport for Update to echarts 5.6.0 (T393377) (duration: 11m 36s)
- 20:21 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 20:21 vriley@cumin1002: START - Cookbook sre.hosts.provision for host thanos-fe1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 20:18 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1018.eqiad.wmnet with OS bookworm
- 20:16 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host cloudvirt1075.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 20:16 jforrester@deploy1003: jforrester, jdlrobson: Continuing with sync
- 20:15 jforrester@deploy1003: jforrester, jdlrobson: Backport for Update to echarts 5.6.0 (T393377) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 20:14 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudvirt1075
- 20:14 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt1075
- 20:14 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 20:14 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding cloudvirt1075 to codfw - jhancock@cumin2002"
- 20:14 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding cloudvirt1075 to codfw - jhancock@cumin2002"
- 20:11 jforrester@deploy1003: Started scap sync-world: Backport for Update to echarts 5.6.0 (T393377)
- 20:10 jhancock@cumin2002: START - Cookbook sre.dns.netbox
- 20:08 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1018.eqiad.wmnet with reason: host reimage
- 20:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1076.eqiad.wmnet with reason: host reimage
- 20:04 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1018.eqiad.wmnet with reason: host reimage
- 20:02 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host pc1018.eqiad.wmnet with OS bookworm
- 20:02 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1076.eqiad.wmnet with reason: host reimage
- 20:01 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1258.eqiad.wmnet with OS bookworm
- 20:01 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 20:00 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 19:56 ejegg: standalone (IPN listener) SmashPig upgraded from 4ac271dd to f96b898e
- 19:55 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host pc1018.eqiad.wmnet with OS bookworm
- 19:47 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host cloudvirt1076.eqiad.wmnet with OS bookworm
- 19:46 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['cloudvirt1076']
- 19:45 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1076']
- 19:45 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1258.eqiad.wmnet with reason: host reimage
- 19:44 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudvirt1076.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 19:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1018.eqiad.wmnet with reason: host reimage
- 19:40 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on A:cp-upload_magru
- 19:40 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on A:cp-text_magru
- 19:40 jhathaway@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on ms-be2088.codfw.wmnet with reason: T381919
- 19:37 brett@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) config_reloading P{lvs3009*} and A:liberica (T393616)
- 19:37 brett@cumin2002: START - Cookbook sre.loadbalancer.admin config_reloading P{lvs3009*} and A:liberica (T393616)
- 19:37 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1258.eqiad.wmnet with reason: host reimage
- 19:37 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1018.eqiad.wmnet with reason: host reimage
- 19:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host cloudvirt1076.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 19:23 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on A:cp-text_eqiad
- 19:21 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host db1258.eqiad.wmnet with OS bookworm
- 19:21 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host pc1018.eqiad.wmnet with OS bookworm
- 19:21 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirt1076.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 19:18 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on A:cp-upload_eqiad
- 19:17 brett: Import Varnish 7.1.1-2~bpo11+wmf1 into bullseye-wikimedia (T394004)
- 19:11 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host cloudvirt1076.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 19:10 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudvirt1076
- 19:10 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt1076
- 19:10 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 19:10 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding cloudvirt1076 to codfw - jhancock@cumin2002"
- 19:10 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding cloudvirt1076 to codfw - jhancock@cumin2002"
- 19:04 jhancock@cumin2002: START - Cookbook sre.dns.netbox
- 18:02 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch1084.eqiad.wmnet with OS bullseye
- 17:58 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch1073.eqiad.wmnet with OS bullseye
- 17:51 cstone: payments-wiki upgraded from 92a8cbb8 to 01de91b7
- 17:43 cmooney@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 264595
- 17:42 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
- 17:41 jhathaway@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on ms-be2088.codfw.wmnet with reason: T381919
- 17:41 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
- 17:41 cmooney@cumin1002: START - Cookbook sre.network.peering with action 'configure' for AS: 264595
- 17:38 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch1084.eqiad.wmnet with reason: host reimage
- 17:35 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch1073.eqiad.wmnet with reason: host reimage
- 17:34 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch1084.eqiad.wmnet with reason: host reimage
- 17:31 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
- 17:31 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch1073.eqiad.wmnet with reason: host reimage
- 17:31 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
- 17:20 papaul: maintenance complete on all 3 switches
- 17:20 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch1084
- 17:20 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch1084
- 17:20 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch1084.eqiad.wmnet with OS bullseye
- 17:17 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
- 17:17 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
- 17:17 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic1084 to cirrussearch1084
- 17:17 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch1084
- 17:16 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch1073
- 17:16 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch1073
- 17:16 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch1073.eqiad.wmnet with OS bullseye
- 17:15 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch1084
- 17:15 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch1084 on all recursors
- 17:15 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch1084 on all recursors
- 17:15 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 17:15 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1084 to cirrussearch1084 - bking@cumin2002"
- 17:15 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1084 to cirrussearch1084 - bking@cumin2002"
- 17:11 bking@cumin2002: START - Cookbook sre.dns.netbox
- 17:11 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic1073 to cirrussearch1073
- 17:11 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic1084 to cirrussearch1084
- 17:10 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch1073
- 17:09 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
- 17:09 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
- 17:09 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch1073
- 17:09 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch1073 on all recursors
- 17:09 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch1073 on all recursors
- 17:09 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 17:09 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1073 to cirrussearch1073 - bking@cumin2002"
- 17:08 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1073 to cirrussearch1073 - bking@cumin2002"
- 17:04 bking@cumin2002: START - Cookbook sre.dns.netbox
- 17:04 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic1073 to cirrussearch1073
- 16:55 papaul: ongoing maintenance on msw2-codfw
- 16:50 papaul: maintenance complete on msw2-eqiad
- 16:48 akosiaris@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
- 16:48 akosiaris@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
- 16:48 akosiaris@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
- 16:47 akosiaris@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
- 16:47 akosiaris@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
- 16:47 akosiaris@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply
- 16:47 akosiaris@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
- 16:46 akosiaris@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
- 16:46 akosiaris@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
- 16:46 akosiaris@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
- 16:46 akosiaris@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
- 16:45 akosiaris@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply
- 16:34 dancy@deploy1003: Installation of scap version "4.166.0" completed for 2 hosts
- 16:32 dancy@deploy1003: Installing scap version "4.166.0" for 2 host(s)
- 16:28 akosiaris@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
- 16:28 akosiaris@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
- 16:28 papaul: maintenance complete on msw2-eqiad
- 16:28 akosiaris@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
- 16:27 akosiaris@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply
- 16:27 akosiaris@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
- 16:26 akosiaris@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
- 16:21 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on A:cp-upload_eqiad
- 16:21 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on A:cp-text_eqiad
- 16:20 papaul: maintenance complete on msw1-eqiad
- 16:11 dancy@deploy1003: Installation of scap version "4.165.0" completed for 2 hosts
- 16:09 dancy@deploy1003: Installing scap version "4.165.0" for 2 host(s)
- 16:09 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host pc1018.eqiad.wmnet with OS bookworm
- 16:01 cgoubert@cumin1002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM testreduce1002.eqiad.wmnet
- 15:57 cgoubert@cumin1002: START - Cookbook sre.ganeti.reboot-vm for VM testreduce1002.eqiad.wmnet
- 15:57 mvernon@cumin1002: END (PASS) - Cookbook sre.swift.remove-ghost-objects (exit_code=0) from container wikipedia-commons-local-public.ad in eqiad
- 15:56 claime: gnt-instance modify -B memory=10g testreduce1002.eqiad.wmnet - T393904
- 15:55 marostegui@cumin1002: dbctl commit (dc=all): 'db1256 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P76071 and previous config saved to /var/cache/conftool/dbconfig/20250513-155547-root.json
- 15:54 mvernon@cumin1002: START - Cookbook sre.swift.remove-ghost-objects from container wikipedia-commons-local-public.ad in eqiad
- 15:42 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1258.eqiad.wmnet with OS bookworm
- 15:40 marostegui@cumin1002: dbctl commit (dc=all): 'db1256 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P76070 and previous config saved to /var/cache/conftool/dbconfig/20250513-154041-root.json
- 15:36 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch1072.eqiad.wmnet with OS bullseye
- 15:35 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host db1258.eqiad.wmnet with OS bookworm
- 15:35 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1258.eqiad.wmnet with OS bookworm
- 15:33 cmooney@dns2005: END - running authdns-update
- 15:33 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 15:32 cmooney@dns2005: START - running authdns-update
- 15:32 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch1071.eqiad.wmnet with OS bullseye
- 15:30 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 15:27 papaul: ongoing maintenance on msw1-eqiad
- 15:26 marostegui@cumin1002: dbctl commit (dc=all): 'db1255 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P76069 and previous config saved to /var/cache/conftool/dbconfig/20250513-152631-root.json
- 15:25 marostegui@cumin1002: dbctl commit (dc=all): 'db1256 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P76068 and previous config saved to /var/cache/conftool/dbconfig/20250513-152536-root.json
- 15:22 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 15:22 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: generate dns records for new codfw switches - cmooney@cumin1002"
- 15:22 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: generate dns records for new codfw switches - cmooney@cumin1002"
- 15:16 cmooney@cumin1002: START - Cookbook sre.dns.netbox
- 15:13 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch1072.eqiad.wmnet with reason: host reimage
- 15:11 marostegui@cumin1002: dbctl commit (dc=all): 'db1255 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P76067 and previous config saved to /var/cache/conftool/dbconfig/20250513-151125-root.json
- 15:10 marostegui@cumin1002: dbctl commit (dc=all): 'db1256 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P76066 and previous config saved to /var/cache/conftool/dbconfig/20250513-151031-root.json
- 15:09 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch1072.eqiad.wmnet with reason: host reimage
- 15:08 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch1071.eqiad.wmnet with reason: host reimage
- 15:04 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch1071.eqiad.wmnet with reason: host reimage
- 15:02 tchin@deploy1003: Finished deploy [airflow-dags/analytics@0550b16]: Deploying airflow artifacts for T384962 (duration: 02m 22s)
- 15:00 tchin@deploy1003: Started deploy [airflow-dags/analytics@0550b16]: Deploying airflow artifacts for T384962
- 14:59 papaul: maintenance complete on msw1-codfw
- 14:56 marostegui@cumin1002: dbctl commit (dc=all): 'db1255 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P76064 and previous config saved to /var/cache/conftool/dbconfig/20250513-145620-root.json
- 14:55 marostegui@cumin1002: dbctl commit (dc=all): 'db1256 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P76063 and previous config saved to /var/cache/conftool/dbconfig/20250513-145525-root.json
- 14:55 marostegui@cumin1002: dbctl commit (dc=all): 'db2241 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P76062 and previous config saved to /var/cache/conftool/dbconfig/20250513-145514-root.json
- 14:55 marostegui@cumin1002: dbctl commit (dc=all): 'db2242 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P76061 and previous config saved to /var/cache/conftool/dbconfig/20250513-145513-root.json
- 14:54 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch1072
- 14:54 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch1072
- 14:54 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch1072.eqiad.wmnet with OS bullseye
- 14:49 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch1071
- 14:49 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch1071
- 14:49 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch1071.eqiad.wmnet with OS bullseye
- 14:43 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host pc1018.eqiad.wmnet with OS bookworm
- 14:41 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pc1018.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 14:41 marostegui@cumin1002: dbctl commit (dc=all): 'db1255 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P76060 and previous config saved to /var/cache/conftool/dbconfig/20250513-144113-root.json
- 14:40 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host db1258.eqiad.wmnet with OS bookworm
- 14:40 marostegui@cumin1002: dbctl commit (dc=all): 'db1256 (re)pooling @ 30%: Repooling', diff saved to https://phabricator.wikimedia.org/P76059 and previous config saved to /var/cache/conftool/dbconfig/20250513-144019-root.json
- 14:40 marostegui@cumin1002: dbctl commit (dc=all): 'db2241 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P76058 and previous config saved to /var/cache/conftool/dbconfig/20250513-144008-root.json
- 14:40 marostegui@cumin1002: dbctl commit (dc=all): 'db2242 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P76057 and previous config saved to /var/cache/conftool/dbconfig/20250513-144007-root.json
- 14:39 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1258.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 14:39 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic1072 to cirrussearch1072
- 14:39 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch1072
- 14:38 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 14:37 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 14:37 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch1072
- 14:37 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch1072 on all recursors
- 14:37 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch1072 on all recursors
- 14:37 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 14:37 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1072 to cirrussearch1072 - bking@cumin2002"
- 14:37 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1072 to cirrussearch1072 - bking@cumin2002"
- 14:34 bking@cumin2002: START - Cookbook sre.dns.netbox
- 14:33 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic1072 to cirrussearch1072
- 14:32 papaul: ongoing maintenance on msw1-codfw
- 14:31 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic1071 to cirrussearch1071
- 14:30 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch1071
- 14:29 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch1071
- 14:29 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch1071 on all recursors
- 14:29 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch1071 on all recursors
- 14:29 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 14:29 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1071 to cirrussearch1071 - bking@cumin2002"
- 14:29 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1071 to cirrussearch1071 - bking@cumin2002"
- 14:28 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 14:28 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 14:26 marostegui@cumin1002: dbctl commit (dc=all): 'db1255 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P76056 and previous config saved to /var/cache/conftool/dbconfig/20250513-142608-root.json
- 14:25 marostegui@cumin1002: dbctl commit (dc=all): 'db1256 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P76055 and previous config saved to /var/cache/conftool/dbconfig/20250513-142513-root.json
- 14:25 marostegui@cumin1002: dbctl commit (dc=all): 'db2241 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P76054 and previous config saved to /var/cache/conftool/dbconfig/20250513-142503-root.json
- 14:25 marostegui@cumin1002: dbctl commit (dc=all): 'db2242 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P76053 and previous config saved to /var/cache/conftool/dbconfig/20250513-142501-root.json
- 14:19 jclark@cumin1002: START - Cookbook sre.hosts.provision for host pc1018.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 14:18 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host pc1018.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 14:17 jclark@cumin1002: START - Cookbook sre.hosts.provision for host pc1018.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 14:17 jclark@cumin1002: START - Cookbook sre.hosts.provision for host db1258.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 14:15 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 14:15 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for pc1018 db1258 - jclark@cumin1002"
- 14:15 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for pc1018 db1258 - jclark@cumin1002"
- 14:14 bking@cumin2002: START - Cookbook sre.dns.netbox
- 14:11 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic1071 to cirrussearch1071
- 14:11 marostegui@cumin1002: dbctl commit (dc=all): 'db1255 (re)pooling @ 30%: Repooling', diff saved to https://phabricator.wikimedia.org/P76052 and previous config saved to /var/cache/conftool/dbconfig/20250513-141102-root.json
- 14:10 marostegui@cumin1002: dbctl commit (dc=all): 'db1256 (re)pooling @ 20%: Repooling', diff saved to https://phabricator.wikimedia.org/P76051 and previous config saved to /var/cache/conftool/dbconfig/20250513-141007-root.json
- 14:09 marostegui@cumin1002: dbctl commit (dc=all): 'db2241 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P76050 and previous config saved to /var/cache/conftool/dbconfig/20250513-140958-root.json
- 14:09 marostegui@cumin1002: dbctl commit (dc=all): 'db2242 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P76049 and previous config saved to /var/cache/conftool/dbconfig/20250513-140956-root.json
- 14:09 jclark@cumin1002: START - Cookbook sre.dns.netbox
- 14:00 hnowlan: finalising rollout of restbaseless enwiki PCS APIs routed via rest-gateway
- 13:59 kamila@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
- 13:58 kamila@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
- 13:55 marostegui@cumin1002: dbctl commit (dc=all): 'db1255 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P76048 and previous config saved to /var/cache/conftool/dbconfig/20250513-135557-root.json
- 13:55 marostegui@cumin1002: dbctl commit (dc=all): 'db1256 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P76047 and previous config saved to /var/cache/conftool/dbconfig/20250513-135502-root.json
- 13:54 marostegui@cumin1002: dbctl commit (dc=all): 'db2241 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P76046 and previous config saved to /var/cache/conftool/dbconfig/20250513-135452-root.json
- 13:54 marostegui@cumin1002: dbctl commit (dc=all): 'db2242 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P76045 and previous config saved to /var/cache/conftool/dbconfig/20250513-135451-root.json
- 13:51 Lucas_WMDE: UTC afternoon backport+config window done
- 13:50 lucaswerkmeister-wmde@deploy1003: Sync cancelled.
- 13:47 ayounsi@cumin1002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool site eqsin [reason: cr3-eqsin upgrade finished, T364092]
- 13:47 ayounsi@cumin1002: START - Cookbook sre.dns.admin DNS admin: pool site eqsin [reason: cr3-eqsin upgrade finished, T364092]
- 13:46 lucaswerkmeister-wmde@deploy1003: d3r1ck01, lucaswerkmeister-wmde: Backport for SUL3: Fix account creation by username & email (with temp password) (T390751) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 13:40 marostegui@cumin1002: dbctl commit (dc=all): 'db1255 (re)pooling @ 20%: Repooling', diff saved to https://phabricator.wikimedia.org/P76044 and previous config saved to /var/cache/conftool/dbconfig/20250513-134051-root.json
- 13:40 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for SUL3: Fix account creation by username & email (with temp password) (T390751)
- 13:39 marostegui@cumin1002: dbctl commit (dc=all): 'db1256 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P76043 and previous config saved to /var/cache/conftool/dbconfig/20250513-133956-root.json
- 13:39 marostegui@cumin1002: dbctl commit (dc=all): 'db2241 (re)pooling @ 30%: Repooling', diff saved to https://phabricator.wikimedia.org/P76042 and previous config saved to /var/cache/conftool/dbconfig/20250513-133947-root.json
- 13:39 marostegui@cumin1002: dbctl commit (dc=all): 'db2242 (re)pooling @ 30%: Repooling', diff saved to https://phabricator.wikimedia.org/P76041 and previous config saved to /var/cache/conftool/dbconfig/20250513-133946-root.json
- 13:37 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for SUL3: Fix account creation by username & email (with temp password) (T390751), SUL3: Fix account creation by username & email (with temp password) (T390751) (duration: 14m 07s)
- 13:31 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, matmarex, d3r1ck01: Continuing with sync
- 13:30 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, matmarex, d3r1ck01: Backport for SUL3: Fix account creation by username & email (with temp password) (T390751), SUL3: Fix account creation by username & email (with temp password) (T390751) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 13:25 marostegui@cumin1002: dbctl commit (dc=all): 'db1255 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P76039 and previous config saved to /var/cache/conftool/dbconfig/20250513-132545-root.json
- 13:25 marostegui@cumin1002: dbctl commit (dc=all): 'db1256 (re)pooling @ 4%: Repooling', diff saved to https://phabricator.wikimedia.org/P76038 and previous config saved to /var/cache/conftool/dbconfig/20250513-132451-root.json
- 13:25 marostegui@cumin1002: dbctl commit (dc=all): 'db2241 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P76037 and previous config saved to /var/cache/conftool/dbconfig/20250513-132442-root.json
- 13:24 marostegui@cumin1002: dbctl commit (dc=all): 'db2242 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P76036 and previous config saved to /var/cache/conftool/dbconfig/20250513-132441-root.json
- 13:23 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for SUL3: Fix account creation by username & email (with temp password) (T390751), SUL3: Fix account creation by username & email (with temp password) (T390751)
- 13:22 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for CirrusSearch: weighted tags mapping (during maintenance inflicted reindexing) (T389053) (duration: 15m 19s)
- 13:16 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, pfischer: Continuing with sync
- 13:15 XioNoX: cr3-eqsin> request vmhost reboot - T364092
- 13:14 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, pfischer: Backport for CirrusSearch: weighted tags mapping (during maintenance inflicted reindexing) (T389053) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 13:10 marostegui@cumin1002: dbctl commit (dc=all): 'db1255 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P76035 and previous config saved to /var/cache/conftool/dbconfig/20250513-131040-root.json
- 13:09 marostegui@cumin1002: dbctl commit (dc=all): 'db1256 (re)pooling @ 3%: Repooling', diff saved to https://phabricator.wikimedia.org/P76034 and previous config saved to /var/cache/conftool/dbconfig/20250513-130945-root.json
- 13:09 marostegui@cumin1002: dbctl commit (dc=all): 'db2241 (re)pooling @ 20%: Repooling', diff saved to https://phabricator.wikimedia.org/P76033 and previous config saved to /var/cache/conftool/dbconfig/20250513-130937-root.json
- 13:09 marostegui@cumin1002: dbctl commit (dc=all): 'db2242 (re)pooling @ 20%: Repooling', diff saved to https://phabricator.wikimedia.org/P76032 and previous config saved to /var/cache/conftool/dbconfig/20250513-130935-root.json
- 13:07 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for CirrusSearch: weighted tags mapping (during maintenance inflicted reindexing) (T389053)
- 13:00 XioNoX: cr3-eqsin> request vmhost software add /var/tmp/junos-vmhost-install-mx-x86-64-23.4R2-S3.9.tgz - T364092
- 12:59 volans: upgrading python3-wmflib fleetwide (except eqsin for now)
- 12:57 XioNoX: cr3-eqsin - shutdown transit/peering BGP sessions - T364092
- 12:55 marostegui@cumin1002: dbctl commit (dc=all): 'db1255 (re)pooling @ 4%: Repooling', diff saved to https://phabricator.wikimedia.org/P76031 and previous config saved to /var/cache/conftool/dbconfig/20250513-125535-root.json
- 12:54 marostegui@cumin1002: dbctl commit (dc=all): 'db1256 (re)pooling @ 2%: Repooling', diff saved to https://phabricator.wikimedia.org/P76030 and previous config saved to /var/cache/conftool/dbconfig/20250513-125440-root.json
- 12:54 marostegui@cumin1002: dbctl commit (dc=all): 'db2241 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P76029 and previous config saved to /var/cache/conftool/dbconfig/20250513-125431-root.json
- 12:54 marostegui@cumin1002: dbctl commit (dc=all): 'db2242 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P76028 and previous config saved to /var/cache/conftool/dbconfig/20250513-125430-root.json
- 12:53 XioNoX: cr3-eqsin - lower vrrp priority - T364092
- 12:50 moritzm: trigger full planet import for Bookworm maps master T381565
- 12:49 marostegui@cumin1002: dbctl commit (dc=all): 'es2027 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P76027 and previous config saved to /var/cache/conftool/dbconfig/20250513-124910-root.json
- 12:47 aokoth@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply
- 12:46 aokoth@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply
- 12:40 XioNoX: cr3-eqsin# set protocols bgp graceful-shutdown sender - T364092
- 12:40 marostegui@cumin1002: dbctl commit (dc=all): 'db1255 (re)pooling @ 3%: Repooling', diff saved to https://phabricator.wikimedia.org/P76026 and previous config saved to /var/cache/conftool/dbconfig/20250513-124029-root.json
- 12:39 marostegui@cumin1002: dbctl commit (dc=all): 'db1256 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P76025 and previous config saved to /var/cache/conftool/dbconfig/20250513-123935-root.json
- 12:39 marostegui@cumin1002: dbctl commit (dc=all): 'db2241 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P76024 and previous config saved to /var/cache/conftool/dbconfig/20250513-123926-root.json
- 12:39 marostegui@cumin1002: dbctl commit (dc=all): 'db2242 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P76023 and previous config saved to /var/cache/conftool/dbconfig/20250513-123925-root.json
- 12:39 marostegui@cumin1002: dbctl commit (dc=all): 'Add db1256 future x3 hosts, to s8 T390530', diff saved to https://phabricator.wikimedia.org/P76022 and previous config saved to /var/cache/conftool/dbconfig/20250513-123917-marostegui.json
- 12:36 marostegui@cumin1002: dbctl commit (dc=all): 'es1028 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P76021 and previous config saved to /var/cache/conftool/dbconfig/20250513-123631-root.json
- 12:36 ayounsi@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cr3-eqsin with reason: upgrade
- 12:34 marostegui@cumin1002: dbctl commit (dc=all): 'es2027 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P76020 and previous config saved to /var/cache/conftool/dbconfig/20250513-123404-root.json
- 12:31 ayounsi@cumin1002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: depool site eqsin [reason: cr3-eqsin upgrade, T364092]
- 12:31 ayounsi@cumin1002: START - Cookbook sre.dns.admin DNS admin: depool site eqsin [reason: cr3-eqsin upgrade, T364092]
- 12:25 marostegui@cumin1002: dbctl commit (dc=all): 'db1255 (re)pooling @ 2%: Repooling', diff saved to https://phabricator.wikimedia.org/P76019 and previous config saved to /var/cache/conftool/dbconfig/20250513-122523-root.json
- 12:24 marostegui@cumin1002: dbctl commit (dc=all): 'db2241 (re)pooling @ 4%: Repooling', diff saved to https://phabricator.wikimedia.org/P76018 and previous config saved to /var/cache/conftool/dbconfig/20250513-122407-root.json
- 12:24 marostegui@cumin1002: dbctl commit (dc=all): 'db2242 (re)pooling @ 4%: Repooling', diff saved to https://phabricator.wikimedia.org/P76017 and previous config saved to /var/cache/conftool/dbconfig/20250513-122406-root.json
- 12:22 moritzm: installing libapache2-mod-auth-openidc security updates
- 12:21 marostegui@cumin1002: dbctl commit (dc=all): 'es1028 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P76016 and previous config saved to /var/cache/conftool/dbconfig/20250513-122126-root.json
- 12:18 marostegui@cumin1002: dbctl commit (dc=all): 'es2027 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P76014 and previous config saved to /var/cache/conftool/dbconfig/20250513-121858-root.json
- 12:18 volans: uploaded python3-wmflib_1.3.2 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia,bookworm-wikimedia,trixie-wikimedia
- 12:10 marostegui@cumin1002: dbctl commit (dc=all): 'db1255 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P76012 and previous config saved to /var/cache/conftool/dbconfig/20250513-121018-root.json
- 12:09 marostegui@cumin1002: dbctl commit (dc=all): 'db2241 (re)pooling @ 3%: Repooling', diff saved to https://phabricator.wikimedia.org/P76011 and previous config saved to /var/cache/conftool/dbconfig/20250513-120902-root.json
- 12:09 marostegui@cumin1002: dbctl commit (dc=all): 'db2242 (re)pooling @ 3%: Repooling', diff saved to https://phabricator.wikimedia.org/P76010 and previous config saved to /var/cache/conftool/dbconfig/20250513-120901-root.json
- 12:08 marostegui@cumin1002: dbctl commit (dc=all): 'Add db1255 future x3 hosts, to s8 T390530', diff saved to https://phabricator.wikimedia.org/P76009 and previous config saved to /var/cache/conftool/dbconfig/20250513-120853-marostegui.json
- 12:06 moritzm: installing ucf security updates
- 12:06 marostegui@cumin1002: dbctl commit (dc=all): 'es1028 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P76008 and previous config saved to /var/cache/conftool/dbconfig/20250513-120621-root.json
- 12:03 marostegui@cumin1002: dbctl commit (dc=all): 'es2027 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P76007 and previous config saved to /var/cache/conftool/dbconfig/20250513-120352-root.json
- 11:53 marostegui@cumin1002: dbctl commit (dc=all): 'db2242 (re)pooling @ 2%: Repooling', diff saved to https://phabricator.wikimedia.org/P76006 and previous config saved to /var/cache/conftool/dbconfig/20250513-115322-root.json
- 11:53 marostegui@cumin1002: dbctl commit (dc=all): 'db2241 (re)pooling @ 2%: Repooling', diff saved to https://phabricator.wikimedia.org/P76005 and previous config saved to /var/cache/conftool/dbconfig/20250513-115317-root.json
- 11:51 marostegui@cumin1002: dbctl commit (dc=all): 'es1028 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P76003 and previous config saved to /var/cache/conftool/dbconfig/20250513-115115-root.json
- 11:48 marostegui@cumin1002: dbctl commit (dc=all): 'es2027 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P76002 and previous config saved to /var/cache/conftool/dbconfig/20250513-114847-root.json
- 11:43 tchin@deploy1003: Finished deploy [airflow-dags/analytics@146dab1]: Deploying airflow artifacts for T384962 (duration: 02m 44s)
- 11:41 tchin@deploy1003: Started deploy [airflow-dags/analytics@146dab1]: Deploying airflow artifacts for T384962
- 11:38 marostegui@cumin1002: dbctl commit (dc=all): 'db2242 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P76001 and previous config saved to /var/cache/conftool/dbconfig/20250513-113816-root.json
- 11:38 marostegui@cumin1002: dbctl commit (dc=all): 'db2241 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P76000 and previous config saved to /var/cache/conftool/dbconfig/20250513-113810-root.json
- 11:36 marostegui@cumin1002: dbctl commit (dc=all): 'es1028 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P75999 and previous config saved to /var/cache/conftool/dbconfig/20250513-113610-root.json
- 11:33 marostegui@cumin1002: dbctl commit (dc=all): 'es2027 (re)pooling @ 30%: Repooling', diff saved to https://phabricator.wikimedia.org/P75998 and previous config saved to /var/cache/conftool/dbconfig/20250513-113342-root.json
- 11:31 marostegui@cumin1002: dbctl commit (dc=all): 'Add db2241 and db2242 future x3 hosts, to s8 T390530', diff saved to https://phabricator.wikimedia.org/P75996 and previous config saved to /var/cache/conftool/dbconfig/20250513-113138-marostegui.json
- 11:21 marostegui@cumin1002: dbctl commit (dc=all): 'es1028 (re)pooling @ 30%: Repooling', diff saved to https://phabricator.wikimedia.org/P75995 and previous config saved to /var/cache/conftool/dbconfig/20250513-112104-root.json
- 11:18 marostegui@cumin1002: dbctl commit (dc=all): 'es2027 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P75994 and previous config saved to /var/cache/conftool/dbconfig/20250513-111836-root.json
- 11:06 marostegui@cumin1002: dbctl commit (dc=all): 'es1028 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P75993 and previous config saved to /var/cache/conftool/dbconfig/20250513-110559-root.json
- 11:03 marostegui@cumin1002: dbctl commit (dc=all): 'es2027 (re)pooling @ 20%: Repooling', diff saved to https://phabricator.wikimedia.org/P75992 and previous config saved to /var/cache/conftool/dbconfig/20250513-110330-root.json
- 10:58 cgoubert@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 10:50 marostegui@cumin1002: dbctl commit (dc=all): 'es1028 (re)pooling @ 20%: Repooling', diff saved to https://phabricator.wikimedia.org/P75991 and previous config saved to /var/cache/conftool/dbconfig/20250513-105053-root.json
- 10:48 cgoubert@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 10:48 marostegui@cumin1002: dbctl commit (dc=all): 'es2027 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P75990 and previous config saved to /var/cache/conftool/dbconfig/20250513-104825-root.json
- 10:46 cgoubert@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 10:43 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs1017.eqiad.wmnet
- 10:40 jnuche: train finished
- 10:40 vgutierrez@cumin1002: START - Cookbook sre.hosts.reboot-single for host lvs1017.eqiad.wmnet
- 10:38 jayme@cumin1002: END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0)
- 10:38 jayme@cumin1002: START - Cookbook sre.discovery.datacenter
- 10:38 jayme@cumin1002: END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0)
- 10:38 jayme@cumin1002: START - Cookbook sre.discovery.datacenter
- 10:37 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 10:35 marostegui@cumin1002: dbctl commit (dc=all): 'es1028 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P75989 and previous config saved to /var/cache/conftool/dbconfig/20250513-103548-root.json
- 10:35 jnuche@deploy1003: Finished scap sync-world: Backport for Update for Parsoid's rename of XMLSerializer to XHtmlSerializer (T393983) (duration: 16m 38s)
- 10:26 jnuche@deploy1003: matmarex, jnuche: Continuing with sync
- 10:26 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 10:26 jnuche@deploy1003: matmarex, jnuche: Backport for Update for Parsoid's rename of XMLSerializer to XHtmlSerializer (T393983) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 10:25 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on es2027.codfw.wmnet,es1028.eqiad.wmnet with reason: Maintenance
- 10:24 marostegui@cumin1002: dbctl commit (dc=all): 'Depool es1028 es2027 T391921', diff saved to https://phabricator.wikimedia.org/P75988 and previous config saved to /var/cache/conftool/dbconfig/20250513-102455-marostegui.json
- 10:18 jnuche@deploy1003: Started scap sync-world: Backport for Update for Parsoid's rename of XMLSerializer to XHtmlSerializer (T393983)
- 10:14 jnuche@deploy1003: rebuilt and synchronized wikiversions files: group0 to 1.45.0-wmf.1 refs T392171
- 10:07 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs1018.eqiad.wmnet
- 10:04 vgutierrez@cumin1002: START - Cookbook sre.hosts.reboot-single for host lvs1018.eqiad.wmnet
- 09:55 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 09:51 hnowlan: Route all PCS calls for enwiki articles starting with A via rest-gateway and without restbase
- 09:50 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 09:49 moritzm: installing wget security updates
- 09:44 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs1019.eqiad.wmnet
- 09:41 vgutierrez@cumin1002: START - Cookbook sre.hosts.reboot-single for host lvs1019.eqiad.wmnet
- 09:38 moritzm: imported confd 0.16.0-1+deb13u0 to trixie-wikimedia T391083
- 09:14 moritzm: installing nginx security updates
- 09:11 jnuche@deploy1003: Started scap sync-world: testwikis to 1.45.0-wmf.1 refs T392171
- 09:11 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs1020.eqiad.wmnet
- 09:08 vgutierrez@cumin1002: START - Cookbook sre.hosts.reboot-single for host lvs1020.eqiad.wmnet
- 09:00 vgutierrez: rolling reboot of eqiad load balancers to add E8/F8 interfaces - T393911 | T382017
- 08:52 zabe@deploy1003: Finished scap sync-world: Backport for expanddblist: Add missing use statement (T393992) (duration: 11m 48s)
- 08:45 zabe@deploy1003: zabe: Continuing with sync
- 08:45 zabe@deploy1003: zabe: Backport for expanddblist: Add missing use statement (T393992) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 08:40 zabe@deploy1003: Started scap sync-world: Backport for expanddblist: Add missing use statement (T393992)
- 08:35 godog: bounce thanos-query on titan1*
- 08:34 XioNoX: pfw1-eqiad - delete specific system-services in favor of "any-service" T390052
- 08:31 XioNoX: pfw1-codfw - delete specific system-services in favor of "any-service" T390052
- 08:21 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 3856
- 08:12 moritzm: copied prometheus-rsyslog-exporter 1.0.0+git20221110-1 from bookworm-wikimedia to trixie-wikimedia T391083
- 08:11 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 3856
- 08:04 moritzm: imported python-wmflib 1.3.1+deb13u1 to trixie-wikimedia T391083
- 07:55 XioNoX: delete all unterminated cables - T393188
- 07:54 moritzm: imported python-wmflib 1.3.1+deb13u1 to trixie-wikimedia T391083
- 07:52 ayounsi@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox
- 07:38 ayounsi@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
- 07:35 ayounsi@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary
- 07:35 ayounsi@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary
- 07:31 ayounsi@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary
- 07:31 marostegui@cumin1002: dbctl commit (dc=all): 'es2029 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P75987 and previous config saved to /var/cache/conftool/dbconfig/20250513-073145-root.json
- 07:31 ayounsi@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary
- 07:29 marostegui@cumin1002: dbctl commit (dc=all): 'es1031 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P75986 and previous config saved to /var/cache/conftool/dbconfig/20250513-072956-root.json
- 07:16 marostegui@cumin1002: dbctl commit (dc=all): 'es2029 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P75985 and previous config saved to /var/cache/conftool/dbconfig/20250513-071639-root.json
- 07:14 marostegui@cumin1002: dbctl commit (dc=all): 'es1031 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P75984 and previous config saved to /var/cache/conftool/dbconfig/20250513-071451-root.json
- 07:01 marostegui@cumin1002: dbctl commit (dc=all): 'es2029 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P75983 and previous config saved to /var/cache/conftool/dbconfig/20250513-070135-root.json
- 06:59 marostegui@cumin1002: dbctl commit (dc=all): 'es1031 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P75982 and previous config saved to /var/cache/conftool/dbconfig/20250513-065946-root.json
- 06:46 marostegui@cumin1002: dbctl commit (dc=all): 'es2029 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P75981 and previous config saved to /var/cache/conftool/dbconfig/20250513-064629-root.json
- 06:44 marostegui@cumin1002: dbctl commit (dc=all): 'es1031 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P75980 and previous config saved to /var/cache/conftool/dbconfig/20250513-064440-root.json
- 06:31 marostegui@cumin1002: dbctl commit (dc=all): 'es2029 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P75979 and previous config saved to /var/cache/conftool/dbconfig/20250513-063123-root.json
- 06:29 marostegui@cumin1002: dbctl commit (dc=all): 'es1031 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P75978 and previous config saved to /var/cache/conftool/dbconfig/20250513-062935-root.json
- 06:16 marostegui@cumin1002: dbctl commit (dc=all): 'es2029 (re)pooling @ 30%: Repooling', diff saved to https://phabricator.wikimedia.org/P75977 and previous config saved to /var/cache/conftool/dbconfig/20250513-061618-root.json
- 06:14 marostegui@cumin1002: dbctl commit (dc=all): 'es1031 (re)pooling @ 30%: Repooling', diff saved to https://phabricator.wikimedia.org/P75976 and previous config saved to /var/cache/conftool/dbconfig/20250513-061430-root.json
- 06:01 marostegui@cumin1002: dbctl commit (dc=all): 'es2029 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P75975 and previous config saved to /var/cache/conftool/dbconfig/20250513-060113-root.json
- 05:59 marostegui@cumin1002: dbctl commit (dc=all): 'es1031 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P75974 and previous config saved to /var/cache/conftool/dbconfig/20250513-055924-root.json
- 05:46 marostegui@cumin1002: dbctl commit (dc=all): 'es2029 (re)pooling @ 20%: Repooling', diff saved to https://phabricator.wikimedia.org/P75973 and previous config saved to /var/cache/conftool/dbconfig/20250513-054607-root.json
- 05:44 marostegui@cumin1002: dbctl commit (dc=all): 'es1031 (re)pooling @ 20%: Repooling', diff saved to https://phabricator.wikimedia.org/P75972 and previous config saved to /var/cache/conftool/dbconfig/20250513-054418-root.json
- 05:31 marostegui@cumin1002: dbctl commit (dc=all): 'es2029 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P75971 and previous config saved to /var/cache/conftool/dbconfig/20250513-053102-root.json
- 05:29 marostegui@cumin1002: dbctl commit (dc=all): 'es1031 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P75970 and previous config saved to /var/cache/conftool/dbconfig/20250513-052913-root.json
- 05:23 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on es1031.eqiad.wmnet with reason: Maintenance
- 05:16 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on es2029.codfw.wmnet with reason: Maintenance
- 05:16 marostegui@cumin1002: dbctl commit (dc=all): 'Depool es1031 es2029 T391921', diff saved to https://phabricator.wikimedia.org/P75969 and previous config saved to /var/cache/conftool/dbconfig/20250513-051617-marostegui.json
- 04:04 mwpresync@deploy1003: Pruned MediaWiki: 1.44.0-wmf.25 (duration: 04m 17s)
- 02:56 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2224 (T392806)', diff saved to https://phabricator.wikimedia.org/P75968 and previous config saved to /var/cache/conftool/dbconfig/20250513-025634-fceratto.json
- 02:41 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2224', diff saved to https://phabricator.wikimedia.org/P75967 and previous config saved to /var/cache/conftool/dbconfig/20250513-024127-fceratto.json
- 02:26 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2224', diff saved to https://phabricator.wikimedia.org/P75966 and previous config saved to /var/cache/conftool/dbconfig/20250513-022619-fceratto.json
- 02:11 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2224 (T392806)', diff saved to https://phabricator.wikimedia.org/P75965 and previous config saved to /var/cache/conftool/dbconfig/20250513-021112-fceratto.json
- 02:04 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2224 (T392806)', diff saved to https://phabricator.wikimedia.org/P75964 and previous config saved to /var/cache/conftool/dbconfig/20250513-020415-fceratto.json
- 02:04 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2224.codfw.wmnet with reason: Maintenance
- 02:03 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2217 (T392806)', diff saved to https://phabricator.wikimedia.org/P75963 and previous config saved to /var/cache/conftool/dbconfig/20250513-020349-fceratto.json
- 01:48 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2217', diff saved to https://phabricator.wikimedia.org/P75962 and previous config saved to /var/cache/conftool/dbconfig/20250513-014841-fceratto.json
- 01:33 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2217', diff saved to https://phabricator.wikimedia.org/P75961 and previous config saved to /var/cache/conftool/dbconfig/20250513-013334-fceratto.json
- 01:18 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2217 (T392806)', diff saved to https://phabricator.wikimedia.org/P75960 and previous config saved to /var/cache/conftool/dbconfig/20250513-011827-fceratto.json
- 01:10 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2217 (T392806)', diff saved to https://phabricator.wikimedia.org/P75959 and previous config saved to /var/cache/conftool/dbconfig/20250513-011026-fceratto.json
- 01:10 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2217.codfw.wmnet with reason: Maintenance
- 01:10 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2214 (T392806)', diff saved to https://phabricator.wikimedia.org/P75958 and previous config saved to /var/cache/conftool/dbconfig/20250513-010959-fceratto.json
- 00:54 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2214', diff saved to https://phabricator.wikimedia.org/P75957 and previous config saved to /var/cache/conftool/dbconfig/20250513-005451-fceratto.json
- 00:39 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2214', diff saved to https://phabricator.wikimedia.org/P75956 and previous config saved to /var/cache/conftool/dbconfig/20250513-003944-fceratto.json
- 00:32 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on A:cp-text_codfw
- 00:31 sukhe: run agent on A:lvs-eqiad to re-enable puppet: T393911
- 00:30 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on A:cp-upload_codfw
- 00:24 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2214 (T392806)', diff saved to https://phabricator.wikimedia.org/P75955 and previous config saved to /var/cache/conftool/dbconfig/20250513-002436-fceratto.json
- 00:17 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2214 (T392806)', diff saved to https://phabricator.wikimedia.org/P75954 and previous config saved to /var/cache/conftool/dbconfig/20250513-001736-fceratto.json
- 00:17 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2214.codfw.wmnet with reason: Maintenance
- 00:17 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193 (T392806)', diff saved to https://phabricator.wikimedia.org/P75953 and previous config saved to /var/cache/conftool/dbconfig/20250513-001704-fceratto.json
- 00:01 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193', diff saved to https://phabricator.wikimedia.org/P75952 and previous config saved to /var/cache/conftool/dbconfig/20250513-000157-fceratto.json
2025-05-12
- 23:46 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193', diff saved to https://phabricator.wikimedia.org/P75951 and previous config saved to /var/cache/conftool/dbconfig/20250512-234650-fceratto.json
- 23:44 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cassandra-dev2001.codfw.wmnet with OS bullseye
- 23:31 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193 (T392806)', diff saved to https://phabricator.wikimedia.org/P75950 and previous config saved to /var/cache/conftool/dbconfig/20250512-233142-fceratto.json
- 23:25 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2193 (T392806)', diff saved to https://phabricator.wikimedia.org/P75949 and previous config saved to /var/cache/conftool/dbconfig/20250512-232504-fceratto.json
- 23:24 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2193.codfw.wmnet with reason: Maintenance
- 23:24 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180 (T392806)', diff saved to https://phabricator.wikimedia.org/P75948 and previous config saved to /var/cache/conftool/dbconfig/20250512-232437-fceratto.json
- 23:09 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P75946 and previous config saved to /var/cache/conftool/dbconfig/20250512-230930-fceratto.json
- 22:54 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P75945 and previous config saved to /var/cache/conftool/dbconfig/20250512-225422-fceratto.json
- 22:51 ladsgroup@deploy1003: Finished scap sync-world: Backport for objectcache: Cast explicitly to integer (T393879) (duration: 11m 33s)
- 22:44 ladsgroup@deploy1003: ladsgroup: Continuing with sync
- 22:44 ladsgroup@deploy1003: ladsgroup: Backport for objectcache: Cast explicitly to integer (T393879) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 22:39 ladsgroup@deploy1003: Started scap sync-world: Backport for objectcache: Cast explicitly to integer (T393879)
- 22:39 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180 (T392806)', diff saved to https://phabricator.wikimedia.org/P75944 and previous config saved to /var/cache/conftool/dbconfig/20250512-223915-fceratto.json
- 22:31 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2180 (T392806)', diff saved to https://phabricator.wikimedia.org/P75943 and previous config saved to /var/cache/conftool/dbconfig/20250512-223131-fceratto.json
- 22:31 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2180.codfw.wmnet with reason: Maintenance
- 22:31 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2169 (T392806)', diff saved to https://phabricator.wikimedia.org/P75942 and previous config saved to /var/cache/conftool/dbconfig/20250512-223103-fceratto.json
- 22:16 rzl: rzl@titan1002:~$ sudo systemctl restart thanos-query
- 22:15 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2169', diff saved to https://phabricator.wikimedia.org/P75941 and previous config saved to /var/cache/conftool/dbconfig/20250512-221556-fceratto.json
- 22:09 robh@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts lvs3009.esams.wmnet
- 22:08 robh@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs3009.esams.wmnet
- 22:07 cwhite: restart thanos-query on titan1001
- 22:02 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cassandra-dev2001.codfw.wmnet with reason: host reimage
- 22:00 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2169', diff saved to https://phabricator.wikimedia.org/P75940 and previous config saved to /var/cache/conftool/dbconfig/20250512-220049-fceratto.json
- 21:59 robh@cumin2002: START - Cookbook sre.hosts.reboot-single for host lvs3009.esams.wmnet
- 21:58 robh@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts lvs3009.esams.wmnet
- 21:58 eevans@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cassandra-dev2001.codfw.wmnet with reason: host reimage
- 21:52 robh@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts lvs3009.esams.wmnet
- 21:48 sbassett: Deployed security fixes 03, 04 and 05 for T392341
- 21:45 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2169 (T392806)', diff saved to https://phabricator.wikimedia.org/P75939 and previous config saved to /var/cache/conftool/dbconfig/20250512-214542-fceratto.json
- 21:42 robh@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts lvs3009.esams.wmnet
- 21:42 eevans@cumin1002: START - Cookbook sre.hosts.reimage for host cassandra-dev2001.codfw.wmnet with OS bullseye
- 21:41 brett@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on cp2029.codfw.wmnet with reason: Potential failed memory - T393968
- 21:40 brett@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp2029.codfw.wmnet with reason: Potential failed memory - T393968
- 21:37 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2169 (T392806)', diff saved to https://phabricator.wikimedia.org/P75938 and previous config saved to /var/cache/conftool/dbconfig/20250512-213731-fceratto.json
- 21:37 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2169.codfw.wmnet with reason: Maintenance
- 21:37 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2158 (T392806)', diff saved to https://phabricator.wikimedia.org/P75937 and previous config saved to /var/cache/conftool/dbconfig/20250512-213704-fceratto.json
- 21:33 robh@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts lvs3009.esams.wmnet
- 21:31 sbassett: Removed mitigation for T390887 and T393367
- 21:31 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on A:cp-upload_codfw
- 21:31 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on A:cp-text_codfw
- 21:31 denisse: Testing rsyslog_8.2504.0-1~bpo12+1 on centrallog1002 - T383309
- 21:28 ryankemper@cumin2002: conftool action : set/pooled=no; selector: name=cirrussearch2091.codfw.wmnet|cirrussearch2055.codfw.wmnet|cirrussearch2113.codfw.wmnet|cirrussearch1118.eqiad.wmnet|elastic1080.eqiad.wmnet|elastic1057.eqiad.wmnet|elastic1059.eqiad.wmnet|elastic1083.eqiad.wmnet|elastic1076.eqiad.wmnet
- 21:22 robh@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts lvs3009.esams.wmnet
- 21:21 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P75936 and previous config saved to /var/cache/conftool/dbconfig/20250512-212157-fceratto.json
- 21:21 robh@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts lvs3009.esams.wmnet
- 21:21 robh@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts lvs3009.esams.wmnet
- 21:17 tgr@deploy1003: Finished scap sync-world: Backport for multiversion: Move remaining dblist helper to WmfConfig class (duration: 13m 25s)
- 21:16 brett@puppetserver1001: conftool action : set/pooled=no; selector: name=cp2029.codfw.wmnet
- 21:10 tgr@deploy1003: tgr, krinkle: Continuing with sync
- 21:08 tgr@deploy1003: tgr, krinkle: Backport for multiversion: Move remaining dblist helper to WmfConfig class synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 21:06 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P75935 and previous config saved to /var/cache/conftool/dbconfig/20250512-210650-fceratto.json
- 21:03 tgr@deploy1003: Started scap sync-world: Backport for multiversion: Move remaining dblist helper to WmfConfig class
- 20:53 tgr@deploy1003: Finished scap sync-world: Backport for mc: remove unused "memcached-pecl" definition from wgObjectCaches (T371378) (duration: 17m 27s)
- 20:51 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2158 (T392806)', diff saved to https://phabricator.wikimedia.org/P75934 and previous config saved to /var/cache/conftool/dbconfig/20250512-205143-fceratto.json
- 20:46 tgr@deploy1003: tgr, krinkle: Continuing with sync
- 20:43 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2158 (T392806)', diff saved to https://phabricator.wikimedia.org/P75933 and previous config saved to /var/cache/conftool/dbconfig/20250512-204336-fceratto.json
- 20:43 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
- 20:43 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2158.codfw.wmnet with reason: Maintenance
- 20:42 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151 (T392806)', diff saved to https://phabricator.wikimedia.org/P75932 and previous config saved to /var/cache/conftool/dbconfig/20250512-204253-fceratto.json
- 20:40 tgr@deploy1003: tgr, krinkle: Backport for mc: remove unused "memcached-pecl" definition from wgObjectCaches (T371378) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 20:35 tgr@deploy1003: Started scap sync-world: Backport for mc: remove unused "memcached-pecl" definition from wgObjectCaches (T371378)
- 20:31 robh@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts lvs3009.esams.wmnet
- 20:30 robh@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts lvs3009.esams.wmnet
- 20:30 dr0ptp4kt@deploy1003: Finished scap sync-world: Backport for Stream config for edge uniques on prod cluster (T391959) (duration: 18m 53s)
- 20:27 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151', diff saved to https://phabricator.wikimedia.org/P75931 and previous config saved to /var/cache/conftool/dbconfig/20250512-202746-fceratto.json
- 20:23 dr0ptp4kt@deploy1003: dr0ptp4kt: Continuing with sync
- 20:16 dr0ptp4kt@deploy1003: dr0ptp4kt: Backport for Stream config for edge uniques on prod cluster (T391959) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 20:14 sukhe@dns1004: END - running authdns-update
- 20:13 sukhe@dns1004: START - running authdns-update
- 20:12 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151', diff saved to https://phabricator.wikimedia.org/P75930 and previous config saved to /var/cache/conftool/dbconfig/20250512-201240-fceratto.json
- 20:11 dr0ptp4kt@deploy1003: Started scap sync-world: Backport for Stream config for edge uniques on prod cluster (T391959)
- 20:11 bearloga@deploy1003: Finished deploy [airflow-dags/analytics_product@17f8417]: (no justification provided) (duration: 00m 53s)
- 20:10 bearloga@deploy1003: Started deploy [airflow-dags/analytics_product@17f8417]: (no justification provided)
- 19:58 bking@dns1004: START - running authdns-update
- 19:57 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151 (T392806)', diff saved to https://phabricator.wikimedia.org/P75929 and previous config saved to /var/cache/conftool/dbconfig/20250512-195732-fceratto.json
- 19:50 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) search-chi.svc.eqiad.wmnet on all recursors
- 19:49 bking@cumin2002: START - Cookbook sre.dns.wipe-cache search-chi.svc.eqiad.wmnet on all recursors
- 19:49 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2151 (T392806)', diff saved to https://phabricator.wikimedia.org/P75928 and previous config saved to /var/cache/conftool/dbconfig/20250512-194933-fceratto.json
- 19:49 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2151.codfw.wmnet with reason: Maintenance
- 19:40 bking@dns1004: START - running authdns-update
- 19:34 bking@dns1004: START - running authdns-update
- 19:20 jgleeson: payments-wiki upgraded from fac09775 to 92a8cbb8
- 18:46 dwisehaupt@dns1004: END - running authdns-update
- 18:45 dwisehaupt@dns1004: START - running authdns-update
- 18:37 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on A:cp-text_ulsfo
- 18:35 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on A:cp-upload_ulsfo
- 18:01 cmooney@dns2005: END - running authdns-update
- 18:00 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 17:59 cmooney@dns2005: START - running authdns-update
- 17:59 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch1070.eqiad.wmnet with OS bullseye
- 17:58 cmooney@cumin1002: START - Cookbook sre.dns.netbox
- 17:51 cmooney@dns2005: START - running authdns-update
- 17:38 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 17:38 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: generate dns records for new codfw switches - cmooney@cumin1002"
- 17:38 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: generate dns records for new codfw switches - cmooney@cumin1002"
- 17:34 cmooney@cumin1002: START - Cookbook sre.dns.netbox
- 17:31 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch1070.eqiad.wmnet with reason: host reimage
- 17:28 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch1070.eqiad.wmnet with reason: host reimage
- 17:25 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
- 17:25 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
- 17:16 krinkle@deploy1003: Finished scap sync-world: Backport for tests: Remove one-off test-only getDblistsUsedInSettings() and isWikiFamily(), multiversion: Update readDbListFile() calls from alias to WmfConfig, tests: Replace array_keys(wikiversions.json) with all.dblist (duration: 17m 05s)
- 17:10 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch1070
- 17:10 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch1070
- 17:10 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch1070.eqiad.wmnet with OS bullseye
- 17:09 krinkle@deploy1003: krinkle: Continuing with sync
- 17:04 krinkle@deploy1003: krinkle: Backport for tests: Remove one-off test-only getDblistsUsedInSettings() and isWikiFamily(), multiversion: Update readDbListFile() calls from alias to WmfConfig, tests: Replace array_keys(wikiversions.json) with all.dblist synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 16:59 krinkle@deploy1003: Started scap sync-world: Backport for tests: Remove one-off test-only getDblistsUsedInSettings() and isWikiFamily(), multiversion: Update readDbListFile() calls from alias to WmfConfig, tests: Replace array_keys(wikiversions.json) with all.dblist
- 16:52 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
- 16:52 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
- 16:43 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
- 16:43 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
- 16:34 volans@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet with reason: Release v0.10.1 - volans@cumin1003
- 16:33 volans@cumin1003: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet with reason: Release v0.10.1 - volans@cumin1003
- 16:32 volans@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin1002.eqiad.wmnet with reason: Release v0.10.1 - volans@cumin1003
- 16:32 volans@cumin1003: START - Cookbook sre.deploy.python-code homer to cumin1002.eqiad.wmnet with reason: Release v0.10.1 - volans@cumin1003
- 16:29 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic1070 to cirrussearch1070
- 16:28 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch1070
- 16:27 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch1070
- 16:27 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch1070 on all recursors
- 16:27 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch1070 on all recursors
- 16:26 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 16:26 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1070 to cirrussearch1070 - bking@cumin2002"
- 16:25 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1070 to cirrussearch1070 - bking@cumin2002"
- 16:19 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch1069.eqiad.wmnet with OS bullseye
- 16:17 jelto: update helm311 and helm317 on contint1002 contint2002 - T387548
- 16:16 bking@cumin2002: START - Cookbook sre.dns.netbox
- 16:16 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic1070 to cirrussearch1070
- 16:16 dwisehaupt@dns1004: END - running authdns-update
- 16:15 ebernhardson@deploy1003: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 16:15 ebernhardson@deploy1003: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
- 16:14 dwisehaupt@dns1004: START - running authdns-update
- 16:05 jelto: update helm311 and helm317 on deploy1003 - T387548
- 16:03 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch1069.eqiad.wmnet with reason: host reimage
- 16:02 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1231 (T392806)', diff saved to https://phabricator.wikimedia.org/P75925 and previous config saved to /var/cache/conftool/dbconfig/20250512-160230-fceratto.json
- 15:58 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch1069.eqiad.wmnet with reason: host reimage
- 15:47 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1231', diff saved to https://phabricator.wikimedia.org/P75924 and previous config saved to /var/cache/conftool/dbconfig/20250512-154723-fceratto.json
- 15:44 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch1069
- 15:44 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch1069
- 15:44 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch1069.eqiad.wmnet with OS bullseye
- 15:43 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic1069 to cirrussearch1069
- 15:42 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch1069
- 15:41 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch1069
- 15:41 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch1069 on all recursors
- 15:41 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch1069 on all recursors
- 15:41 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 15:41 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1069 to cirrussearch1069 - bking@cumin2002"
- 15:40 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1069 to cirrussearch1069 - bking@cumin2002"
- 15:35 bking@cumin2002: START - Cookbook sre.dns.netbox
- 15:35 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic1069 to cirrussearch1069
- 15:34 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on A:cp-upload_ulsfo
- 15:34 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on A:cp-text_ulsfo
- 15:32 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1231', diff saved to https://phabricator.wikimedia.org/P75922 and previous config saved to /var/cache/conftool/dbconfig/20250512-153216-fceratto.json
- 15:27 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch1068.eqiad.wmnet with OS bullseye
- 15:17 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1231 (T392806)', diff saved to https://phabricator.wikimedia.org/P75921 and previous config saved to /var/cache/conftool/dbconfig/20250512-151709-fceratto.json
- 15:13 elukey@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host kubestage2001.codfw.wmnet
- 15:13 elukey@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host kubestage2001.codfw.wmnet
- 15:12 elukey@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host kubestage2001.codfw.wmnet
- 15:12 elukey@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host kubestage2001.codfw.wmnet
- 15:10 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1231 (T392806)', diff saved to https://phabricator.wikimedia.org/P75920 and previous config saved to /var/cache/conftool/dbconfig/20250512-151020-fceratto.json
- 15:10 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1231.eqiad.wmnet with reason: Maintenance
- 15:05 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1225.eqiad.wmnet with reason: Maintenance
- 15:05 volans: upgraded spicerack to v10.2.0 on cumin1002
- 15:04 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1201 (T392806)', diff saved to https://phabricator.wikimedia.org/P75919 and previous config saved to /var/cache/conftool/dbconfig/20250512-150454-fceratto.json
- 15:03 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch1068.eqiad.wmnet with reason: host reimage
- 14:59 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch1068.eqiad.wmnet with reason: host reimage
- 14:58 elukey@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host kubestage2001.codfw.wmnet
- 14:58 elukey@cumin1003: START - Cookbook sre.k8s.pool-depool-node pool for host kubestage2001.codfw.wmnet
- 14:57 bking@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 60 hosts
- 14:57 elukey@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host kubestage2001.codfw.wmnet
- 14:57 bking@cumin2002: START - Cookbook sre.hosts.remove-downtime for 60 hosts
- 14:57 elukey@cumin1003: START - Cookbook sre.k8s.pool-depool-node depool for host kubestage2001.codfw.wmnet
- 14:54 bking@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on 60 hosts with reason: suppress CirrusSearchNodeIndexingNotIncreasing alerts while CODFW is depooled
- 14:50 dancy@deploy1003: Installation of scap version "4.163.0" completed for 2 hosts
- 14:49 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1201', diff saved to https://phabricator.wikimedia.org/P75918 and previous config saved to /var/cache/conftool/dbconfig/20250512-144948-fceratto.json
- 14:48 dancy@deploy1003: Installing scap version "4.163.0" for 2 host(s)
- 14:44 jelto: update helm311 and helm317 on deploy2002 - T387548
- 14:42 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch1068
- 14:42 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch1068
- 14:42 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch1068.eqiad.wmnet with OS bullseye
- 14:41 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic1068 to cirrussearch1068
- 14:40 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch1068
- 14:39 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cp7001.magru.wmnet
- 14:39 fabfur@cumin1002: START - Cookbook sre.hosts.remove-downtime for cp7001.magru.wmnet
- 14:39 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp7001.magru.wmnet
- 14:35 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
- 14:34 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
- 14:34 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1201', diff saved to https://phabricator.wikimedia.org/P75917 and previous config saved to /var/cache/conftool/dbconfig/20250512-143441-fceratto.json
- 14:27 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch1068
- 14:27 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch1068 on all recursors
- 14:27 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch1068 on all recursors
- 14:27 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 14:27 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1068 to cirrussearch1068 - bking@cumin2002"
- 14:27 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1068 to cirrussearch1068 - bking@cumin2002"
- 14:23 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
- 14:23 bking@cumin2002: START - Cookbook sre.dns.netbox
- 14:23 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
- 14:23 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic1068 to cirrussearch1068
- 14:22 volans@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin1003.eqiad.wmnet with reason: Release v0.10.1 - volans@cumin1003
- 14:21 volans@cumin1003: START - Cookbook sre.deploy.python-code homer to cumin1003.eqiad.wmnet with reason: Release v0.10.1 - volans@cumin1003
- 14:19 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1201 (T392806)', diff saved to https://phabricator.wikimedia.org/P75916 and previous config saved to /var/cache/conftool/dbconfig/20250512-141933-fceratto.json
- 14:17 tgr@deploy1003: Finished scap sync-world: Backport for Improve session logging (T393038) (duration: 17m 24s)
- 14:11 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1201 (T392806)', diff saved to https://phabricator.wikimedia.org/P75915 and previous config saved to /var/cache/conftool/dbconfig/20250512-141139-fceratto.json
- 14:11 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1201.eqiad.wmnet with reason: Maintenance
- 14:11 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1187 (T392806)', diff saved to https://phabricator.wikimedia.org/P75914 and previous config saved to /var/cache/conftool/dbconfig/20250512-141114-fceratto.json
- 14:10 tgr@deploy1003: tgr: Continuing with sync
- 14:04 tgr@deploy1003: tgr: Backport for Improve session logging (T393038) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 14:04 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 14:01 jhancock@cumin2002: START - Cookbook sre.dns.netbox
- 13:59 tgr@deploy1003: Started scap sync-world: Backport for Improve session logging (T393038)
- 13:56 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P75913 and previous config saved to /var/cache/conftool/dbconfig/20250512-135607-fceratto.json
- 13:54 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 13:54 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 13:52 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp7001.magru.wmnet
- 13:51 fabfur@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on cp7001.magru.wmnet with reason: Testing in progress
- 13:45 hashar@deploy1003: Finished deploy [integration/docroot@21bebf5]: build: Updating mediawiki/mediawiki-codesniffer to 47.0.0 (duration: 00m 11s)
- 13:45 hashar@deploy1003: Started deploy [integration/docroot@21bebf5]: build: Updating mediawiki/mediawiki-codesniffer to 47.0.0
- 13:41 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P75912 and previous config saved to /var/cache/conftool/dbconfig/20250512-134100-fceratto.json
- 13:34 aokoth@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply
- 13:34 aokoth@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply
- 13:33 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for htmlform: Fix rendering contents for cloner fields (T393790) (duration: 14m 50s)
- 13:29 pfischer@deploy2002: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 13:29 pfischer@deploy2002: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
- 13:25 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1187 (T392806)', diff saved to https://phabricator.wikimedia.org/P75911 and previous config saved to /var/cache/conftool/dbconfig/20250512-132552-fceratto.json
- 13:25 lucaswerkmeister-wmde@deploy1003: stran, lucaswerkmeister-wmde: Continuing with sync
- 13:22 lucaswerkmeister-wmde@deploy1003: stran, lucaswerkmeister-wmde: Backport for htmlform: Fix rendering contents for cloner fields (T393790) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 13:18 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for htmlform: Fix rendering contents for cloner fields (T393790)
- 13:17 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1187 (T392806)', diff saved to https://phabricator.wikimedia.org/P75910 and previous config saved to /var/cache/conftool/dbconfig/20250512-131756-fceratto.json
- 13:17 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1187.eqiad.wmnet with reason: Maintenance
- 13:17 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T392806)', diff saved to https://phabricator.wikimedia.org/P75909 and previous config saved to /var/cache/conftool/dbconfig/20250512-131731-fceratto.json
- 13:16 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 13:15 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 13:15 ayounsi@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox
- 13:15 ayounsi@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
- 13:14 pfischer@deploy2002: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 13:14 pfischer@deploy2002: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
- 13:12 ayounsi@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary
- 13:12 ayounsi@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary
- 13:08 tgr@deploy1003: Finished scap sync-world: Backport for Get rid of ancient session_name call (T124371), Do not use $_SESSION (T29887 T124371), Set wgPHPSessionHandling to 'warn' (T362324) (duration: 32m 12s)
- 13:02 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P75908 and previous config saved to /var/cache/conftool/dbconfig/20250512-130225-fceratto.json
- 13:01 elukey: `puppet ca destroy thanos.discovery.wmnet` on puppetmaster1001 - old cert not used anymore
- 12:59 tgr@deploy1003: tgr, mszabo: Continuing with sync
- 12:52 tgr@deploy1003: tgr, mszabo: Backport for Get rid of ancient session_name call (T124371), Do not use $_SESSION (T29887 T124371), Set wgPHPSessionHandling to 'warn' (T362324) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 12:47 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P75907 and previous config saved to /var/cache/conftool/dbconfig/20250512-124718-fceratto.json
- 12:36 tgr@deploy1003: Started scap sync-world: Backport for Get rid of ancient session_name call (T124371), Do not use $_SESSION (T29887 T124371), Set wgPHPSessionHandling to 'warn' (T362324)
- 12:32 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T392806)', diff saved to https://phabricator.wikimedia.org/P75906 and previous config saved to /var/cache/conftool/dbconfig/20250512-123211-fceratto.json
- 12:26 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1180 (T392806)', diff saved to https://phabricator.wikimedia.org/P75905 and previous config saved to /var/cache/conftool/dbconfig/20250512-122626-fceratto.json
- 12:26 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1180.eqiad.wmnet with reason: Maintenance
- 12:26 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T392806)', diff saved to https://phabricator.wikimedia.org/P75904 and previous config saved to /var/cache/conftool/dbconfig/20250512-122600-fceratto.json
- 12:25 kamila@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
- 12:24 kamila@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
- 12:18 kamila@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
- 12:18 kamila@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-cron: apply
- 12:18 kamila@deploy1003: helmfile [codfw] START helmfile.d/services/mw-cron: apply
- 12:10 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P75903 and previous config saved to /var/cache/conftool/dbconfig/20250512-121053-fceratto.json
- 11:55 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P75902 and previous config saved to /var/cache/conftool/dbconfig/20250512-115545-fceratto.json
- 11:45 jgleeson: civicrm upgraded from dc096105 to 852c6ee6
- 11:40 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T392806)', diff saved to https://phabricator.wikimedia.org/P75901 and previous config saved to /var/cache/conftool/dbconfig/20250512-114038-fceratto.json
- 11:33 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1168 (T392806)', diff saved to https://phabricator.wikimedia.org/P75900 and previous config saved to /var/cache/conftool/dbconfig/20250512-113350-fceratto.json
- 11:33 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1168.eqiad.wmnet with reason: Maintenance
- 11:33 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T392806)', diff saved to https://phabricator.wikimedia.org/P75899 and previous config saved to /var/cache/conftool/dbconfig/20250512-113324-fceratto.json
- 11:25 volans@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin[1002-1003].eqiad.wmnet with reason: Revert to v0.9.0 - volans@cumin1003
- 11:22 volans@cumin1003: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin[1002-1003].eqiad.wmnet with reason: Revert to v0.9.0 - volans@cumin1003
- 11:18 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P75898 and previous config saved to /var/cache/conftool/dbconfig/20250512-111817-fceratto.json
- 11:17 volans@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin[1002-1003].eqiad.wmnet with reason: Revert to v0.9.0 - volans@cumin1003
- 11:16 volans@cumin1003: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin[1002-1003].eqiad.wmnet with reason: Revert to v0.9.0 - volans@cumin1003
- 11:12 volans@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin[1002-1003].eqiad.wmnet with reason: Release v0.10.0 - volans@cumin1003
- 11:11 volans@cumin1003: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin[1002-1003].eqiad.wmnet with reason: Release v0.10.0 - volans@cumin1003
- 11:08 ayounsi@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox
- 11:08 ayounsi@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
- 11:06 ayounsi@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary
- 11:03 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be1088.eqiad.wmnet with OS bullseye
- 11:03 ayounsi@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary
- 11:03 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P75897 and previous config saved to /var/cache/conftool/dbconfig/20250512-110310-fceratto.json
- 10:48 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T392806)', diff saved to https://phabricator.wikimedia.org/P75896 and previous config saved to /var/cache/conftool/dbconfig/20250512-104803-fceratto.json
- 10:47 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be1088.eqiad.wmnet with reason: host reimage
- 10:44 mvernon@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be1088.eqiad.wmnet with reason: host reimage
- 10:41 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1165 (T392806)', diff saved to https://phabricator.wikimedia.org/P75895 and previous config saved to /var/cache/conftool/dbconfig/20250512-104116-fceratto.json
- 10:41 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1015,1019].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
- 10:40 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1165.eqiad.wmnet with reason: Maintenance
- 10:32 XioNoX: delete some exterminated cables from Netbox - T393188
- 10:31 mvernon@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be1088.eqiad.wmnet with OS bullseye
- 10:22 ayounsi@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox
- 10:22 ayounsi@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
- 10:08 Ammar: Ran fixStuckGlobalRename.php for T393877
- 09:36 mvernon@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host thanos-fe1007.eqiad.wmnet with OS bullseye
- 09:25 mvernon@cumin1002: START - Cookbook sre.hosts.reimage for host thanos-fe1007.eqiad.wmnet with OS bullseye
- 09:04 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts thanos-fe[2001-2003].codfw.wmnet
- 09:04 mvernon@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 09:04 mvernon@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: thanos-fe[2001-2003].codfw.wmnet decommissioned, removing all IPs except the asset tag one - mvernon@cumin1002"
- 09:03 mvernon@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: thanos-fe[2001-2003].codfw.wmnet decommissioned, removing all IPs except the asset tag one - mvernon@cumin1002"
- 09:00 mvernon@cumin1002: START - Cookbook sre.dns.netbox
- 08:55 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 08:55 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 08:50 mvernon@cumin1002: START - Cookbook sre.hosts.decommission for hosts thanos-fe[2001-2003].codfw.wmnet
- 08:49 mvernon@cumin1002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99) for hosts thanos-be[2001-2003].codfw.wmnet
- 08:48 mvernon@cumin1002: START - Cookbook sre.hosts.decommission for hosts thanos-be[2001-2003].codfw.wmnet
- 08:47 mvernon@cumin1002: END (FAIL) - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies (exit_code=1) rolling restart_daemons on P{thanos-fe200[4-7]*} or P{thanos-fe1*} and (A:thanos-fe or A:thanos-fe-codfw or A:thanos-fe-eqiad)
- 08:43 mvernon@cumin1002: START - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies rolling restart_daemons on P{thanos-fe200[4-7]*} or P{thanos-fe1*} and (A:thanos-fe or A:thanos-fe-codfw or A:thanos-fe-eqiad)
- 08:39 mvernon@cumin1002: END (FAIL) - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies (exit_code=1) rolling restart_daemons on A:thanos-fe
- 08:39 mvernon@cumin1002: START - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies rolling restart_daemons on A:thanos-fe
- 08:35 jayme@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
- 08:34 jayme@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
- 08:33 jayme@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
- 08:31 jayme@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
- 08:29 mvernon@cumin1002: conftool action : set/pooled=yes; selector: service=apus,name=apus-fe1003.eqiad.wmnet
- 08:29 mvernon@cumin1002: conftool action : set/weight=40; selector: service=apus,name=apus-fe1003.eqiad.wmnet
- 08:10 jayme@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
- 08:09 jayme@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
- 07:57 slyngshede@cumin1002: DONE (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Cicalese out of all services on: 2402 hosts
- 07:20 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 07:20 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
- 07:12 slyngshede@cumin1002: DONE (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Debt out of all services on: 2402 hosts
2025-05-11
- 22:55 tchin@deploy1003: Finished deploy [airflow-dags/analytics@301c74b]: Deploying airflow artifacts for T384962 (duration: 02m 01s)
- 22:54 tchin@deploy1003: Started deploy [airflow-dags/analytics@301c74b]: Deploying airflow artifacts for T384962
2025-05-10
- 00:41 dani@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
- 00:41 dani@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply
- 00:41 dani@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
- 00:41 dani@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply
- 00:41 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply
- 00:41 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply
- 00:23 dani@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
- 00:22 dani@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply
- 00:22 dani@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
- 00:22 dani@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply
- 00:22 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply
- 00:22 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply
- 00:16 dani@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
- 00:16 dani@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply
- 00:16 dani@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
- 00:15 dani@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply
- 00:15 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply
- 00:15 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply
2025-05-09
- 23:02 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host thanos-fe1007.eqiad.wmnet with OS bullseye
- 22:10 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host thanos-fe1007.eqiad.wmnet with OS bullseye
- 22:09 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-fe1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 22:03 vriley@cumin1002: START - Cookbook sre.hosts.provision for host thanos-fe1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 21:57 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host thanos-fe1007.eqiad.wmnet with OS bullseye
- 21:05 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host thanos-fe1007.eqiad.wmnet with OS bullseye
- 20:53 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.rename (exit_code=99) from elastic1068 to cirrussearch1068
- 20:52 bking@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
- 20:50 bking@cumin2002: START - Cookbook sre.dns.netbox
- 20:49 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic1068 to cirrussearch1068
- 20:46 bking@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on elastic1054.eqiad.wmnet with reason: downtime prior to decom
- 20:39 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host thanos-fe1006.eqiad.wmnet with OS bullseye
- 20:39 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1002"
- 20:35 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1002"
- 20:30 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cirrussearch1053.eqiad.wmnet with OS bullseye
- 20:23 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch1053
- 20:23 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch1053
- 20:23 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch1053.eqiad.wmnet with OS bullseye
- 20:20 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cirrussearch1053.eqiad.wmnet with OS bullseye
- 20:18 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-fe1006.eqiad.wmnet with reason: host reimage
- 20:15 vriley@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-fe1006.eqiad.wmnet with reason: host reimage
- 20:14 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch1053
- 20:14 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch1053
- 20:14 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch1053.eqiad.wmnet with OS bullseye
- 20:11 jgreen@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 20:09 jgreen@cumin1002: START - Cookbook sre.dns.netbox
- 20:08 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic1053 to cirrussearch1053
- 20:07 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch1053
- 20:06 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch1053
- 20:06 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch1053 on all recursors
- 20:05 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch1053 on all recursors
- 20:05 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 20:05 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1053 to cirrussearch1053 - bking@cumin2002"
- 20:04 inflatador: bking@cumin2002 removed unrelated `fran1001` DNS record during a rename
- 20:03 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1053 to cirrussearch1053 - bking@cumin2002"
- 20:00 bking@cumin2002: START - Cookbook sre.dns.netbox
- 20:00 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic1053 to cirrussearch1053
- 19:55 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host thanos-fe1006.eqiad.wmnet with OS bullseye
- 19:50 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['elastic1054.eqiad.wmnet']
- 19:50 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic1054.eqiad.wmnet']
- 19:48 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host thanos-fe1006.eqiad.wmnet with OS bullseye
- 19:45 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['elastic1054.eqiad.wmnet']
- 19:45 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic1054.eqiad.wmnet']
- 19:24 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host thanos-fe1007.eqiad.wmnet with OS bullseye
- 19:06 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host thanos-fe1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 18:56 ryankemper@puppetserver1001: conftool action : set/pooled=yes:weight=10; selector: name=wdqs1012.eqiad.wmnet|wdqs1013.eqiad.wmnet|wdqs1014.eqiad.wmnet|wdqs1015.eqiad.wmnet|wdqs2007.codfw.wmnet|wdqs2010.codfw.wmnet|wdqs2011.codfw.wmnet|wdqs2012.codfw.wmnet|wdqs2013.codfw.wmnet
- 18:28 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host thanos-fe1006.eqiad.wmnet with OS bullseye
- 18:25 vriley@cumin1002: START - Cookbook sre.hosts.provision for host thanos-fe1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 18:24 vriley@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host thanos-fe1007
- 18:24 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host thanos-fe1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 18:23 vriley@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host thanos-fe1007
- 18:23 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 18:23 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt thanos-fe1007 - vriley@cumin1002"
- 18:22 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt thanos-fe1007 - vriley@cumin1002"
- 18:19 vriley@cumin1002: START - Cookbook sre.dns.netbox
- 18:17 vriley@cumin1002: START - Cookbook sre.hosts.provision for host thanos-fe1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 18:16 vriley@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host thanos-fe1006
- 18:15 vriley@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host thanos-fe1006
- 18:14 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 18:14 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt thanos-fe1006 - vriley@cumin1002"
- 18:14 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt thanos-fe1006 - vriley@cumin1002"
- 18:11 vriley@cumin1002: START - Cookbook sre.dns.netbox
- 17:05 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 17:05 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 16:21 krinkle@deploy1003: Finished scap sync-world: Backport for noc: Fix "Class MWMultiVersion not found" in wiki.php (duration: 13m 42s)
- 16:20 mfossati@deploy1003: Finished deploy [airflow-dags/platform_eng@bfb9c63]: bump image suggestions to 1.6.0 (duration: 01m 49s)
- 16:19 mfossati@deploy1003: Started deploy [airflow-dags/platform_eng@bfb9c63]: bump image suggestions to 1.6.0
- 16:14 krinkle@deploy1003: krinkle: Continuing with sync
- 16:14 krinkle@deploy1003: krinkle: Backport for noc: Fix "Class MWMultiVersion not found" in wiki.php synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 16:07 krinkle@deploy1003: Started scap sync-world: Backport for noc: Fix "Class MWMultiVersion not found" in wiki.php
- 15:57 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.rename (exit_code=99) from elastic1053 to cirrussearch1053
- 15:57 bking@cumin2002: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
- 15:57 bking@cumin2002: START - Cookbook sre.dns.netbox
- 15:57 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic1053 to cirrussearch1053
- 15:49 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.rename (exit_code=93) from elastic1053 to cirrussearch1053
- 15:49 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic1053 to cirrussearch1053
- 15:41 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for Bump wikibase-data-values-value-view to HEAD (T389633 T393641) (duration: 15m 22s)
- 15:36 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main'.
- 15:34 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde: Continuing with sync
- 15:32 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde: Backport for Bump wikibase-data-values-value-view to HEAD (T389633 T393641) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 15:25 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for Bump wikibase-data-values-value-view to HEAD (T389633 T393641)
- 14:30 fab@deploy1003: Finished deploy [airflow-dags/research@e3ccac9]: (no justification provided) (duration: 00m 38s)
- 14:29 fab@deploy1003: Started deploy [airflow-dags/research@e3ccac9]: (no justification provided)
- 14:25 fab@deploy1003: Finished deploy [airflow-dags/research@e3ccac9]: (no justification provided) (duration: 00m 31s)
- 14:24 fab@deploy1003: Started deploy [airflow-dags/research@e3ccac9]: (no justification provided)
- 14:21 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for Bump wikibase-data-values-value-view to HEAD (T389633 T393641) (duration: 14m 12s)
- 14:15 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde: Continuing with sync
- 14:14 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde: Backport for Bump wikibase-data-values-value-view to HEAD (T389633 T393641) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 14:07 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for Bump wikibase-data-values-value-view to HEAD (T389633 T393641)
- 13:36 fab@deploy1003: Finished deploy [airflow-dags/research@e3ccac9]: (no justification provided) (duration: 04m 10s)
- 13:32 fab@deploy1003: Started deploy [airflow-dags/research@e3ccac9]: (no justification provided)
- 12:51 godog: upload prometheus-blackbox-exporter 0.26.0-0~bpo12+1 to bookworm-wikimedia - T385022
- 11:45 taavi: update toolforge arc-enabled exim4 packages (component/exim4-arc) to latest in debian 12 T356171
- 11:17 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 11:16 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 11:06 jelto@cumin1002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab2002.wikimedia.org with reason: Upgrade Replica to GitLab 17.9
- 11:02 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host thanos-fe1005.eqiad.wmnet with OS bullseye
- 11:02 mvernon@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - mvernon@cumin1002"
- 10:58 mvernon@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - mvernon@cumin1002"
- 10:40 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-fe1005.eqiad.wmnet with reason: host reimage
- 10:37 mvernon@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-fe1005.eqiad.wmnet with reason: host reimage
- 10:20 mvernon@cumin1002: START - Cookbook sre.hosts.reimage for host thanos-fe1005.eqiad.wmnet with OS bullseye
- 09:50 moritzm: imported debmonitor-client 0.4.0-3+deb13u1 for trixie-wikimedia T391083
- 09:05 zabe: zabe@deploy1003:~$ mwscript-k8s --comment="T393761" --follow -- extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=amwiki --logwiki=metawiki 'Jeroen' 'Retireduser-vfs199s31yvbtxsfmygg'
- 09:03 zabe: zabe@deploy1003:~$ mwscript-k8s --comment="T393372" --follow -- extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=enwikibooks --logwiki=metawiki 'Adityaindumdum' 'Renamed user a71c8354dc822ea0d3aab24d1ce886f02c25fe91'
- 08:17 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T388134, bring new main graph hosts into service) xfer wikidata_main from wdqs2010.codfw.wmnet -> wdqs2013.codfw.wmnet w/ force delete existing files, repooling neither afterwards
- 08:10 volans@cumin2002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin1003.eqiad.wmnet with reason: Release v0.9.0 - volans@cumin2002
- 08:09 volans@cumin2002: START - Cookbook sre.deploy.python-code homer to cumin1003.eqiad.wmnet with reason: Release v0.9.0 - volans@cumin2002
- 07:57 moritzm: imported puppet-agent 7.23.0-1+wmf13u1 to component/puppet7 for trixie-wikimedia T392790
- 07:24 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T388134, bring new main graph hosts into service) xfer wikidata_main from wdqs2010.codfw.wmnet -> wdqs2013.codfw.wmnet w/ force delete existing files, repooling neither afterwards
- 07:23 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T388134, bring new main graph hosts into service) xfer wikidata_main from wdqs2010.codfw.wmnet -> wdqs2012.codfw.wmnet w/ force delete existing files, repooling neither afterwards
- 07:22 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T388134, bring new main graph hosts into service) xfer wikidata_main from wdqs2007.codfw.wmnet -> wdqs2011.codfw.wmnet w/ force delete existing files, repooling neither afterwards
- 07:22 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T388134, bring new main graph hosts into service) xfer wikidata_main from wdqs1013.eqiad.wmnet -> wdqs1015.eqiad.wmnet w/ force delete existing files, repooling neither afterwards
- 07:22 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T388134, bring new main graph hosts into service) xfer wikidata_main from wdqs1012.eqiad.wmnet -> wdqs1014.eqiad.wmnet w/ force delete existing files, repooling neither afterwards
- 07:15 jelto@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: Upgrade Replica to GitLab 17.9
- 06:27 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T388134, bring new main graph hosts into service) xfer wikidata_main from wdqs1013.eqiad.wmnet -> wdqs1015.eqiad.wmnet w/ force delete existing files, repooling neither afterwards
- 06:27 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T388134, bring new main graph hosts into service) xfer wikidata_main from wdqs1012.eqiad.wmnet -> wdqs1014.eqiad.wmnet w/ force delete existing files, repooling neither afterwards
- 06:26 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T388134, bring new main graph hosts into service) xfer wikidata_main from wdqs2010.codfw.wmnet -> wdqs2012.codfw.wmnet w/ force delete existing files, repooling neither afterwards
- 06:26 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T388134, bring new main graph hosts into service) xfer wikidata_main from wdqs2007.codfw.wmnet -> wdqs2011.codfw.wmnet w/ force delete existing files, repooling neither afterwards
- 06:26 ryankemper@cumin2002: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97) (T388134, bring new main graph hosts into service) xfer wikidata_main from wdqs2007.codfw.wmnet -> wdqs2011.codfw.wmnet w/ force delete existing files, repooling neither afterwards
- 06:25 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T388134, bring new main graph hosts into service) xfer wikidata_main from wdqs2007.codfw.wmnet -> wdqs2011.codfw.wmnet w/ force delete existing files, repooling neither afterwards
- 06:10 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T388134, bring new main graph hosts into service) xfer wikidata_main from wdqs2007.codfw.wmnet -> wdqs2010.codfw.wmnet w/ force delete existing files, repooling neither afterwards
- 06:07 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T388134, bring new main graph hosts into service) xfer wikidata_main from wdqs1012.eqiad.wmnet -> wdqs1013.eqiad.wmnet w/ force delete existing files, repooling neither afterwards
- 05:30 jmm@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on cumin1003.eqiad.wmnet with reason: WIP new Bookworm host
- 05:12 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T388134, bring new main graph hosts into service) xfer wikidata_main from wdqs1012.eqiad.wmnet -> wdqs1013.eqiad.wmnet w/ force delete existing files, repooling neither afterwards
- 05:12 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T388134, bring new main graph hosts into service) xfer wikidata_main from wdqs2007.codfw.wmnet -> wdqs2010.codfw.wmnet w/ force delete existing files, repooling neither afterwards
- 05:09 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T388134, bring new main graph hosts into service) xfer wikidata_main from wdqs2014.codfw.wmnet -> wdqs2007.codfw.wmnet w/ force delete existing files, repooling source-only afterwards
- 04:11 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T388134, bring new main graph hosts into service) xfer wikidata_main from wdqs2014.codfw.wmnet -> wdqs2007.codfw.wmnet w/ force delete existing files, repooling source-only afterwards
- 04:03 ryankemper@deploy1003: Finished deploy [wdqs/wdqs@fe88851]: deploy to freshly reimaged host (duration: 00m 06s)
- 04:03 ryankemper@deploy1003: Started deploy [wdqs/wdqs@fe88851]: deploy to freshly reimaged host
- 04:03 ryankemper@deploy1003: Finished deploy [wdqs/wdqs@fe88851]: deploy to freshly reimaged host (duration: 00m 05s)
- 04:03 ryankemper@deploy1003: Started deploy [wdqs/wdqs@fe88851]: deploy to freshly reimaged host
- 04:02 ryankemper@deploy1003: Finished deploy [wdqs/wdqs@fe88851]: deploy to freshly reimaged host (duration: 00m 05s)
- 04:02 ryankemper@deploy1003: Started deploy [wdqs/wdqs@fe88851]: deploy to freshly reimaged host
- 04:02 ryankemper@deploy1003: Finished deploy [wdqs/wdqs@fe88851]: deploy to freshly reimaged host (duration: 00m 05s)
- 04:02 ryankemper@deploy1003: Started deploy [wdqs/wdqs@fe88851]: deploy to freshly reimaged host
- 04:01 ryankemper@deploy1003: Finished deploy [wdqs/wdqs@fe88851]: deploy to freshly reimaged host (duration: 00m 05s)
- 04:01 ryankemper@deploy1003: Started deploy [wdqs/wdqs@fe88851]: deploy to freshly reimaged host
- 00:07 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on A:cp-text_drmrs
- 00:00 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on A:cp-upload_drmrs
2025-05-08
- 23:37 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.categories-reload (exit_code=0) reloading categories to wdqs2012.codfw.wmnet
- 23:37 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host thanos-fe1005.eqiad.wmnet with OS bullseye
- 23:35 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.categories-reload (exit_code=0) reloading categories to wdqs2011.codfw.wmnet
- 23:35 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.categories-reload (exit_code=0) reloading categories to wdqs1014.eqiad.wmnet
- 23:34 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.categories-reload (exit_code=0) reloading categories to wdqs2010.codfw.wmnet
- 23:30 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.categories-reload (exit_code=0) reloading categories to wdqs1015.eqiad.wmnet
- 23:26 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.categories-reload (exit_code=0) reloading categories to wdqs1013.eqiad.wmnet
- 23:22 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.categories-reload (exit_code=0) reloading categories to wdqs2013.codfw.wmnet
- 23:19 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.categories-reload (exit_code=0) reloading categories to wdqs2007.codfw.wmnet
- 23:06 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.categories-reload (exit_code=0) reloading categories to wdqs1012.eqiad.wmnet
- 22:28 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T388134, bring new main graph hosts into service) xfer wikidata_main from wdqs1011.eqiad.wmnet -> wdqs1012.eqiad.wmnet w/ force delete existing files, repooling source-only afterwards
- 22:27 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) (T388134, bring new main graph hosts into service) xfer wikidata_main from wdqs2014.codfw.wmnet -> wdqs2007.codfw.wmnet w/ force delete existing files, repooling source-only afterwards
- 22:17 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host thanos-fe1005.eqiad.wmnet with OS bullseye
- 22:05 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host thanos-fe1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 21:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2047.codfw.wmnet with OS bookworm
- 21:54 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 21:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2048.codfw.wmnet with OS bookworm
- 21:54 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 21:50 tzatziki: removing 1 file for legal compliance
- 21:48 bking@cumin2002: START - Cookbook sre.wdqs.categories-reload reloading categories to wdqs2013.codfw.wmnet
- 21:48 bking@cumin2002: START - Cookbook sre.wdqs.categories-reload reloading categories to wdqs2012.codfw.wmnet
- 21:48 bking@cumin2002: START - Cookbook sre.wdqs.categories-reload reloading categories to wdqs2011.codfw.wmnet
- 21:48 bking@cumin2002: START - Cookbook sre.wdqs.categories-reload reloading categories to wdqs2010.codfw.wmnet
- 21:47 vriley@cumin1002: START - Cookbook sre.hosts.provision for host thanos-fe1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 21:46 vriley@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host thanos-fe1005
- 21:45 vriley@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host thanos-fe1005
- 21:44 tzatziki: removing 3 files for legal compliance
- 21:44 bking@cumin2002: START - Cookbook sre.wdqs.categories-reload reloading categories to wdqs1015.eqiad.wmnet
- 21:44 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 21:44 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt thanos-fe1005 - vriley@cumin1002"
- 21:44 bking@cumin2002: START - Cookbook sre.wdqs.categories-reload reloading categories to wdqs1014.eqiad.wmnet
- 21:43 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt thanos-fe1005 - vriley@cumin1002"
- 21:43 bking@cumin2002: START - Cookbook sre.wdqs.categories-reload reloading categories to wdqs1013.eqiad.wmnet
- 21:43 ryankemper: T388134 Cutover completed about an hour ago. Metrics look good; we're in the process of shifting over some of the old `wdqs` hosts to `wdqs-main` to increase capacity
- 21:40 vriley@cumin1002: START - Cookbook sre.dns.netbox
- 21:38 bking@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6 days, 0:00:00 on wdqs[2007,2013].codfw.wmnet,wdqs[1012-1014].eqiad.wmnet with reason: bringing hosts online with a data transfer
- 21:35 swfrench@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db1247.eqiad.wmnet with reason: Host has crashed - T393612
- 21:34 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer (T388134, bring new main graph hosts into service) xfer wikidata_main from wdqs2014.codfw.wmnet -> wdqs2007.codfw.wmnet w/ force delete existing files, repooling source-only afterwards
- 21:33 bking@cumin2002: START - Cookbook sre.wdqs.categories-reload reloading categories to wdqs2007.codfw.wmnet
- 21:29 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer (T388134, bring new main graph hosts into service) xfer wikidata_main from wdqs1011.eqiad.wmnet -> wdqs1012.eqiad.wmnet w/ force delete existing files, repooling source-only afterwards
- 21:29 bking@cumin2002: START - Cookbook sre.wdqs.categories-reload reloading categories to wdqs1012.eqiad.wmnet
- 20:57 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 20:56 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on A:cp-text_drmrs
- 20:56 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on A:cp-upload_drmrs
- 20:54 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on P{cp50[19-24].eqsin.wmnet} and A:cp
- 20:52 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 20:49 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on P{cp50[27-32].eqsin.wmnet} and A:cp
- 20:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti2047.codfw.wmnet with reason: host reimage
- 20:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti2048.codfw.wmnet with reason: host reimage
- 20:33 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti2047.codfw.wmnet with reason: host reimage
- 20:33 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti2048.codfw.wmnet with reason: host reimage
- 20:21 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2048.codfw.wmnet with OS bookworm
- 20:21 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2047.codfw.wmnet with OS bookworm
- 20:16 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host apus-fe1003.eqiad.wmnet with OS bookworm
- 20:16 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1002"
- 20:14 ryankemper: T388134 Beginning cutover of query.wikidata.org from `wdqs` to `wdqs-main`. Starting to see requests increase on wdqs-main (and decrease on wdqs) as expected. Rolling change to rest of cp text hosts. Traffic should be fully moved over in ~20 mins
- 20:03 swfrench@deploy1003: Stopping before sync operations
- 20:03 swfrench@deploy1003: Started scap sync-world: Non-deploy scap run to switch mw-script/main to PHP 8.1 - T391057
- 19:30 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1002"
- 19:08 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on apus-fe1003.eqiad.wmnet with reason: host reimage
- 19:04 vriley@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on apus-fe1003.eqiad.wmnet with reason: host reimage
- 19:01 jhuneidi@deploy1003: rebuilt and synchronized wikiversions files: group2 to 1.44.0-wmf.28 refs T386223
- 18:48 sukhe@dns1004: END - running authdns-update
- 18:46 sukhe@dns1004: START - running authdns-update
- 18:45 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host apus-fe1003.eqiad.wmnet with OS bookworm
- 18:45 zabe: move all translatable subpages of "Wikimedia Foundation Board of Trustees" to subpages of "Wikimedia Foundation/Board of Trustees" on metawiki (T393619)
- 18:43 zabe: mwscript-k8s [...]moveTranslatableBundle.php metawiki "Wikimedia Foundation Board of Trustees/Call for feedback: Board of Trustees elections" "Wikimedia Foundation/Board of Trustees/Call for feedback: Board of Trustees elections" "Zabe" --reason "per request T393619"
- 18:42 zabe: zabe@deploy1003:~$ mwscript-k8s --attach -- extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki "Wikimedia Foundation Board of Trustees/Call for feedback: Board of Trustees elections" "Wikimedia Foundation/Board of Trustees/Call for feedback: Board of Trustees elections" "Zabe" --reason "per request
- 18:38 zabe: zabe@deploy1003:~$ mwscript-k8s --attach -- extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki "Wikimedia Foundation Board of Trustees/Call for feedback:2022 Board of Trustees election/Upcoming Call for Feedback about the Board of Trustees elections" "Wikimedia Foundation/Board of Trustees/Call for feedback:2022 Board of
- 18:30 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on P{cp50[27-32].eqsin.wmnet} and A:cp
- 18:29 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on P{cp50[19-24].eqsin.wmnet} and A:cp
- 18:13 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ganeti2048.codfw.wmnet with OS bookworm
- 18:03 dancy@deploy1003: Installation of scap version "4.162.0" completed for 2 hosts
- 18:01 dancy@deploy1003: Installing scap version "4.162.0" for 2 host(s)
- 17:38 cdanis@dns1004: END - running authdns-update
- 17:36 cdanis@dns1004: START - running authdns-update
- 17:28 brett@cumin2002: END (ERROR) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=97) rolling upgrade of Varnish on A:cp-text_eqsin
- 17:28 brett@cumin2002: END (ERROR) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=97) rolling upgrade of Varnish on A:cp-upload_eqsin
- 17:25 cdanis@dns1004: END - running authdns-update
- 17:23 cdanis@dns1004: START - running authdns-update
- 17:21 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host apus-fe1003.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 17:13 jclark@cumin1002: START - Cookbook sre.hosts.provision for host apus-fe1003.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 17:13 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ganeti2047.codfw.wmnet with OS bookworm
- 17:12 bking@cumin2002: conftool action : set/pooled=yes:weight=10; selector: name=cirrussearch1112.eqiad.wmnet|cirrussearch1113.eqiad.wmnet|cirrussearch1114.eqiad.wmnet|cirrussearch1115.eqiad.wmnet|cirrussearch1116.eqiad.wmnet|cirrussearch1117.eqiad.wmnet|cirrussearch1118.eqiad.wmnet|cirrussearch1119.eqiad.wmnet|cirrussearch1120.eqiad.wmnet|cirrussearch1121.eqiad.wmnet|cirrussearch1122.eqiad.wmnet|cirrussearch1123.eqiad.wmn
- 17:09 bking@cumin2002: conftool action : set/pooled=yes:weight=10; selector: name=cirrussearch1111.eqiad.wmnet|name=cirrussearch1112.eqiad.wmnet|name=cirrussearch1113.eqiad.wmnet|name=cirrussearch1114.eqiad.wmnet|name=cirrussearch1115.eqiad.wmnet|name=cirrussearch1116.eqiad.wmnet|name=cirrussearch1117.eqiad.wmnet|name=cirrussearch1118.eqiad.wmnet|name=cirrussearch1119.eqiad.wmnet|name=cirrussearch1120.eqiad.wmnet|name=cirru
- 17:06 bd808@deploy1003: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
- 17:05 bd808@deploy1003: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
- 17:05 bd808@deploy1003: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
- 17:05 bd808@deploy1003: helmfile [codfw] START helmfile.d/services/developer-portal: apply
- 17:05 bd808@deploy1003: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
- 17:04 bd808@deploy1003: helmfile [staging] START helmfile.d/services/developer-portal: apply
- 16:50 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on A:cp-upload_eqsin
- 16:50 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on A:cp-text_eqsin
- 16:49 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp7001.magru.wmnet
- 16:48 fabfur: repooling cp7001 (T393671)
- 16:48 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cp7001.magru.wmnet
- 16:48 fabfur@cumin1002: START - Cookbook sre.hosts.remove-downtime for cp7001.magru.wmnet
- 16:30 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2048.codfw.wmnet with OS bookworm
- 16:29 brett@dns1005: END - running authdns-update
- 16:28 brett@dns1005: START - running authdns-update
- 16:27 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host apus-fe1003.eqiad.wmnet with OS bookworm
- 16:22 swfrench@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db1246.eqiad.wmnet with reason: Host has crashed - T393296
- 16:21 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti2048.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 16:15 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2048.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 16:13 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti2048
- 16:13 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti2048
- 16:11 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 16:10 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ganeti2048 to codfw - jhancock@cumin2002"
- 16:10 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ganeti2048 to codfw - jhancock@cumin2002"
- 16:09 sukhe@dns1004: END - running authdns-update
- 16:08 sukhe@dns1004: START - running authdns-update
- 16:05 jhancock@cumin2002: START - Cookbook sre.dns.netbox
- 15:57 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2047.codfw.wmnet with OS bookworm
- 15:53 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti2047.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 15:48 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2047.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 15:46 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti2047
- 15:46 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti2047
- 15:39 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 15:39 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ganeti2047 to codfw - jhancock@cumin2002"
- 15:39 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ganeti2047 to codfw - jhancock@cumin2002"
- 15:32 jhancock@cumin2002: START - Cookbook sre.dns.netbox
- 15:31 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host apus-fe1003.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 15:21 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti2047.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 15:13 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2047.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 15:13 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 15:10 jhancock@cumin2002: START - Cookbook sre.dns.netbox
- 15:07 sukhe: sudo cumin -b1 -s10 'A:dnsbox' 'run-puppet-agent'
- 15:03 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1003.eqiad.wmnet
- 14:57 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host sretest1003.eqiad.wmnet
- 14:45 moritzm: imported ripe-atlas-tools 2.3.0-3+wmf12u1 to apt.wikimedia.org/bookworm T389380
- 14:45 moritzm: imported ripe-atlas-sagan 1.3.1-1~wmf12u1 to apt.wikimedia.org/bookworm T389380
- 14:36 pt1979@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1246.eqiad.wmnet with OS bookworm
- 14:34 James_F: Running `foreachwiki extensions/Echo/maintenance/removeInvalidNotification.php --remove # T389673` for MatmaRex
- 14:23 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts restbase[1028-1030].eqiad.wmnet
- 14:23 eevans@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 14:23 eevans@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: restbase[1028-1030].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1002"
- 14:21 dani@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
- 14:21 dani@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply
- 14:21 dani@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
- 14:20 dani@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply
- 14:20 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply
- 14:20 eevans@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: restbase[1028-1030].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1002"
- 14:20 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply
- 14:14 pt1979@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1246.eqiad.wmnet with reason: host reimage
- 14:12 pt1979@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1246.eqiad.wmnet with reason: host reimage
- 14:03 eevans@cumin1002: START - Cookbook sre.dns.netbox
- 13:52 pt1979@cumin1002: START - Cookbook sre.hosts.reimage for host db1246.eqiad.wmnet with OS bookworm
- 13:51 eevans@cumin1002: START - Cookbook sre.hosts.decommission for hosts restbase[1028-1030].eqiad.wmnet
- 13:42 volans: forced removal of db1246 from puppetdb to unblock reimage (was failing due to a puppet change in the meanwhile)
- 13:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti2047.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 13:35 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2047.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 13:34 tchanders@deploy1003: Finished scap sync-world: Backport for temp accounts: Remove AutopromoteOnce configuration (T393358) (duration: 16m 30s)
- 13:27 tchanders@deploy1003: tchanders, kharlan: Continuing with sync
- 13:24 tchanders@deploy1003: tchanders, kharlan: Backport for temp accounts: Remove AutopromoteOnce configuration (T393358) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 13:17 tchanders@deploy1003: Started scap sync-world: Backport for temp accounts: Remove AutopromoteOnce configuration (T393358)
- 13:03 moritzm: installing jetty9 security updates
- 12:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netmon1003.wikimedia.org
- 12:32 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netmon1003.wikimedia.org
- 12:31 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netmon2002.wikimedia.org
- 12:25 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netmon2002.wikimedia.org
- 11:57 moritzm: import transferpy 1.1+deb12u1 to bookworm-wikimedia T389380
- 11:44 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ping2004.codfw.wmnet
- 11:44 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 11:43 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 11:40 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ping2004.codfw.wmnet
- 11:32 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ping1004.eqiad.wmnet
- 11:32 jmm@cumin2002: END (PASS) - Cookbook sre.netbox.restart-reboot (exit_code=0) rolling reboot on A:netbox
- 11:28 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ping1004.eqiad.wmnet
- 11:19 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) netbox.discovery.wmnet. on all recursors
- 11:19 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache netbox.discovery.wmnet. on all recursors
- 11:15 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) netbox.discovery.wmnet. on all recursors
- 11:15 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache netbox.discovery.wmnet. on all recursors
- 11:15 jmm@cumin2002: START - Cookbook sre.netbox.restart-reboot rolling reboot on A:netbox
- 10:45 zabe: zabe@deploy1003:~$ mwscript-k8s --attach -- extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki "Wikimedia Foundation Board of Trustees" "Wikimedia Foundation/Board of Trustees" "Zabe" --reason "per request T393619"
- 10:31 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
- 10:30 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
- 09:45 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 09:45 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 09:26 Emperor: swift delete wikipedia-commons-local-public.e7 'e/e7/Hawkmoth_(Meganoton_nyctiphanes)_(8688240817).jpg' ms-fe1009 and ms-fe2009 T392658
- 09:02 mvernon@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host apus-fe1003.eqiad.wmnet with OS bookworm
- 08:53 mvernon@cumin1002: START - Cookbook sre.hosts.reimage for host apus-fe1003.eqiad.wmnet with OS bookworm
- 08:52 mvernon@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host apus-fe1003.eqiad.wmnet with OS bookworm
- 08:47 mvernon@cumin1002: START - Cookbook sre.hosts.reimage for host apus-fe1003.eqiad.wmnet with OS bookworm
- 08:37 fabfur@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on cp7001.magru.wmnet with reason: Testing in progress
- 08:19 dcausse: closing UTC morning backport window
- 08:12 dcausse@deploy1003: Finished scap sync-world: Backport for cirrus: explicitly route search traffic to codfw (T388610) (duration: 23m 19s)
- 08:06 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp7001.magru.wmnet
- 08:05 fabfur: depooling and disabling puppet on cp7001 to perform tests (T393671)
- 08:03 dcausse@deploy1003: dcausse: Continuing with sync
- 07:56 fab@deploy1003: Finished deploy [airflow-dags/research@e3ccac9]: (no justification provided) (duration: 00m 29s)
- 07:55 fab@deploy1003: Started deploy [airflow-dags/research@e3ccac9]: (no justification provided)
- 07:55 dcausse@deploy1003: dcausse: Backport for cirrus: explicitly route search traffic to codfw (T388610) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 07:52 fab@deploy1003: Finished deploy [airflow-dags/research@e3ccac9]: (no justification provided) (duration: 00m 42s)
- 07:51 fab@deploy1003: Started deploy [airflow-dags/research@e3ccac9]: (no justification provided)
- 07:49 dcausse@deploy1003: Started scap sync-world: Backport for cirrus: explicitly route search traffic to codfw (T388610)
- 07:46 fab@deploy1003: Finished deploy [airflow-dags/research@e3ccac9]: (no justification provided) (duration: 05m 42s)
- 07:40 fab@deploy1003: Started deploy [airflow-dags/research@e3ccac9]: (no justification provided)
- 07:40 fab@deploy1003: Finished deploy [airflow-dags/research@4367417]: (no justification provided) (duration: 00m 40s)
- 07:39 fab@deploy1003: Started deploy [airflow-dags/research@4367417]: (no justification provided)
- 07:06 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti2034.codfw.wmnet
- 07:06 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2034.codfw.wmnet
- 07:04 jelto@cumin1002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1004.wikimedia.org with reason: Upgrade Replica to GitLab 17.9
- 06:56 jelto@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: Upgrade Replica to GitLab 17.9
- 06:56 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2032.codfw.wmnet
- 06:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2032.codfw.wmnet
- 06:54 jelto@cumin1002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1003.wikimedia.org with reason: Upgrade Replica to GitLab 17.9
- 06:49 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2032.codfw.wmnet
- 06:47 jelto@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: Upgrade Replica to GitLab 17.9
- 06:45 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2032.codfw.wmnet
- 06:43 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1047.eqiad.wmnet
- 01:52 tstarling@deploy1003: Finished scap sync-world: Backport for Use CONTENTLANGUAGE rather than USERLANGUAGE (T393601), Use CONTENTLANGUAGE rather than USERLANGUAGE (T393601) (duration: 46m 12s)
- 01:38 tstarling@deploy1003: tstarling: Continuing with sync
- 01:37 tstarling@deploy1003: tstarling: Backport for Use CONTENTLANGUAGE rather than USERLANGUAGE (T393601), Use CONTENTLANGUAGE rather than USERLANGUAGE (T393601) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 01:06 tstarling@deploy1003: Started scap sync-world: Backport for Use CONTENTLANGUAGE rather than USERLANGUAGE (T393601), Use CONTENTLANGUAGE rather than USERLANGUAGE (T393601)
- 00:14 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on A:cp-text_esams
- 00:09 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on A:cp-upload_esams
2025-05-07
- 21:33 ladsgroup@deploy1003: Finished scap sync-world: Backport for ApiQueryPublishedTranslations: Make `from` and `to` mandatory (T392839), ApiQueryPublishedTranslations: Make `from` and `to` mandatory (T392839) (duration: 14m 12s)
- 21:29 ejegg: payments-wiki upgraded from 822bac34 to fac09775
- 21:27 ladsgroup@deploy1003: ladsgroup, sbisson: Continuing with sync
- 21:26 ladsgroup@deploy1003: ladsgroup, sbisson: Backport for ApiQueryPublishedTranslations: Make `from` and `to` mandatory (T392839), ApiQueryPublishedTranslations: Make `from` and `to` mandatory (T392839) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 21:21 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch1124.eqiad.wmnet with OS bullseye
- 21:19 ladsgroup@deploy1003: Started scap sync-world: Backport for ApiQueryPublishedTranslations: Make `from` and `to` mandatory (T392839), ApiQueryPublishedTranslations: Make `from` and `to` mandatory (T392839)
- 21:06 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on A:cp-upload_esams
- 21:06 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on A:cp-text_esams
- 21:05 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on A:cp-upload_magru
- 21:05 ladsgroup@deploy1003: Finished scap sync-world: Backport for Charts phase 1 deployment (T393517), Clear floats to avoid tall charts (T393286), Clear floats to avoid tall charts (T393286) (duration: 17m 21s)
- 21:04 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch1124.eqiad.wmnet with reason: host reimage
- 21:01 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch1124.eqiad.wmnet with reason: host reimage
- 20:59 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on A:cp-text_magru
- 20:56 ladsgroup@deploy1003: jdlrobson, bvibber, ladsgroup: Continuing with sync
- 20:55 ladsgroup@deploy1003: jdlrobson, bvibber, ladsgroup: Backport for Charts phase 1 deployment (T393517), Clear floats to avoid tall charts (T393286), Clear floats to avoid tall charts (T393286) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 20:50 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch1124.eqiad.wmnet with OS bullseye
- 20:49 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cirrussearch1124.eqiad.wmnet with OS bullseye
- 20:48 ladsgroup@deploy1003: Started scap sync-world: Backport for Charts phase 1 deployment (T393517), Clear floats to avoid tall charts (T393286), Clear floats to avoid tall charts (T393286)
- 20:46 ladsgroup@deploy1003: Finished scap sync-world: Backport for Remove whatlinkshere hook (T393513), Improve circuit breaking error message (T360930), Remove hard-coded timestamps in SpecialGlobalContributionsTest (T393531) (duration: 41m 41s)
- 20:33 ladsgroup@deploy1003: ladsgroup: Continuing with sync
- 20:33 ladsgroup@deploy1003: ladsgroup: Backport for Remove whatlinkshere hook (T393513), Improve circuit breaking error message (T360930), Remove hard-coded timestamps in SpecialGlobalContributionsTest (T393531) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 20:28 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 20:26 sukhe@cumin1002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) config_reloading P{lvs3009*} and A:liberica (T393616)
- 20:26 sukhe@cumin1002: START - Cookbook sre.loadbalancer.admin config_reloading P{lvs3009*} and A:liberica (T393616)
- 20:25 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
- 20:23 sukhe: depooling lvs3009 for HW maint: T393616
- 20:04 ladsgroup@deploy1003: Started scap sync-world: Backport for Remove whatlinkshere hook (T393513), Improve circuit breaking error message (T360930), Remove hard-coded timestamps in SpecialGlobalContributionsTest (T393531)
- 19:35 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host apus-fe1003.eqiad.wmnet with OS bookworm
- 18:55 hmonroy@deploy1003: Finished scap sync-world: Backport for Enable Codex and Multiblocks in Hebrew wiki (T377121) (duration: 17m 21s)
- 18:49 hmonroy@deploy1003: hmonroy: Continuing with sync
- 18:45 hmonroy@deploy1003: hmonroy: Backport for Enable Codex and Multiblocks in Hebrew wiki (T377121) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 18:38 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch1125.eqiad.wmnet with OS bullseye
- 18:38 hmonroy@deploy1003: Started scap sync-world: Backport for Enable Codex and Multiblocks in Hebrew wiki (T377121)
- 18:37 jclark@cumin1002: START - Cookbook sre.hosts.provision for host apus-fe1003.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 18:36 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host apus-fe1003.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 18:34 jclark@cumin1002: START - Cookbook sre.hosts.provision for host apus-fe1003.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 18:31 aokoth@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 18:30 aokoth@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
- 18:29 dancy@deploy1003: rebuilt and synchronized wikiversions files: group1 to 1.44.0-wmf.28 refs T386223
- 18:21 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch1125.eqiad.wmnet with reason: host reimage
- 18:21 volans: uploaded spicerack_10.2.0 to apt.wikimedia.org bullseye-wikimedia,bookworm-wikimedia
- 18:20 aokoth@dns1004: END - running authdns-update
- 18:19 aokoth@dns1004: START - running authdns-update
- 18:18 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch1125.eqiad.wmnet with reason: host reimage
- 18:14 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host apus-fe1003.eqiad.wmnet with OS bookworm
- 18:13 dancy@deploy1003: Finished scap build-images: (no justification provided) (duration: 00m 30s)
- 18:12 dancy@deploy1003: Started scap build-images: (no justification provided)
- 18:07 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch1125.eqiad.wmnet with OS bullseye
- 18:06 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch1124.eqiad.wmnet with OS bullseye
- 18:03 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch1123.eqiad.wmnet with OS bullseye
- 18:02 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic1125 to cirrussearch1125
- 18:02 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch1122.eqiad.wmnet with OS bullseye
- 18:01 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch1125
- 17:53 ladsgroup@deploy1003: Finished scap sync-world: Backport for Remove whatlinkshere hook (T393513) (duration: 36m 00s)
- 17:52 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host apus-fe1003.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 17:40 ladsgroup@deploy1003: ladsgroup: Continuing with sync
- 17:37 ladsgroup@deploy1003: ladsgroup: Backport for Remove whatlinkshere hook (T393513) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 17:37 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch1123.eqiad.wmnet with reason: host reimage
- 17:35 swfrench-wmf: deploy1003 and deploy2002 updated to PHP 8.1 - T392938
- 17:34 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch1122.eqiad.wmnet with reason: host reimage
- 17:34 vriley@cumin1002: START - Cookbook sre.hosts.provision for host apus-fe1003.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 17:31 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 17:30 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch1123.eqiad.wmnet with reason: host reimage
- 17:29 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch1125
- 17:29 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch1125 on all recursors
- 17:29 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch1125 on all recursors
- 17:29 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 17:29 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1125 to cirrussearch1125 - bking@cumin2002"
- 17:29 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1125 to cirrussearch1125 - bking@cumin2002"
- 17:28 vriley@cumin1002: START - Cookbook sre.dns.netbox
- 17:26 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch1122.eqiad.wmnet with reason: host reimage
- 17:24 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic1124 to cirrussearch1124
- 17:24 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on A:cp-upload_magru
- 17:24 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch1124
- 17:23 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch1124
- 17:23 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch1124 on all recursors
- 17:23 bking@cumin2002: START - Cookbook sre.dns.netbox
- 17:23 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch1124 on all recursors
- 17:23 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 17:23 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1124 to cirrussearch1124 - bking@cumin2002"
- 17:23 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1124 to cirrussearch1124 - bking@cumin2002"
- 17:20 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic1125 to cirrussearch1125
- 17:19 bking@cumin2002: START - Cookbook sre.dns.netbox
- 17:17 ladsgroup@deploy1003: Started scap sync-world: Backport for Remove whatlinkshere hook (T393513)
- 17:17 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic1124 to cirrussearch1124
- 17:16 vriley@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host apus-fe1003
- 17:15 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch1123.eqiad.wmnet with OS bullseye
- 17:15 vriley@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host apus-fe1003
- 17:14 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host apus-fe1003.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 17:13 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on A:cp-text_magru
- 17:13 swfrench-wmf: disable-puppet "In-place update to PHP 8.1 - T392938" on deploy1003 and deploy2002
- 17:11 vriley@cumin1002: START - Cookbook sre.hosts.provision for host apus-fe1003.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 17:09 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic1123 to cirrussearch1123
- 17:08 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch1123
- 17:08 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch1122.eqiad.wmnet with OS bullseye
- 17:08 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch1123
- 17:08 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch1123 on all recursors
- 17:08 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch1123 on all recursors
- 17:08 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 17:08 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1123 to cirrussearch1123 - bking@cumin2002"
- 17:08 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1123 to cirrussearch1123 - bking@cumin2002"
- 17:08 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic1122 to cirrussearch1122
- 17:07 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch1122
- 17:06 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch1122
- 17:06 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch1122 on all recursors
- 17:06 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch1122 on all recursors
- 17:06 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 17:06 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1122 to cirrussearch1122 - bking@cumin2002"
- 17:04 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1122 to cirrussearch1122 - bking@cumin2002"
- 17:04 bking@cumin2002: START - Cookbook sre.dns.netbox
- 16:58 cdanis: per dwisehaupt T196336 💙cdanis@alert1002.wikimedia.org ~ 🕐☕ sudo systemctl restart nsca.service
- 16:58 bking@cumin2002: START - Cookbook sre.dns.netbox
- 16:56 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic1123 to cirrussearch1123
- 16:56 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic1122 to cirrussearch1122
- 16:43 ladsgroup@deploy1003: sync-world aborted: Backport for Remove whatlinkshere hook (T393513) (duration: 06m 07s)
- 16:36 ladsgroup@deploy1003: Started scap sync-world: Backport for Remove whatlinkshere hook (T393513)
- 16:36 ladsgroup@deploy1003: sync-world aborted: Backport for Remove whatlinkshere hook (T393513) (duration: 29m 10s)
- 16:31 kamila@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-cron: apply
- 16:31 kamila@deploy1003: helmfile [codfw] START helmfile.d/services/mw-cron: apply
- 16:31 kamila@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
- 16:30 kamila@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
- 16:23 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch1121.eqiad.wmnet with OS bullseye
- 16:09 kamila@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
- 16:09 kamila@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
- 16:07 kamila@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-cron: apply
- 16:07 kamila@deploy1003: helmfile [codfw] START helmfile.d/services/mw-cron: apply
- 16:07 ladsgroup@deploy1003: Started scap sync-world: Backport for Remove whatlinkshere hook (T393513)
- 15:59 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch1120.eqiad.wmnet with OS bullseye
- 15:58 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch1121.eqiad.wmnet with reason: host reimage
- 15:53 moritzm: uploaded python-pynetbox 7.4.1-1~wmf12u1 to bookworm-wikimedia (needed for Cumin update) T389380
- 15:53 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch1121.eqiad.wmnet with reason: host reimage
- 15:49 zabe: zabe@mwmaint1002:~$ mwscript findBadBlobs.php enwiki --revisions 276146284,819689534,1289169661 --mark "T393237"
- 15:43 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch1119.eqiad.wmnet with OS bullseye
- 15:43 swfrench@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db1247.eqiad.wmnet with reason: Host has crashed - T393612
- 15:40 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch1121.eqiad.wmnet with OS bullseye
- 15:39 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic1121 to cirrussearch1121
- 15:39 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch1121
- 15:38 mvernon@cumin1002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts ms-be1060.eqiad.wmnet
- 15:38 mvernon@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 15:37 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch1121
- 15:37 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch1121 on all recursors
- 15:37 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch1121 on all recursors
- 15:37 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 15:37 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1121 to cirrussearch1121 - bking@cumin2002"
- 15:37 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch1120.eqiad.wmnet with reason: host reimage
- 15:36 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1121 to cirrussearch1121 - bking@cumin2002"
- 15:36 mvernon@cumin1002: START - Cookbook sre.dns.netbox
- 15:32 cdanis@cumin1002: dbctl commit (dc=all): 'depool db1247', diff saved to https://phabricator.wikimedia.org/P75876 and previous config saved to /var/cache/conftool/dbconfig/20250507-153228-cdanis.json
- 15:32 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply
- 15:31 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch1120.eqiad.wmnet with reason: host reimage
- 15:31 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop: apply
- 15:31 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop: apply
- 15:31 bking@cumin2002: START - Cookbook sre.dns.netbox
- 15:30 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop: apply
- 15:30 jgiannelos@deploy1003: helmfile [staging] DONE helmfile.d/services/changeprop: apply
- 15:30 jgiannelos@deploy1003: helmfile [staging] START helmfile.d/services/changeprop: apply
- 15:30 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Unbanning all hosts in search_eqiad
- 15:30 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Unbanning all hosts in search_eqiad
- 15:30 jgiannelos@deploy1003: helmfile [staging] DONE helmfile.d/services/changeprop: apply
- 15:30 jgiannelos@deploy1003: helmfile [staging] START helmfile.d/services/changeprop: apply
- 15:29 jgiannelos@deploy1003: helmfile [staging] DONE helmfile.d/services/changeprop: apply
- 15:29 jgiannelos@deploy1003: helmfile [staging] START helmfile.d/services/changeprop: apply
- 15:29 jgiannelos@deploy1003: helmfile [staging] START helmfile.d/services/changeprop: apply
- 15:28 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic1121 to cirrussearch1121
- 15:26 mvernon@cumin1002: START - Cookbook sre.hosts.decommission for hosts ms-be1060.eqiad.wmnet
- 15:21 damilare: civicrm upgraded from 6ffbde61 to dc096105
- 15:20 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch1118.eqiad.wmnet with OS bullseye
- 15:14 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch1119.eqiad.wmnet with reason: host reimage
- 15:13 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch1120.eqiad.wmnet with OS bullseye
- 15:10 sukhe@dns1004: END - running authdns-update
- 15:10 sukhe: timing authdns-update for T393602
- 15:09 sukhe@dns1004: START - running authdns-update
- 15:09 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch1119.eqiad.wmnet with reason: host reimage
- 15:09 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic1120 to cirrussearch1120
- 15:08 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch1120
- 15:08 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Banning hosts: elastic1060*,elastic1081*,elastic1083* for thread pool rejections - bking@cumin2002
- 15:08 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Banning hosts: elastic1060*,elastic1081*,elastic1083* for thread pool rejections - bking@cumin2002
- 15:06 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch1120
- 15:06 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch1120 on all recursors
- 15:06 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch1120 on all recursors
- 15:06 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 15:06 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1120 to cirrussearch1120 - bking@cumin2002"
- 15:06 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1120 to cirrussearch1120 - bking@cumin2002"
- 15:06 sukhe: sudo cumin -b1 -s10 'A:dnsbox' 'sudo -u authdns git -C /srv/authdns/git maintenance run' T393602
- 15:05 mvernon@cumin1002: conftool action : set/pooled=yes; selector: service=nginx,name=ms-fe1016.eqiad.wmnet
- 15:04 mvernon@cumin1002: conftool action : set/weight=40; selector: service=nginx,name=ms-fe1016.eqiad.wmnet
- 15:04 mvernon@cumin1002: conftool action : set/pooled=yes; selector: service=swift-fe,name=ms-fe1016.eqiad.wmnet
- 15:04 mvernon@cumin1002: conftool action : set/weight=40; selector: service=swift-fe,name=ms-fe1016.eqiad.wmnet
- 15:04 mvernon@cumin1002: conftool action : set/pooled=yes; selector: service=nginx,name=ms-fe1015.eqiad.wmnet
- 15:04 mvernon@cumin1002: conftool action : set/weight=40; selector: service=nginx,name=ms-fe1015.eqiad.wmnet
- 15:04 mvernon@cumin1002: conftool action : set/pooled=yes; selector: service=swift-fe,name=ms-fe1015.eqiad.wmnet
- 15:04 mvernon@cumin1002: conftool action : set/weight=40; selector: service=swift-fe,name=ms-fe1015.eqiad.wmnet
- 15:04 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Banning hosts: elastic1060*,elastic1081* for thread pool rejections - bking@cumin2002
- 15:04 sukhe@dns1004: END - running authdns-update
- 15:04 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Banning hosts: elastic1060*,elastic1081* for thread pool rejections - bking@cumin2002
- 15:04 Emperor: pool ms-fe1015 ms-fe1016 new frontends T388886 T391354
- 15:02 sukhe@dns1004: START - running authdns-update
- 15:00 bking@cumin2002: START - Cookbook sre.dns.netbox
- 14:59 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Banning hosts: elastic1081* for thread pool rejections - bking@cumin2002
- 14:59 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Banning hosts: elastic1081* for thread pool rejections - bking@cumin2002
- 14:58 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic1120 to cirrussearch1120
- 14:57 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch1119.eqiad.wmnet with OS bullseye
- 14:48 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic1119 to cirrussearch1119
- 14:47 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch1119
- 14:43 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch1118.eqiad.wmnet with reason: host reimage
- 14:41 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch1117.eqiad.wmnet with OS bullseye
- 14:40 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch1119
- 14:40 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch1119 on all recursors
- 14:40 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch1119 on all recursors
- 14:40 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 14:40 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1119 to cirrussearch1119 - bking@cumin2002"
- 14:39 moritzm: installing openjdk-17 security updates
- 14:39 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch1118.eqiad.wmnet with reason: host reimage
- 14:33 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1119 to cirrussearch1119 - bking@cumin2002"
- 14:29 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch1116.eqiad.wmnet with OS bullseye
- 14:26 bking@cumin2002: START - Cookbook sre.dns.netbox
- 14:26 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic1119 to cirrussearch1119
- 14:15 gengh@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
- 14:15 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch1117.eqiad.wmnet with reason: host reimage
- 14:15 gengh@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
- 14:14 gengh@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
- 14:14 gengh@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
- 14:13 gengh@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
- 14:12 gengh@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply
- 14:12 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch1116.eqiad.wmnet with reason: host reimage
- 14:11 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply
- 14:10 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply
- 14:09 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch1117.eqiad.wmnet with reason: host reimage
- 14:09 sukhe@dns1004: END - running authdns-update
- 14:09 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch1118.eqiad.wmnet with OS bullseye
- 14:09 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch1116.eqiad.wmnet with reason: host reimage
- 14:08 gengh@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
- 14:07 gengh@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
- 14:07 gengh@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
- 14:07 sukhe@dns1004: START - running authdns-update
- 14:06 gengh@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
- 14:06 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic1118 to cirrussearch1118
- 14:05 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply
- 14:05 gengh@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
- 14:05 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch1118
- 14:04 gengh@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply
- 14:03 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch1118
- 14:03 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch1118 on all recursors
- 14:03 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch1118 on all recursors
- 14:03 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 14:03 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1118 to cirrussearch1118 - bking@cumin2002"
- 14:03 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1118 to cirrussearch1118 - bking@cumin2002"
- 14:03 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply
- 13:59 sukhe@dns1004: END - running authdns-update
- 13:59 bking@cumin2002: START - Cookbook sre.dns.netbox
- 13:58 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic1118 to cirrussearch1118
- 13:58 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch1117.eqiad.wmnet with OS bullseye
- 13:57 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch1116.eqiad.wmnet with OS bullseye
- 13:57 sukhe@dns1004: START - running authdns-update
- 13:52 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic1117 to cirrussearch1117
- 13:52 moritzm: installing nginx security updates
- 13:51 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch1117
- 13:50 mvernon@cumin1002: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling restart_daemons on A:swift-fe-eqiad
- 13:50 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch1117
- 13:50 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch1117 on all recursors
- 13:50 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch1117 on all recursors
- 13:50 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 13:50 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1117 to cirrussearch1117 - bking@cumin2002"
- 13:50 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1117 to cirrussearch1117 - bking@cumin2002"
- 13:47 mvernon@cumin1002: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on A:swift-fe-eqiad
- 13:46 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1047.eqiad.wmnet
- 13:43 mvernon@cumin1002: END (ERROR) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=97) rolling restart_daemons on A:swift-fe-eqiad
- 13:43 mvernon@cumin1002: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on A:swift-fe-eqiad
- 13:41 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply
- 13:41 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-fe1016.eqiad.wmnet
- 13:41 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply
- 13:37 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-fe1016.eqiad.wmnet
- 13:36 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-fe1015.eqiad.wmnet
- 13:35 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic1116 to cirrussearch1116
- 13:34 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch1116
- 13:33 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch1116
- 13:33 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch1116 on all recursors
- 13:33 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch1116 on all recursors
- 13:33 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 13:33 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1116 to cirrussearch1116 - bking@cumin2002"
- 13:33 bking@cumin2002: START - Cookbook sre.dns.netbox
- 13:32 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1116 to cirrussearch1116 - bking@cumin2002"
- 13:31 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic1117 to cirrussearch1117
- 13:30 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-fe1015.eqiad.wmnet
- 13:30 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1046.eqiad.wmnet
- 13:30 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1046.eqiad.wmnet
- 13:28 bking@cumin2002: START - Cookbook sre.dns.netbox
- 13:27 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic1116 to cirrussearch1116
- 13:25 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
- 13:24 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1046.eqiad.wmnet
- 13:21 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/thumbor: apply
- 13:21 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
- 13:19 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1046.eqiad.wmnet
- 13:15 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/thumbor: apply
- 13:12 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1045.eqiad.wmnet
- 13:12 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1045.eqiad.wmnet
- 13:07 moritzm: installing poppler security updates
- 13:07 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1045.eqiad.wmnet
- 13:07 jmm@deploy1003: helmfile [staging] DONE helmfile.d/services/thumbor: apply
- 13:07 hashar: Restarted Apache httpd server on Gerrit server
- 13:07 jmm@deploy1003: helmfile [staging] START helmfile.d/services/thumbor: apply
- 13:00 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1045.eqiad.wmnet
- 12:58 Amir1: [wikishared]> CREATE INDEX translation_last_updated_timestamp ON cx_translations (translation_last_updated_timestamp); (T392839)
- 12:51 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1044.eqiad.wmnet
- 12:51 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1044.eqiad.wmnet
- 12:46 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1044.eqiad.wmnet
- 12:38 moritzm: installing imagemagick security updates
- 12:35 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1044.eqiad.wmnet
- 12:18 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1043.eqiad.wmnet
- 12:17 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1043.eqiad.wmnet
- 12:12 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1043.eqiad.wmnet
- 12:05 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1043.eqiad.wmnet
- 12:00 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1042.eqiad.wmnet
- 12:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1042.eqiad.wmnet
- 11:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1042.eqiad.wmnet
- 11:43 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1042.eqiad.wmnet
- 11:12 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2203.codfw.wmnet with reason: Maintenance
- 11:12 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1163.eqiad.wmnet with reason: Maintenance
- 11:07 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1041.eqiad.wmnet
- 11:06 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1041.eqiad.wmnet
- 11:06 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host krb2002.codfw.wmnet
- 11:01 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1041.eqiad.wmnet
- 11:00 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host krb2002.codfw.wmnet
- 10:57 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1041.eqiad.wmnet
- 10:51 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1040.eqiad.wmnet
- 10:51 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1040.eqiad.wmnet
- 10:46 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1040.eqiad.wmnet
- 10:43 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1040.eqiad.wmnet
- 10:40 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1039.eqiad.wmnet
- 10:40 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1039.eqiad.wmnet
- 10:34 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1039.eqiad.wmnet
- 10:31 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1039.eqiad.wmnet
- 10:27 moritzm: upgrading krb2002 to Bookworm T390863
- 10:27 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1038.eqiad.wmnet
- 10:27 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1038.eqiad.wmnet
- 10:22 jmm@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on krb2002.codfw.wmnet with reason: update to Bookworm
- 10:19 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1038.eqiad.wmnet
- 10:17 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1038.eqiad.wmnet
- 10:14 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1037.eqiad.wmnet
- 10:14 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host ganeti1037.eqiad.wmnet
- 10:01 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1037.eqiad.wmnet
- 09:55 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1037.eqiad.wmnet
- 09:54 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1036.eqiad.wmnet
- 09:54 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host ganeti1036.eqiad.wmnet
- 09:41 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1036.eqiad.wmnet
- 09:34 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1036.eqiad.wmnet
- 08:55 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1035.eqiad.wmnet
- 08:55 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host ganeti1035.eqiad.wmnet
- 08:54 XioNoX: update `host-inbound-traffic system-services` on pfw1-eqiad - T390052
- 08:41 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1035.eqiad.wmnet
- 08:38 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1035.eqiad.wmnet
- 08:09 zabe@deploy1003: Finished scap sync-world: Backport for SkinTemplate: Restore a string 'class' in tabAction() (T393504) (duration: 19m 01s)
- 08:02 zabe@deploy1003: zabe: Continuing with sync
- 07:56 zabe@deploy1003: zabe: Backport for SkinTemplate: Restore a string 'class' in tabAction() (T393504) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 07:50 zabe@deploy1003: Started scap sync-world: Backport for SkinTemplate: Restore a string 'class' in tabAction() (T393504)
- 07:17 slyngshede@dns1004: END - running authdns-update
- 07:14 slyngshede@dns1004: START - running authdns-update
- 06:55 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 61588
- 06:55 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 61588
- 06:55 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 24441
- 06:54 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 24441
- 06:54 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 268097
- 06:53 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 268097
- 06:53 ayounsi@cumin1002: END (FAIL) - Cookbook sre.network.peering (exit_code=99) with action 'email' for AS: 35847
- 06:52 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 35847
- 06:52 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 264595
- 06:52 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 264595
- 06:52 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 268517
- 06:52 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 268517
- 06:52 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 263569
- 06:51 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 263569
- 06:25 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1034.eqiad.wmnet
- 06:24 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1034.eqiad.wmnet
- 06:18 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1034.eqiad.wmnet
- 06:08 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1034.eqiad.wmnet
- 06:04 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1033.eqiad.wmnet
- 06:03 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1033.eqiad.wmnet
- 05:57 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1033.eqiad.wmnet
- 05:55 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1033.eqiad.wmnet
- 05:48 XioNoX: decom Tele2 transit in esams - T393401
- 05:41 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1032.eqiad.wmnet
- 05:40 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1032.eqiad.wmnet
- 05:34 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1032.eqiad.wmnet
- 05:29 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1032.eqiad.wmnet
- 04:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2216 (T382778)', diff saved to Unable to send diff to phaste and previous config saved to /var/cache/conftool/dbconfig/20250507-042334-ladsgroup.json
- 04:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2216', diff saved to https://phabricator.wikimedia.org/P75869 and previous config saved to /var/cache/conftool/dbconfig/20250507-040826-ladsgroup.json
- 03:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2216', diff saved to https://phabricator.wikimedia.org/P75868 and previous config saved to /var/cache/conftool/dbconfig/20250507-035319-ladsgroup.json
- 03:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2216 (T382778)', diff saved to https://phabricator.wikimedia.org/P75867 and previous config saved to /var/cache/conftool/dbconfig/20250507-033812-ladsgroup.json
- 03:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2216 (T382778)', diff saved to https://phabricator.wikimedia.org/P75866 and previous config saved to /var/cache/conftool/dbconfig/20250507-033518-ladsgroup.json
- 03:35 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2216.codfw.wmnet with reason: Maintenance
- 03:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2212 (T382778)', diff saved to https://phabricator.wikimedia.org/P75865 and previous config saved to /var/cache/conftool/dbconfig/20250507-033455-ladsgroup.json
- 03:19 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2212', diff saved to https://phabricator.wikimedia.org/P75864 and previous config saved to /var/cache/conftool/dbconfig/20250507-031947-ladsgroup.json
- 03:07 tstarling@deploy1003: Finished scap sync-world: Backport for Hooks: disable if content model is unset AND CodeMirror beta is set (T373711) (duration: 32m 06s)
- 03:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2212', diff saved to https://phabricator.wikimedia.org/P75863 and previous config saved to /var/cache/conftool/dbconfig/20250507-030440-ladsgroup.json
- 02:58 tstarling@deploy1003: tstarling, musikanimal: Continuing with sync
- 02:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2212 (T382778)', diff saved to https://phabricator.wikimedia.org/P75862 and previous config saved to /var/cache/conftool/dbconfig/20250507-024933-ladsgroup.json
- 02:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2212 (T382778)', diff saved to https://phabricator.wikimedia.org/P75861 and previous config saved to /var/cache/conftool/dbconfig/20250507-024638-ladsgroup.json
- 02:46 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2212.codfw.wmnet with reason: Maintenance
- 02:45 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2202.codfw.wmnet with reason: Maintenance
- 02:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2188 (T382778)', diff saved to https://phabricator.wikimedia.org/P75860 and previous config saved to /var/cache/conftool/dbconfig/20250507-024518-ladsgroup.json
- 02:41 tstarling@deploy1003: tstarling, musikanimal: Backport for Hooks: disable if content model is unset AND CodeMirror beta is set (T373711) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 02:34 tstarling@deploy1003: Started scap sync-world: Backport for Hooks: disable if content model is unset AND CodeMirror beta is set (T373711)
- 02:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2188', diff saved to https://phabricator.wikimedia.org/P75859 and previous config saved to /var/cache/conftool/dbconfig/20250507-023009-ladsgroup.json
- 02:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2188', diff saved to https://phabricator.wikimedia.org/P75858 and previous config saved to /var/cache/conftool/dbconfig/20250507-021502-ladsgroup.json
- 01:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2188 (T382778)', diff saved to https://phabricator.wikimedia.org/P75857 and previous config saved to /var/cache/conftool/dbconfig/20250507-015955-ladsgroup.json
- 01:57 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2188 (T382778)', diff saved to https://phabricator.wikimedia.org/P75856 and previous config saved to /var/cache/conftool/dbconfig/20250507-015658-ladsgroup.json
- 01:56 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2188.codfw.wmnet with reason: Maintenance
- 01:56 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T382778)', diff saved to https://phabricator.wikimedia.org/P75855 and previous config saved to /var/cache/conftool/dbconfig/20250507-015636-ladsgroup.json
- 01:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P75854 and previous config saved to /var/cache/conftool/dbconfig/20250507-014128-ladsgroup.json
- 01:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P75853 and previous config saved to /var/cache/conftool/dbconfig/20250507-012621-ladsgroup.json
- 01:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T382778)', diff saved to https://phabricator.wikimedia.org/P75852 and previous config saved to /var/cache/conftool/dbconfig/20250507-011114-ladsgroup.json
- 01:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2176 (T382778)', diff saved to https://phabricator.wikimedia.org/P75851 and previous config saved to /var/cache/conftool/dbconfig/20250507-010811-ladsgroup.json
- 01:08 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2176.codfw.wmnet with reason: Maintenance
- 01:07 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T382778)', diff saved to https://phabricator.wikimedia.org/P75850 and previous config saved to /var/cache/conftool/dbconfig/20250507-010748-ladsgroup.json
- 00:52 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P75849 and previous config saved to /var/cache/conftool/dbconfig/20250507-005240-ladsgroup.json
- 00:39 hmonroy@deploy1003: Finished scap sync-world: Backport for Revert "JavaScript: ESLint 8.57.0" (T381577) (duration: 47m 14s)
- 00:37 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P75848 and previous config saved to /var/cache/conftool/dbconfig/20250507-003733-ladsgroup.json
- 00:33 andrew@dns1004: END - running authdns-update
- 00:30 andrew@dns1004: START - running authdns-update
- 00:26 hmonroy@deploy1003: hmonroy, musikanimal: Continuing with sync
- 00:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T382778)', diff saved to https://phabricator.wikimedia.org/P75847 and previous config saved to /var/cache/conftool/dbconfig/20250507-002226-ladsgroup.json
- 00:21 hmonroy@deploy1003: hmonroy, musikanimal: Backport for Revert "JavaScript: ESLint 8.57.0" (T381577) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 00:19 andrew@dns1004: END - running authdns-update
- 00:19 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2174 (T382778)', diff saved to https://phabricator.wikimedia.org/P75846 and previous config saved to /var/cache/conftool/dbconfig/20250507-001924-ladsgroup.json
- 00:19 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2174.codfw.wmnet with reason: Maintenance
- 00:19 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T382778)', diff saved to https://phabricator.wikimedia.org/P75845 and previous config saved to /var/cache/conftool/dbconfig/20250507-001901-ladsgroup.json
- 00:16 andrew@dns1004: START - running authdns-update
- 00:03 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P75844 and previous config saved to /var/cache/conftool/dbconfig/20250507-000354-ladsgroup.json
2025-05-06
- 23:52 hmonroy@deploy1003: Started scap sync-world: Backport for Revert "JavaScript: ESLint 8.57.0" (T381577)
- 23:48 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P75843 and previous config saved to /var/cache/conftool/dbconfig/20250506-234846-ladsgroup.json
- 23:37 hmonroy@deploy1003: Finished scap sync-world: Backport for InitialiseSettings: enable multiblocks on group0 (T377121) (duration: 14m 17s)
- 23:33 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T382778)', diff saved to https://phabricator.wikimedia.org/P75842 and previous config saved to /var/cache/conftool/dbconfig/20250506-233339-ladsgroup.json
- 23:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2173 (T382778)', diff saved to https://phabricator.wikimedia.org/P75841 and previous config saved to /var/cache/conftool/dbconfig/20250506-233041-ladsgroup.json
- 23:30 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2186.codfw.wmnet with reason: Maintenance
- 23:30 hmonroy@deploy1003: musikanimal, hmonroy: Continuing with sync
- 23:30 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2173.codfw.wmnet with reason: Maintenance
- 23:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2170 (T382778)', diff saved to https://phabricator.wikimedia.org/P75840 and previous config saved to /var/cache/conftool/dbconfig/20250506-233002-ladsgroup.json
- 23:29 hmonroy@deploy1003: musikanimal, hmonroy: Backport for InitialiseSettings: enable multiblocks on group0 (T377121) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 23:22 hmonroy@deploy1003: Started scap sync-world: Backport for InitialiseSettings: enable multiblocks on group0 (T377121)
- 23:20 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch1115.eqiad.wmnet with OS bullseye
- 23:15 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch1114.eqiad.wmnet with OS bullseye
- 23:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2170', diff saved to https://phabricator.wikimedia.org/P75839 and previous config saved to /var/cache/conftool/dbconfig/20250506-231454-ladsgroup.json
- 22:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2170', diff saved to https://phabricator.wikimedia.org/P75838 and previous config saved to /var/cache/conftool/dbconfig/20250506-225947-ladsgroup.json
- 22:51 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch1115.eqiad.wmnet with reason: host reimage
- 22:48 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch1114.eqiad.wmnet with reason: host reimage
- 22:45 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch1115.eqiad.wmnet with reason: host reimage
- 22:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2170 (T382778)', diff saved to https://phabricator.wikimedia.org/P75837 and previous config saved to /var/cache/conftool/dbconfig/20250506-224440-ladsgroup.json
- 22:44 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch1114.eqiad.wmnet with reason: host reimage
- 22:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2170 (T382778)', diff saved to https://phabricator.wikimedia.org/P75836 and previous config saved to /var/cache/conftool/dbconfig/20250506-224132-ladsgroup.json
- 22:41 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2170.codfw.wmnet with reason: Maintenance
- 22:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T382778)', diff saved to https://phabricator.wikimedia.org/P75835 and previous config saved to /var/cache/conftool/dbconfig/20250506-224110-ladsgroup.json
- 22:34 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch1113.eqiad.wmnet with OS bullseye
- 22:34 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch1115.eqiad.wmnet with OS bullseye
- 22:32 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch1114.eqiad.wmnet with OS bullseye
- 22:29 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudrabbit2002-dev.codfw.wmnet with OS bookworm
- 22:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P75834 and previous config saved to /var/cache/conftool/dbconfig/20250506-222603-ladsgroup.json
- 22:25 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudrabbit2001-dev.codfw.wmnet with OS bookworm
- 22:21 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudrabbit2003-dev.codfw.wmnet with OS bookworm
- 22:17 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch1113.eqiad.wmnet with reason: host reimage
- 22:13 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch1113.eqiad.wmnet with reason: host reimage
- 22:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P75833 and previous config saved to /var/cache/conftool/dbconfig/20250506-221056-ladsgroup.json
- 22:10 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudrabbit2002-dev.codfw.wmnet with reason: host reimage
- 22:05 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudrabbit2001-dev.codfw.wmnet with reason: host reimage
- 22:03 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic1115 to cirrussearch1115
- 22:02 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch1115
- 22:02 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch1113.eqiad.wmnet with OS bullseye
- 22:02 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudrabbit2003-dev.codfw.wmnet with reason: host reimage
- 22:01 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch1115
- 22:01 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch1115 on all recursors
- 22:01 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch1115 on all recursors
- 22:01 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 22:01 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1115 to cirrussearch1115 - bking@cumin2002"
- 22:00 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudrabbit2002-dev.codfw.wmnet with reason: host reimage
- 21:59 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudrabbit2001-dev.codfw.wmnet with reason: host reimage
- 21:59 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudrabbit2003-dev.codfw.wmnet with reason: host reimage
- 21:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T382778)', diff saved to https://phabricator.wikimedia.org/P75832 and previous config saved to /var/cache/conftool/dbconfig/20250506-215549-ladsgroup.json
- 21:52 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2153 (T382778)', diff saved to https://phabricator.wikimedia.org/P75831 and previous config saved to /var/cache/conftool/dbconfig/20250506-215242-ladsgroup.json
- 21:52 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2153.codfw.wmnet with reason: Maintenance
- 21:52 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T382778)', diff saved to https://phabricator.wikimedia.org/P75830 and previous config saved to /var/cache/conftool/dbconfig/20250506-215219-ladsgroup.json
- 21:41 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudrabbit2003-dev.codfw.wmnet with OS bookworm
- 21:41 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudrabbit2002-dev.codfw.wmnet with OS bookworm
- 21:41 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudrabbit2001-dev.codfw.wmnet with OS bookworm
- 21:40 andrew@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudrabbit2001-dev.codfw.wmnet with OS bookworm
- 21:40 andrew@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudrabbit2003-dev.codfw.wmnet with OS bookworm
- 21:40 andrew@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudrabbit2002-dev.codfw.wmnet with OS bookworm
- 21:38 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch1113.eqiad.wmnet with OS bullseye
- 21:37 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P75829 and previous config saved to /var/cache/conftool/dbconfig/20250506-213712-ladsgroup.json
- 21:35 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch1112.eqiad.wmnet with OS bullseye
- 21:28 ryankemper: T388134 Seeing 502 errors; that explains why the drop in requests to wdqs-full is not matched by an increase to wdqs-main. Rolling back for now while we figure out what piece we're missing
- 21:24 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1115 to cirrussearch1115 - bking@cumin2002"
- 21:23 ryankemper: T388134 Cutover of query.wikidata.org to `wdqs-main` instead of `wdqs` is ongoing. We're seeing the expected drop in queries to the main cluster (https://grafana.wikimedia.org/d/000000489/wikidata-query-service?orgId=1&var-cluster_name=wdqs&from=1746565806937&to=1746566592047) but not seeing corresponding increase in wdqs-main yet
- 21:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P75828 and previous config saved to /var/cache/conftool/dbconfig/20250506-212204-ladsgroup.json
- 21:20 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudrabbit2002-dev.codfw.wmnet with reason: host reimage
- 21:18 dbrant@deploy1003: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
- 21:18 dbrant@deploy1003: helmfile [codfw] START helmfile.d/services/mobileapps: apply
- 21:17 dbrant@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
- 21:17 bking@cumin2002: START - Cookbook sre.dns.netbox
- 21:17 dbrant@deploy1003: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
- 21:16 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic1115 to cirrussearch1115
- 21:16 dbrant@deploy1003: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
- 21:16 dbrant@deploy1003: helmfile [staging] START helmfile.d/services/mobileapps: apply
- 21:15 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic1114 to cirrussearch1114
- 21:15 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch1113.eqiad.wmnet with reason: host reimage
- 21:15 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch1114
- 21:12 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch1114
- 21:12 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch1114 on all recursors
- 21:12 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch1114 on all recursors
- 21:12 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 21:12 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1114 to cirrussearch1114 - bking@cumin2002"
- 21:12 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch1112.eqiad.wmnet with reason: host reimage
- 21:12 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch1113.eqiad.wmnet with reason: host reimage
- 21:10 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1114 to cirrussearch1114 - bking@cumin2002"
- 21:08 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudrabbit2001-dev.codfw.wmnet with reason: host reimage
- 21:07 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch1112.eqiad.wmnet with reason: host reimage
- 21:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T382778)', diff saved to https://phabricator.wikimedia.org/P75827 and previous config saved to /var/cache/conftool/dbconfig/20250506-210658-ladsgroup.json
- 21:06 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host apus-fe1003.wikimedia.org with OS bookworm
- 21:05 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudrabbit2003-dev.codfw.wmnet with reason: host reimage
- 21:03 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudrabbit2002-dev.codfw.wmnet with reason: host reimage
- 21:03 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudrabbit2001-dev.codfw.wmnet with reason: host reimage
- 21:03 bking@cumin2002: START - Cookbook sre.dns.netbox
- 21:03 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2146 (T382778)', diff saved to https://phabricator.wikimedia.org/P75826 and previous config saved to /var/cache/conftool/dbconfig/20250506-210329-ladsgroup.json
- 21:03 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2146.codfw.wmnet with reason: Maintenance
- 21:03 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic1114 to cirrussearch1114
- 21:03 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T382778)', diff saved to https://phabricator.wikimedia.org/P75825 and previous config saved to /var/cache/conftool/dbconfig/20250506-210307-ladsgroup.json
- 21:02 ryankemper@cumin2002: conftool action : set/pooled=yes:weight=10; selector: name=wdqs1011.eqiad.wmnet|wdqs1016.eqiad.wmnet|wdqs1017.eqiad.wmnet|wdqs2008.codfw.wmnet|wdqs2014.codfw.wmnet|wdqs2015.codfw.wmnet
- 21:01 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudrabbit2003-dev.codfw.wmnet with reason: host reimage
- 21:00 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch1113.eqiad.wmnet with OS bullseye
- 20:59 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic1113 to cirrussearch1113
- 20:58 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch1113
- 20:57 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch1113
- 20:57 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch1113 on all recursors
- 20:57 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch1113 on all recursors
- 20:57 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 20:57 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1113 to cirrussearch1113 - bking@cumin2002"
- 20:56 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1113 to cirrussearch1113 - bking@cumin2002"
- 20:56 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch1112.eqiad.wmnet with OS bullseye
- 20:52 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T388134, bring new main graph hosts into service) xfer categories from wdqs1021.eqiad.wmnet -> wdqs1017.eqiad.wmnet w/ force delete existing files, repooling source-only afterwards
- 20:52 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T388134, bring new main graph hosts into service) xfer categories from wdqs2021.codfw.wmnet -> wdqs2015.codfw.wmnet w/ force delete existing files, repooling source-only afterwards
- 20:48 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P75824 and previous config saved to /var/cache/conftool/dbconfig/20250506-204758-ladsgroup.json
- 20:45 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudrabbit2001-dev.codfw.wmnet with OS bookworm
- 20:44 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudrabbit2002-dev.codfw.wmnet with OS bookworm
- 20:43 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudrabbit2003-dev.codfw.wmnet with OS bookworm
- 20:43 bking@cumin2002: START - Cookbook sre.dns.netbox
- 20:42 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic1113 to cirrussearch1113
- 20:40 andrew@cumin1002: DONE (FAIL) - Cookbook sre.puppet.renew-cert (exit_code=99) for cloudrabbit2001-dev.codfw.wmnet: Renew puppet certificate - andrew@cumin1002
- 20:40 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T388134, bring new main graph hosts into service) xfer categories from wdqs1021.eqiad.wmnet -> wdqs1017.eqiad.wmnet w/ force delete existing files, repooling source-only afterwards
- 20:39 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T388134, bring new main graph hosts into service) xfer categories from wdqs2021.codfw.wmnet -> wdqs2015.codfw.wmnet w/ force delete existing files, repooling source-only afterwards
- 20:39 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T388134, bring new main graph hosts into service) xfer categories from wdqs1021.eqiad.wmnet -> wdqs1016.eqiad.wmnet w/ force delete existing files, repooling source-only afterwards
- 20:39 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T388134, bring new main graph hosts into service) xfer categories from wdqs2021.codfw.wmnet -> wdqs2014.codfw.wmnet w/ force delete existing files, repooling source-only afterwards
- 20:38 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic1112 to cirrussearch1112
- 20:37 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch1112
- 20:36 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch1112
- 20:36 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch1112 on all recursors
- 20:36 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch1112 on all recursors
- 20:36 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 20:36 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1112 to cirrussearch1112 - bking@cumin2002"
- 20:36 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1112 to cirrussearch1112 - bking@cumin2002"
- 20:32 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P75823 and previous config saved to /var/cache/conftool/dbconfig/20250506-203251-ladsgroup.json
- 20:28 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T388134, bring new main graph hosts into service) xfer categories from wdqs1021.eqiad.wmnet -> wdqs1016.eqiad.wmnet w/ force delete existing files, repooling source-only afterwards
- 20:28 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T388134, bring new main graph hosts into service) xfer categories from wdqs2021.codfw.wmnet -> wdqs2014.codfw.wmnet w/ force delete existing files, repooling source-only afterwards
- 20:28 bking@cumin2002: START - Cookbook sre.dns.netbox
- 20:27 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic1112 to cirrussearch1112
- 20:24 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T388134, bring new main graph hosts into service) xfer wikidata_main from wdqs1021.eqiad.wmnet -> wdqs1016.eqiad.wmnet w/ force delete existing files, repooling source-only afterwards
- 20:18 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T388134, bring new main graph hosts into service) xfer wikidata_main from wdqs2021.codfw.wmnet -> wdqs2008.codfw.wmnet w/ force delete existing files, repooling source-only afterwards
- 20:17 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T382778)', diff saved to https://phabricator.wikimedia.org/P75822 and previous config saved to /var/cache/conftool/dbconfig/20250506-201744-ladsgroup.json
- 20:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2145 (T382778)', diff saved to https://phabricator.wikimedia.org/P75821 and previous config saved to /var/cache/conftool/dbconfig/20250506-201421-ladsgroup.json
- 20:14 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2145.codfw.wmnet with reason: Maintenance
- 20:13 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T388134, bring new main graph hosts into service) xfer wikidata_main from wdqs1011.eqiad.wmnet -> wdqs1017.eqiad.wmnet w/ force delete existing files, repooling neither afterwards
- 20:13 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2141.codfw.wmnet with reason: Maintenance
- 20:12 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T388134, bring new main graph hosts into service) xfer wikidata_main from wdqs2014.codfw.wmnet -> wdqs2015.codfw.wmnet w/ force delete existing files, repooling neither afterwards
- 20:12 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance
- 20:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1251 (T382778)', diff saved to https://phabricator.wikimedia.org/P75820 and previous config saved to /var/cache/conftool/dbconfig/20250506-201145-ladsgroup.json
- 19:56 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1251', diff saved to https://phabricator.wikimedia.org/P75819 and previous config saved to /var/cache/conftool/dbconfig/20250506-195638-ladsgroup.json
- 19:46 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host apus-fe1003.wikimedia.org with OS bookworm
- 19:43 dbrant@deploy1003: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
- 19:42 dbrant@deploy1003: helmfile [codfw] START helmfile.d/services/mobileapps: apply
- 19:42 dbrant@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
- 19:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1251', diff saved to https://phabricator.wikimedia.org/P75818 and previous config saved to /var/cache/conftool/dbconfig/20250506-194131-ladsgroup.json
- 19:41 dbrant@deploy1003: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
- 19:38 dbrant@deploy1003: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
- 19:38 dbrant@deploy1003: helmfile [staging] START helmfile.d/services/mobileapps: apply
- 19:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1251 (T382778)', diff saved to https://phabricator.wikimedia.org/P75817 and previous config saved to /var/cache/conftool/dbconfig/20250506-192624-ladsgroup.json
- 19:25 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T388134, bring new main graph hosts into service) xfer wikidata_main from wdqs1011.eqiad.wmnet -> wdqs1017.eqiad.wmnet w/ force delete existing files, repooling neither afterwards
- 19:25 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T388134, bring new main graph hosts into service) xfer wikidata_main from wdqs1021.eqiad.wmnet -> wdqs1016.eqiad.wmnet w/ force delete existing files, repooling source-only afterwards
- 19:23 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T388134, bring new main graph hosts into service) xfer wikidata_main from wdqs1021.eqiad.wmnet -> wdqs1011.eqiad.wmnet w/ force delete existing files, repooling source-only afterwards
- 19:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1251 (T382778)', diff saved to https://phabricator.wikimedia.org/P75816 and previous config saved to /var/cache/conftool/dbconfig/20250506-192333-ladsgroup.json
- 19:23 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1251.eqiad.wmnet with reason: Maintenance
- 19:22 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1240.eqiad.wmnet with reason: Maintenance
- 19:21 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1239.eqiad.wmnet with reason: Maintenance
- 19:21 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T388134, bring new main graph hosts into service) xfer wikidata_main from wdqs2014.codfw.wmnet -> wdqs2015.codfw.wmnet w/ force delete existing files, repooling neither afterwards
- 19:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1235 (T382778)', diff saved to https://phabricator.wikimedia.org/P75815 and previous config saved to /var/cache/conftool/dbconfig/20250506-192054-ladsgroup.json
- 19:20 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T388134, bring new main graph hosts into service) xfer wikidata_main from wdqs2021.codfw.wmnet -> wdqs2008.codfw.wmnet w/ force delete existing files, repooling source-only afterwards
- 19:18 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T388134, bring new main graph hosts into service) xfer wikidata_main from wdqs2021.codfw.wmnet -> wdqs2014.codfw.wmnet w/ force delete existing files, repooling source-only afterwards
- 19:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1235', diff saved to https://phabricator.wikimedia.org/P75814 and previous config saved to /var/cache/conftool/dbconfig/20250506-190547-ladsgroup.json
- 18:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1235', diff saved to https://phabricator.wikimedia.org/P75813 and previous config saved to /var/cache/conftool/dbconfig/20250506-185040-ladsgroup.json
- 18:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1235 (T382778)', diff saved to https://phabricator.wikimedia.org/P75812 and previous config saved to /var/cache/conftool/dbconfig/20250506-183533-ladsgroup.json
- 18:32 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1235 (T382778)', diff saved to https://phabricator.wikimedia.org/P75811 and previous config saved to /var/cache/conftool/dbconfig/20250506-183222-ladsgroup.json
- 18:32 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1235.eqiad.wmnet with reason: Maintenance
- 18:32 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1234 (T382778)', diff saved to https://phabricator.wikimedia.org/P75810 and previous config saved to /var/cache/conftool/dbconfig/20250506-183159-ladsgroup.json
- 18:25 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T388134, bring new main graph hosts into service) xfer wikidata_main from wdqs2021.codfw.wmnet -> wdqs2014.codfw.wmnet w/ force delete existing files, repooling source-only afterwards
- 18:25 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T388134, bring new main graph hosts into service) xfer wikidata_main from wdqs1021.eqiad.wmnet -> wdqs1011.eqiad.wmnet w/ force delete existing files, repooling source-only afterwards
- 18:24 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T388134, bring new main graph hosts into service) xfer categories from wdqs2021.codfw.wmnet -> wdqs2008.codfw.wmnet w/ force delete existing files, repooling source-only afterwards
- 18:23 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T388134, bring new main graph hosts into service) xfer categories from wdqs1021.eqiad.wmnet -> wdqs1011.eqiad.wmnet w/ force delete existing files, repooling source-only afterwards
- 18:17 jhuneidi@deploy1003: rebuilt and synchronized wikiversions files: group0 to 1.44.0-wmf.28 refs T386223
- 18:16 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1234', diff saved to https://phabricator.wikimedia.org/P75808 and previous config saved to /var/cache/conftool/dbconfig/20250506-181652-ladsgroup.json
- 18:13 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T388134, bring new main graph hosts into service) xfer categories from wdqs2021.codfw.wmnet -> wdqs2008.codfw.wmnet w/ force delete existing files, repooling source-only afterwards
- 18:12 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T388134, bring new main graph hosts into service) xfer categories from wdqs1021.eqiad.wmnet -> wdqs1011.eqiad.wmnet w/ force delete existing files, repooling source-only afterwards
- 18:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1234', diff saved to https://phabricator.wikimedia.org/P75807 and previous config saved to /var/cache/conftool/dbconfig/20250506-180146-ladsgroup.json
- 17:54 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T388134, bring new main graph hosts into service) xfer categories from wdqs2021.codfw.wmnet -> wdqs2015.codfw.wmnet, repooling source-only afterwards
- 17:54 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T388134, bring new main graph hosts into service) xfer categories from wdqs1021.eqiad.wmnet -> wdqs1017.eqiad.wmnet, repooling source-only afterwards
- 17:49 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T388134, bring new main graph hosts into service) xfer categories from wdqs1021.eqiad.wmnet -> wdqs1017.eqiad.wmnet, repooling source-only afterwards
- 17:49 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T388134, bring new main graph hosts into service) xfer categories from wdqs2021.codfw.wmnet -> wdqs2015.codfw.wmnet, repooling source-only afterwards
- 17:49 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T388134, bring new main graph hosts into service) xfer categories from wdqs1021.eqiad.wmnet -> wdqs1016.eqiad.wmnet, repooling source-only afterwards
- 17:49 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T388134, bring new main graph hosts into service) xfer categories from wdqs2021.codfw.wmnet -> wdqs2014.codfw.wmnet, repooling source-only afterwards
- 17:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1234 (T382778)', diff saved to https://phabricator.wikimedia.org/P75806 and previous config saved to /var/cache/conftool/dbconfig/20250506-174639-ladsgroup.json
- 17:44 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T388134, bring new main graph hosts into service) xfer categories from wdqs1021.eqiad.wmnet -> wdqs1016.eqiad.wmnet, repooling source-only afterwards
- 17:44 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T388134, bring new main graph hosts into service) xfer categories from wdqs2021.codfw.wmnet -> wdqs2014.codfw.wmnet, repooling source-only afterwards
- 17:44 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T388134, bring new main graph hosts into service) xfer categories from wdqs1021.eqiad.wmnet -> wdqs1011.eqiad.wmnet, repooling source-only afterwards
- 17:43 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T388134, bring new main graph hosts into service) xfer categories from wdqs2021.codfw.wmnet -> wdqs2008.codfw.wmnet, repooling source-only afterwards
- 17:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1234 (T382778)', diff saved to https://phabricator.wikimedia.org/P75805 and previous config saved to /var/cache/conftool/dbconfig/20250506-174325-ladsgroup.json
- 17:43 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1234.eqiad.wmnet with reason: Maintenance
- 17:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1232 (T382778)', diff saved to https://phabricator.wikimedia.org/P75804 and previous config saved to /var/cache/conftool/dbconfig/20250506-174313-ladsgroup.json
- 17:41 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T388134, bring new main graph hosts into service) xfer wikidata_main from wdqs1021.eqiad.wmnet -> wdqs1017.eqiad.wmnet, repooling source-only afterwards
- 17:40 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T388134, bring new main graph hosts into service) xfer categories from wdqs1021.eqiad.wmnet -> wdqs1011.eqiad.wmnet, repooling source-only afterwards
- 17:39 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T388134, bring new main graph hosts into service) xfer categories from wdqs2021.codfw.wmnet -> wdqs2008.codfw.wmnet, repooling source-only afterwards
- 17:31 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T388134, bring new main graph hosts into service) xfer wikidata_main from wdqs1021.eqiad.wmnet -> wdqs1017.eqiad.wmnet, repooling source-only afterwards
- 17:30 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
- 17:30 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
- 17:29 ryankemper@deploy1003: Finished deploy [wdqs/wdqs@fe88851]: deploy to freshly reimaged host (duration: 00m 11s)
- 17:29 ryankemper@deploy1003: Started deploy [wdqs/wdqs@fe88851]: deploy to freshly reimaged host
- 17:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1232', diff saved to https://phabricator.wikimedia.org/P75803 and previous config saved to /var/cache/conftool/dbconfig/20250506-172807-ladsgroup.json
- 17:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1232', diff saved to https://phabricator.wikimedia.org/P75802 and previous config saved to /var/cache/conftool/dbconfig/20250506-171259-ladsgroup.json
- 17:12 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
- 17:11 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
- 16:57 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1232 (T382778)', diff saved to https://phabricator.wikimedia.org/P75801 and previous config saved to /var/cache/conftool/dbconfig/20250506-165752-ladsgroup.json
- 16:55 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host apus-fe1003.wikimedia.org with OS bookworm
- 16:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1232 (T382778)', diff saved to https://phabricator.wikimedia.org/P75800 and previous config saved to /var/cache/conftool/dbconfig/20250506-165438-ladsgroup.json
- 16:54 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1232.eqiad.wmnet with reason: Maintenance
- 16:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1219 (T382778)', diff saved to https://phabricator.wikimedia.org/P75799 and previous config saved to /var/cache/conftool/dbconfig/20250506-165415-ladsgroup.json
- 16:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1219', diff saved to https://phabricator.wikimedia.org/P75798 and previous config saved to /var/cache/conftool/dbconfig/20250506-163908-ladsgroup.json
- 16:34 denisse: enable Puppet on Grafana2001 - T384841
- 16:34 cdanis@cumin1002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "[not really into teleological thinking] - cdanis@cumin1002"
- 16:34 cdanis@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: [not really into teleological thinking] - cdanis@cumin1002
- 16:33 cdanis@cumin1002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: [not really into teleological thinking] - cdanis@cumin1002
- 16:33 cdanis@cumin1002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "[not really into teleological thinking] - cdanis@cumin1002"
- 16:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1219', diff saved to https://phabricator.wikimedia.org/P75797 and previous config saved to /var/cache/conftool/dbconfig/20250506-162401-ladsgroup.json
- 16:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1219 (T382778)', diff saved to https://phabricator.wikimedia.org/P75796 and previous config saved to /var/cache/conftool/dbconfig/20250506-160854-ladsgroup.json
- 16:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1219 (T382778)', diff saved to https://phabricator.wikimedia.org/P75795 and previous config saved to /var/cache/conftool/dbconfig/20250506-160535-ladsgroup.json
- 16:05 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1219.eqiad.wmnet with reason: Maintenance
- 16:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1218 (T382778)', diff saved to Unable to send diff to phaste and previous config saved to /var/cache/conftool/dbconfig/20250506-160507-ladsgroup.json
- 16:04 isaranto@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
- 15:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1218', diff saved to https://phabricator.wikimedia.org/P75793 and previous config saved to /var/cache/conftool/dbconfig/20250506-155000-ladsgroup.json
- 15:49 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
- 15:48 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
- 15:48 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
- 15:48 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
- 15:45 swfrench@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db1246.eqiad.wmnet with reason: Host has crashed - T393296
- 15:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1218', diff saved to https://phabricator.wikimedia.org/P75792 and previous config saved to /var/cache/conftool/dbconfig/20250506-153453-ladsgroup.json
- 15:28 klausman@cumin2002: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on A:ml-serve-worker-codfw
- 15:19 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1218 (T382778)', diff saved to https://phabricator.wikimedia.org/P75790 and previous config saved to /var/cache/conftool/dbconfig/20250506-151946-ladsgroup.json
- 15:17 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch1111.eqiad.wmnet with OS bullseye
- 15:16 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1218 (T382778)', diff saved to https://phabricator.wikimedia.org/P75789 and previous config saved to /var/cache/conftool/dbconfig/20250506-151652-ladsgroup.json
- 15:16 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1218.eqiad.wmnet with reason: Maintenance
- 15:16 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1207 (T382778)', diff saved to https://phabricator.wikimedia.org/P75788 and previous config saved to /var/cache/conftool/dbconfig/20250506-151629-ladsgroup.json
- 15:11 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
- 15:11 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
- 15:02 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch1111.eqiad.wmnet with reason: host reimage
- 15:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1207', diff saved to https://phabricator.wikimedia.org/P75787 and previous config saved to /var/cache/conftool/dbconfig/20250506-150122-ladsgroup.json
- 14:58 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch1111.eqiad.wmnet with reason: host reimage
- 14:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1207', diff saved to https://phabricator.wikimedia.org/P75786 and previous config saved to /var/cache/conftool/dbconfig/20250506-144615-ladsgroup.json
- 14:44 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch1111.eqiad.wmnet with OS bullseye
- 14:44 stevemunene@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on an-worker1177.eqiad.wmnet with reason: Harddrive replacement
- 14:43 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic1111 to cirrussearch1111
- 14:43 stevemunene@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on an-worker1156.eqiad.wmnet with reason: Harddrive replacement
- 14:43 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch1111
- 14:41 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch1111
- 14:41 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch1111 on all recursors
- 14:41 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch1111 on all recursors
- 14:41 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 14:41 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1111 to cirrussearch1111 - bking@cumin2002"
- 14:41 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1111 to cirrussearch1111 - bking@cumin2002"
- 14:37 andrew@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 14:37 andrew@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Updating IPs for cloudrabbit200[123]-dev - andrew@cumin1002"
- 14:37 bking@cumin2002: START - Cookbook sre.dns.netbox
- 14:37 jnuche@deploy1003: Installation of scap version "4.161.0" completed for 2 hosts
- 14:36 andrew@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Updating IPs for cloudrabbit200[123]-dev - andrew@cumin1002"
- 14:36 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic1111 to cirrussearch1111
- 14:35 jnuche@deploy1003: Installing scap version "4.161.0" for 2 host(s)
- 14:34 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
- 14:34 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
- 14:32 andrew@cumin1002: START - Cookbook sre.dns.netbox
- 14:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1207 (T382778)', diff saved to https://phabricator.wikimedia.org/P75785 and previous config saved to /var/cache/conftool/dbconfig/20250506-143108-ladsgroup.json
- 14:27 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1207 (T382778)', diff saved to https://phabricator.wikimedia.org/P75784 and previous config saved to /var/cache/conftool/dbconfig/20250506-142748-ladsgroup.json
- 14:27 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1207.eqiad.wmnet with reason: Maintenance
- 14:27 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206 (T382778)', diff saved to https://phabricator.wikimedia.org/P75783 and previous config saved to /var/cache/conftool/dbconfig/20250506-142726-ladsgroup.json
- 14:25 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Bugfixes - oblivian@cumin1002"
- 14:25 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Bugfixes - oblivian@cumin1002
- 14:25 oblivian@cumin1002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Bugfixes - oblivian@cumin1002
- 14:25 oblivian@cumin1002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Bugfixes - oblivian@cumin1002"
- 14:23 tgr_: UTC afternoon deploys done
- 14:20 tgr@deploy1003: Finished scap sync-world: Backport for logging: Add context processor (T142313) (duration: 20m 37s)
- 14:15 bking@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6 days, 0:00:00 on wdqs1017.eqiad.wmnet with reason: bringing host online after reimage
- 14:13 tgr@deploy1003: tgr: Continuing with sync
- 14:12 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206', diff saved to https://phabricator.wikimedia.org/P75782 and previous config saved to /var/cache/conftool/dbconfig/20250506-141220-ladsgroup.json
- 14:06 tgr@deploy1003: tgr: Backport for logging: Add context processor (T142313) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 13:59 tgr@deploy1003: Started scap sync-world: Backport for logging: Add context processor (T142313)
- 13:57 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206', diff saved to https://phabricator.wikimedia.org/P75781 and previous config saved to /var/cache/conftool/dbconfig/20250506-135713-ladsgroup.json
- 13:53 tgr@deploy1003: Finished scap sync-world: Backport for private: Drop $wgCentralAuthSul3SharedDomainRestrictions (T390329) (duration: 16m 32s)
- 13:44 tgr@deploy1003: tgr: Continuing with sync
- 13:43 tgr@deploy1003: tgr: Backport for private: Drop $wgCentralAuthSul3SharedDomainRestrictions (T390329) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 13:42 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206 (T382778)', diff saved to https://phabricator.wikimedia.org/P75780 and previous config saved to /var/cache/conftool/dbconfig/20250506-134207-ladsgroup.json
- 13:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1206 (T382778)', diff saved to https://phabricator.wikimedia.org/P75779 and previous config saved to /var/cache/conftool/dbconfig/20250506-133943-ladsgroup.json
- 13:39 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1206.eqiad.wmnet with reason: Maintenance
- 13:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1196 (T382778)', diff saved to https://phabricator.wikimedia.org/P75778 and previous config saved to /var/cache/conftool/dbconfig/20250506-133920-ladsgroup.json
- 13:36 tgr@deploy1003: Started scap sync-world: Backport for private: Drop $wgCentralAuthSul3SharedDomainRestrictions (T390329)
- 13:25 tgr@deploy1003: Finished scap sync-world: Backport for CommonSettings: Document wmfGetPrivilegedGroups usage, Revert "Add .well-known/matrix for wikimedia.org" (T223835 T261531), core-Permissions: add move-subpages to enwiki templateeditor user group (T393167), Growth-Beta: Configure higher Impact Module edit limits for pilot wikis (T341599), [
- 13:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1196', diff saved to https://phabricator.wikimedia.org/P75777 and previous config saved to /var/cache/conftool/dbconfig/20250506-132413-ladsgroup.json
- 13:16 tgr@deploy1003: tgr, novemlinguae, cyndywikime, lucaswerkmeister-wmde: Continuing with sync
- 13:14 tgr@deploy1003: tgr, novemlinguae, cyndywikime, lucaswerkmeister-wmde: Backport for CommonSettings: Document wmfGetPrivilegedGroups usage, Revert "Add .well-known/matrix for wikimedia.org" (T223835 T261531), core-Permissions: add move-subpages to enwiki templateeditor user group (T393167), [[gerrit:1136986|Growth-Beta: Configure higher Impact Module edit limits f
- 13:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1196', diff saved to https://phabricator.wikimedia.org/P75776 and previous config saved to /var/cache/conftool/dbconfig/20250506-130905-ladsgroup.json
- 13:08 klausman@cumin2002: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:ml-serve-worker-codfw
- 13:07 tgr@deploy1003: Started scap sync-world: Backport for CommonSettings: Document wmfGetPrivilegedGroups usage, Revert "Add .well-known/matrix for wikimedia.org" (T223835 T261531), core-Permissions: add move-subpages to enwiki templateeditor user group (T393167), Growth-Beta: Configure higher Impact Module edit limits for pilot wikis (T341599), [[
- 13:01 klausman@cumin2002: END (ERROR) - Cookbook sre.k8s.reboot-nodes (exit_code=97) rolling reboot on A:ml-staging-worker
- 12:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1196 (T382778)', diff saved to https://phabricator.wikimedia.org/P75775 and previous config saved to /var/cache/conftool/dbconfig/20250506-125358-ladsgroup.json
- 12:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1196 (T382778)', diff saved to https://phabricator.wikimedia.org/P75774 and previous config saved to /var/cache/conftool/dbconfig/20250506-125034-ladsgroup.json
- 12:50 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1013,1017].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
- 12:50 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1196.eqiad.wmnet with reason: Maintenance
- 12:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1195 (T382778)', diff saved to https://phabricator.wikimedia.org/P75773 and previous config saved to /var/cache/conftool/dbconfig/20250506-124954-ladsgroup.json
- 12:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1195', diff saved to https://phabricator.wikimedia.org/P75772 and previous config saved to /var/cache/conftool/dbconfig/20250506-123448-ladsgroup.json
- 12:27 klausman@cumin2002: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:ml-staging-worker
- 12:19 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1195', diff saved to https://phabricator.wikimedia.org/P75771 and previous config saved to /var/cache/conftool/dbconfig/20250506-121940-ladsgroup.json
- 12:11 joal@deploy1003: Finished deploy [analytics/refinery@43a5f61] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@43a5f617] (duration: 01m 37s)
- 12:09 joal@deploy1003: Started deploy [analytics/refinery@43a5f61] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@43a5f617]
- 12:09 joal@deploy1003: Finished deploy [analytics/refinery@43a5f61] (thin): Regular analytics weekly train THIN [analytics/refinery@43a5f617] (duration: 01m 20s)
- 12:08 joal@deploy1003: Started deploy [analytics/refinery@43a5f61] (thin): Regular analytics weekly train THIN [analytics/refinery@43a5f617]
- 12:07 joal@deploy1003: Finished deploy [analytics/refinery@43a5f61]: Regular analytics weekly train [analytics/refinery@43a5f617] (duration: 02m 56s)
- 12:04 joal@deploy1003: Started deploy [analytics/refinery@43a5f61]: Regular analytics weekly train [analytics/refinery@43a5f617]
- 12:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1195 (T382778)', diff saved to https://phabricator.wikimedia.org/P75770 and previous config saved to /var/cache/conftool/dbconfig/20250506-120434-ladsgroup.json
- 12:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1195 (T382778)', diff saved to https://phabricator.wikimedia.org/P75769 and previous config saved to /var/cache/conftool/dbconfig/20250506-120108-ladsgroup.json
- 12:01 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1195.eqiad.wmnet with reason: Maintenance
- 12:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1186 (T382778)', diff saved to https://phabricator.wikimedia.org/P75768 and previous config saved to /var/cache/conftool/dbconfig/20250506-120045-ladsgroup.json
- 11:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1186', diff saved to https://phabricator.wikimedia.org/P75767 and previous config saved to /var/cache/conftool/dbconfig/20250506-114538-ladsgroup.json
- 11:43 kamila@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-cron: apply
- 11:42 kamila@deploy1003: helmfile [codfw] START helmfile.d/services/mw-cron: apply
- 11:37 jynus@cumin1002: DONE (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 6:00:00 on backup[2010-2014].codfw.wmnet with reason: Upgrade and restart
- 11:36 jynus@cumin1002: DONE (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 6:00:00 on backup1013.eqiad.wmnet with reason: Upgrade and restart
- 11:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1186', diff saved to https://phabricator.wikimedia.org/P75766 and previous config saved to /var/cache/conftool/dbconfig/20250506-113031-ladsgroup.json
- 11:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1186 (T382778)', diff saved to https://phabricator.wikimedia.org/P75765 and previous config saved to /var/cache/conftool/dbconfig/20250506-111524-ladsgroup.json
- 11:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1186 (T382778)', diff saved to https://phabricator.wikimedia.org/P75764 and previous config saved to /var/cache/conftool/dbconfig/20250506-111157-ladsgroup.json
- 11:11 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1186.eqiad.wmnet with reason: Maintenance
- 11:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T382778)', diff saved to https://phabricator.wikimedia.org/P75763 and previous config saved to /var/cache/conftool/dbconfig/20250506-111146-ladsgroup.json
- 10:56 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P75762 and previous config saved to /var/cache/conftool/dbconfig/20250506-105639-ladsgroup.json
- 10:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P75761 and previous config saved to /var/cache/conftool/dbconfig/20250506-104131-ladsgroup.json
- 10:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T382778)', diff saved to https://phabricator.wikimedia.org/P75760 and previous config saved to /var/cache/conftool/dbconfig/20250506-102624-ladsgroup.json
- 10:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1184 (T382778)', diff saved to https://phabricator.wikimedia.org/P75759 and previous config saved to /var/cache/conftool/dbconfig/20250506-102236-ladsgroup.json
- 10:22 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1184.eqiad.wmnet with reason: Maintenance
- 10:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T382778)', diff saved to https://phabricator.wikimedia.org/P75758 and previous config saved to /var/cache/conftool/dbconfig/20250506-102226-ladsgroup.json
- 10:07 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P75757 and previous config saved to /var/cache/conftool/dbconfig/20250506-100719-ladsgroup.json
- 09:57 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
- 09:57 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
- 09:56 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
- 09:56 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mobileapps: apply
- 09:56 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
- 09:56 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mobileapps: apply
- 09:55 jgiannelos@deploy1003: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
- 09:55 jgiannelos@deploy1003: helmfile [staging] START helmfile.d/services/mobileapps: apply
- 09:52 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P75756 and previous config saved to /var/cache/conftool/dbconfig/20250506-095212-ladsgroup.json
- 09:44 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply
- 09:43 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop: apply
- 09:42 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop: apply
- 09:42 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop: apply
- 09:41 jgiannelos@deploy1003: helmfile [staging] DONE helmfile.d/services/changeprop: apply
- 09:40 jgiannelos@deploy1003: helmfile [staging] START helmfile.d/services/changeprop: apply
- 09:40 jgiannelos@deploy1003: helmfile [staging] DONE helmfile.d/services/changeprop: apply
- 09:40 jgiannelos@deploy1003: helmfile [staging] START helmfile.d/services/changeprop: apply
- 09:40 fnegri@cumin1002: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0) for database nupwiki (T390714)
- 09:40 fnegri@cumin1002: START - Cookbook sre.wikireplicas.add-wiki for database nupwiki (T390714)
- 09:37 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T382778)', diff saved to https://phabricator.wikimedia.org/P75755 and previous config saved to /var/cache/conftool/dbconfig/20250506-093704-ladsgroup.json
- 09:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1169 (T382778)', diff saved to https://phabricator.wikimedia.org/P75754 and previous config saved to /var/cache/conftool/dbconfig/20250506-093410-ladsgroup.json
- 09:34 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1169.eqiad.wmnet with reason: Maintenance
- 09:28 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
- 09:28 lucaswerkmeister-wmde@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikidata-query-gui: apply
- 09:28 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
- 09:28 lucaswerkmeister-wmde@deploy1003: helmfile [codfw] START helmfile.d/services/wikidata-query-gui: apply
- 09:28 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
- 09:27 lucaswerkmeister-wmde@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikidata-query-gui: apply
- 09:27 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
- 09:27 lucaswerkmeister-wmde@deploy1003: helmfile [eqiad] START helmfile.d/services/wikidata-query-gui: apply
- 09:26 lucaswerkmeister-wmde@deploy1003: helmfile [staging] DONE helmfile.d/services/wikidata-query-gui: apply
- 09:26 lucaswerkmeister-wmde@deploy1003: helmfile [staging] START helmfile.d/services/wikidata-query-gui: apply
- 07:49 elukey: restart apache2 on puppetmaster1001
- 04:07 mwpresync@deploy1003: Pruned MediaWiki: 1.44.0-wmf.24 (duration: 07m 35s)
- 04:06 mwpresync@deploy1003: Finished scap sync-world: testwikis to 1.44.0-wmf.28 refs T386223 (duration: 62m 44s)
- 03:52 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T388134, bring new main graph hosts into service) xfer wikidata_main from wdqs2021.codfw.wmnet -> wdqs2015.codfw.wmnet, repooling source-only afterwards
- 03:45 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T388134, bring new main graph hosts into service) xfer wikidata_main from wdqs2021.codfw.wmnet -> wdqs2015.codfw.wmnet, repooling source-only afterwards
- 03:45 ryankemper@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 15:00:00 on wdqs[2008,2014-2015].codfw.wmnet,wdqs[1011,1016].eqiad.wmnet with reason: T388134
- 03:44 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T388134, bring new main graph hosts into service) xfer wikidata_main from wdqs2021.codfw.wmnet -> wdqs2014.codfw.wmnet, repooling source-only afterwards
- 03:36 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T388134, bring new main graph hosts into service) xfer wikidata_main from wdqs2021.codfw.wmnet -> wdqs2014.codfw.wmnet, repooling source-only afterwards
- 03:35 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T388134, bring new main graph hosts into service) xfer wikidata_main from wdqs2021.codfw.wmnet -> wdqs2008.codfw.wmnet, repooling source-only afterwards
- 03:27 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T388134, bring new main graph hosts into service) xfer wikidata_main from wdqs2021.codfw.wmnet -> wdqs2008.codfw.wmnet, repooling source-only afterwards
- 03:25 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T388134, bring new main graph hosts into service) xfer wikidata_main from wdqs1021.eqiad.wmnet -> wdqs1011.eqiad.wmnet, repooling source-only afterwards
- 03:18 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T388134, bring new main graph hosts into service) xfer wikidata_main from wdqs1021.eqiad.wmnet -> wdqs1011.eqiad.wmnet, repooling source-only afterwards
- 03:18 ryankemper@deploy1003: Finished deploy [wdqs/wdqs@fe88851]: deploy to freshly reimaged host (duration: 00m 12s)
- 03:17 ryankemper@deploy1003: Started deploy [wdqs/wdqs@fe88851]: deploy to freshly reimaged host
- 03:17 ryankemper@deploy1003: Finished deploy [wdqs/wdqs@fe88851]: deploy to freshly reimaged host (duration: 00m 13s)
- 03:17 ryankemper@deploy1003: Started deploy [wdqs/wdqs@fe88851]: deploy to freshly reimaged host
- 03:17 ryankemper@deploy1003: Finished deploy [wdqs/wdqs@fe88851]: deploy to freshly reimaged host (duration: 00m 13s)
- 03:16 ryankemper@deploy1003: Started deploy [wdqs/wdqs@fe88851]: deploy to freshly reimaged host
- 03:16 ryankemper@deploy1003: Finished deploy [wdqs/wdqs@fe88851]: deploy to freshly reimaged host (duration: 00m 14s)
- 03:16 ryankemper@deploy1003: Started deploy [wdqs/wdqs@fe88851]: deploy to freshly reimaged host
- 03:05 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T388134, bring new main graph hosts into service) xfer wikidata_main from wdqs1021.eqiad.wmnet -> wdqs1016.eqiad.wmnet, repooling source-only afterwards
- 03:03 mwpresync@deploy1003: Started scap sync-world: testwikis to 1.44.0-wmf.28 refs T386223
- 03:00 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T388134, bring new main graph hosts into service) xfer wikidata_main from wdqs1021.eqiad.wmnet -> wdqs1016.eqiad.wmnet, repooling source-only afterwards
- 02:52 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) (T388134, bring new main graph hosts into service) xfer wikidata from wdqs1021.eqiad.wmnet -> wdqs1016.eqiad.wmnet, repooling source-only afterwards
- 02:49 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T388134, bring new main graph hosts into service) xfer wikidata from wdqs1021.eqiad.wmnet -> wdqs1016.eqiad.wmnet, repooling source-only afterwards
- 02:43 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) (T388134, bring new main graph hosts into service) xfer wikidata from wdqs1021.eqiad.wmnet -> wdqs1016.eqiad.wmnet, repooling source-only afterwards
- 02:41 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T388134, bring new main graph hosts into service) xfer wikidata from wdqs1021.eqiad.wmnet -> wdqs1016.eqiad.wmnet, repooling source-only afterwards
- 02:32 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) (T388134, bring new main graph hosts into service) xfer wdqs-all from wdqs1021.eqiad.wmnet -> wdqs1016.eqiad.wmnet, repooling source-only afterwards
- 02:27 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T388134, bring new main graph hosts into service) xfer wdqs-all from wdqs1021.eqiad.wmnet -> wdqs1016.eqiad.wmnet, repooling source-only afterwards
- 02:24 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) (T388134, bring new main graph hosts into service) xfer wdqs-all from wdqs1021.eqiad.wmnet -> wdqs1016.eqiad.wmnet, repooling source-only afterwards
- 02:24 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T388134, bring new main graph hosts into service) xfer wdqs-all from wdqs1021.eqiad.wmnet -> wdqs1016.eqiad.wmnet, repooling source-only afterwards
- 02:24 ryankemper@deploy1003: Finished deploy [wdqs/wdqs@fe88851]: deploy to freshly reimaged host (duration: 00m 12s)
- 02:24 ryankemper@deploy1003: Started deploy [wdqs/wdqs@fe88851]: deploy to freshly reimaged host
- 02:22 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) (T388134, bring new main graph hosts into service) xfer wdqs-all from wdqs1021.eqiad.wmnet -> wdqs1016.eqiad.wmnet, repooling source-only afterwards
- 02:22 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T388134, bring new main graph hosts into service) xfer wdqs-all from wdqs1021.eqiad.wmnet -> wdqs1016.eqiad.wmnet, repooling source-only afterwards
- 00:24 ryankemper@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs1017.eqiad.wmnet with OS bullseye
2025-05-05
- 23:32 ryankemper@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs1017.eqiad.wmnet with OS bullseye
- 23:29 eileen: civicrm upgraded from 5a1f3e8e to 6ffbde61
- 23:14 zabe: zabe@mwmaint1002:~$ mwscript extensions/WikimediaMaintenance/migrateESRefToContentTableStage2.php enwiki --delete /home/zabe/afl_text_table_deletedump/enwiki --sleep 0.3 # T381599
- 23:04 zabe@deploy1003: Finished scap sync-world: Backport for core-Permissions: refactor enwiki wgRemoveGroups (duration: 11m 13s)
- 23:01 ryankemper@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['wdqs1017.eqiad.wmnet']
- 23:01 ryankemper@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs1017.eqiad.wmnet']
- 23:00 ryankemper@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['wdqs1017.eqiad.wmnet']
- 22:59 ryankemper@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs1017.eqiad.wmnet']
- 22:59 ryankemper@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['wdqs1017.eqiad.wmnet']
- 22:57 zabe@deploy1003: zabe, novemlinguae: Continuing with sync
- 22:57 zabe@deploy1003: zabe, novemlinguae: Backport for core-Permissions: refactor enwiki wgRemoveGroups synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 22:52 zabe@deploy1003: Started scap sync-world: Backport for core-Permissions: refactor enwiki wgRemoveGroups
- 22:47 ryankemper@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs1017.eqiad.wmnet']
- 22:46 ryankemper@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['wdqs1017.eqiad.wmnet']
- 22:46 ryankemper@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs1017.eqiad.wmnet']
- 22:46 ryankemper@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs1017.eqiad.wmnet with OS bullseye
- 22:35 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host apus-fe1003.wikimedia.org with OS bookworm
- 22:12 sbassett: Deployed security fix (2) for T392341
- 21:57 sbassett: Deployed security fix (1) for T392341
- 21:34 ryankemper@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs1017.eqiad.wmnet with OS bullseye
- 21:15 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host apus-fe1003.wikimedia.org with OS bookworm
- 21:14 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host apus-fe1003.wikimedia.org with OS bookworm
- 21:03 jsn@deploy1003: Finished scap sync-world: Backport for Fix link for first set of Patroller Tools surveys (T389401) (duration: 14m 43s)
- 20:59 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host apus-fe1003.wikimedia.org with OS bookworm
- 20:59 ryankemper@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs1017.eqiad.wmnet with OS bullseye
- 20:56 jsn@deploy1003: jsn: Continuing with sync
- 20:56 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host apus-fe1003.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 20:56 cjming@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply
- 20:55 cjming@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply
- 20:55 jsn@deploy1003: jsn: Backport for Fix link for first set of Patroller Tools surveys (T389401) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 20:51 vriley@cumin1002: START - Cookbook sre.hosts.provision for host apus-fe1003.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 20:50 vriley@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host apus-fe1003
- 20:49 vriley@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host apus-fe1003
- 20:48 jsn@deploy1003: Started scap sync-world: Backport for Fix link for first set of Patroller Tools surveys (T389401)
- 20:48 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 20:48 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt apus-fe1003 - vriley@cumin1002"
- 20:48 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt apus-fe1003 - vriley@cumin1002"
- 20:44 ryankemper@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on wdqs[2008,2014-2015].codfw.wmnet,wdqs[1011,1016].eqiad.wmnet with reason: T388134
- 20:41 vriley@cumin1002: START - Cookbook sre.dns.netbox
- 20:35 jsn@deploy1003: Finished scap sync-world: Backport for Design Research Participant Survey: Undeploy (T392325), Deploy first set of Patroller Tools surveys (T389401) (duration: 19m 58s)
- 20:28 jsn@deploy1003: dani, jsn: Continuing with sync
- 20:21 jsn@deploy1003: dani, jsn: Backport for Design Research Participant Survey: Undeploy (T392325), Deploy first set of Patroller Tools surveys (T389401) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 20:15 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1184.eqiad.wmnet with OS bullseye
- 20:15 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1002"
- 20:15 jsn@deploy1003: Started scap sync-world: Backport for Design Research Participant Survey: Undeploy (T392325), Deploy first set of Patroller Tools surveys (T389401)
- 20:11 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1002"
- 19:58 ryankemper@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs1017.eqiad.wmnet with OS bullseye
- 19:46 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1184.eqiad.wmnet with reason: host reimage
- 19:43 vriley@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1184.eqiad.wmnet with reason: host reimage
- 19:37 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1185.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 19:35 vriley@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1185.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 19:27 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1184.eqiad.wmnet with OS bullseye
- 18:12 aokoth@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply
- 18:07 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1184.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 18:07 aokoth@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply
- 18:07 aokoth@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply
- 18:03 vriley@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1184.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 18:02 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1184.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 18:02 aokoth@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply
- 17:46 vriley@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1184.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 17:45 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=93) for host an-worker1184.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 17:30 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
- 17:30 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
- 17:19 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.rename (exit_code=93) from elastic1111 to cirrussearch1111
- 17:19 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 17:19 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Rolling back cirrussearch1111 to elastic1111 - bking@cumin2002"
- 17:19 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Rolling back cirrussearch1111 to elastic1111 - bking@cumin2002"
- 17:16 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
- 17:16 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
- 16:58 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti2047
- 16:58 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti2047
- 16:55 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti2047.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2047.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 16:49 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti2047
- 16:49 bking@cumin2002: START - Cookbook sre.dns.netbox
- 16:48 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti2047
- 16:48 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 16:48 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1111 to cirrussearch1111 - bking@cumin2002"
- 16:46 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 16:45 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1111 to cirrussearch1111 - bking@cumin2002"
- 16:44 jhancock@cumin2002: START - Cookbook sre.dns.netbox
- 16:39 bking@cumin2002: START - Cookbook sre.dns.netbox
- 16:38 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic1111 to cirrussearch1111
- 16:30 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 16:30 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ganeti2047 to codfw - jhancock@cumin2002"
- 16:29 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ganeti2047 to codfw - jhancock@cumin2002"
- 16:24 jhancock@cumin2002: START - Cookbook sre.dns.netbox
- 16:20 aokoth@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply
- 16:20 aokoth@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply
- 16:09 aokoth@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply
- 16:09 aokoth@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply
- 16:07 aokoth@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply
- 16:06 aokoth@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply
- 16:03 aokoth@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply
- 16:02 aokoth@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply
- 15:46 hoo@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikidata-query-gui: apply
- 15:46 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2047.codfw.wmnet with OS bookworm
- 15:46 hoo@deploy1003: helmfile [codfw] START helmfile.d/services/wikidata-query-gui: apply
- 15:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti2047.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 15:46 hoo@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikidata-query-gui: apply
- 15:45 hoo@deploy1003: helmfile [eqiad] START helmfile.d/services/wikidata-query-gui: apply
- 15:45 hoo@deploy1003: helmfile [staging] DONE helmfile.d/services/wikidata-query-gui: apply
- 15:44 hoo@deploy1003: helmfile [staging] START helmfile.d/services/wikidata-query-gui: apply
- 15:40 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2047.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 15:35 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ganeti2048.codfw.wmnet with OS bookworm
- 15:35 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ganeti2047.codfw.wmnet with OS bookworm
- 15:33 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
- 15:33 dancy@deploy1003: Installation of scap version "4.160.0" completed for 2 hosts
- 15:32 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mobileapps: apply
- 15:32 hoo@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikidata-query-gui: apply
- 15:32 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
- 15:32 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1031.eqiad.wmnet
- 15:32 hoo@deploy1003: helmfile [eqiad] START helmfile.d/services/wikidata-query-gui: apply
- 15:32 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
- 15:31 hoo@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikidata-query-gui: apply
- 15:31 dancy@deploy1003: Installing scap version "4.160.0" for 2 host(s)
- 15:31 hoo@deploy1003: helmfile [codfw] START helmfile.d/services/wikidata-query-gui: apply
- 15:30 hoo@deploy1003: helmfile [staging] DONE helmfile.d/services/wikidata-query-gui: apply
- 15:29 hoo@deploy1003: helmfile [staging] START helmfile.d/services/wikidata-query-gui: apply
- 15:28 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1031.eqiad.wmnet
- 15:26 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1030.eqiad.wmnet
- 15:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1030.eqiad.wmnet
- 15:25 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.sanitize-wiki (exit_code=0) Setting up permissions and view database sanitization for wikis nupwiki in section s5
- 15:25 fceratto@cumin1002: START - Cookbook sre.mysql.sanitize-wiki Setting up permissions and view database sanitization for wikis nupwiki in section s5
- 15:23 fceratto@cumin1002: END (FAIL) - Cookbook sre.mysql.sanitize-wiki (exit_code=99) Setting up permissions and view database sanitization for wikis nupwiki in section s5
- 15:23 fceratto@cumin1002: START - Cookbook sre.mysql.sanitize-wiki Setting up permissions and view database sanitization for wikis nupwiki in section s5
- 15:20 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1030.eqiad.wmnet
- 15:17 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1030.eqiad.wmnet
- 15:14 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1029.eqiad.wmnet
- 15:13 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1029.eqiad.wmnet
- 15:12 jgiannelos@deploy1003: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
- 15:12 jgiannelos@deploy1003: helmfile [staging] START helmfile.d/services/mobileapps: apply
- 15:12 jgiannelos@deploy1003: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
- 15:11 jgiannelos@deploy1003: helmfile [staging] START helmfile.d/services/mobileapps: apply
- 15:11 kartik@deploy1003: Finished scap sync-world: Backport for Revert "Remove links to Special:ContentTranslationStats from dashboards" (duration: 30m 27s)
- 15:07 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1029.eqiad.wmnet
- 15:03 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1029.eqiad.wmnet
- 15:01 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1028.eqiad.wmnet
- 15:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1028.eqiad.wmnet
- 15:00 kartik@deploy1003: kartik: Continuing with sync
- 14:58 kartik@deploy1003: kartik: Backport for Revert "Remove links to Special:ContentTranslationStats from dashboards" synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 14:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1028.eqiad.wmnet
- 14:49 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1028.eqiad.wmnet
- 14:44 elukey@deploy1003: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
- 14:42 elukey@deploy1003: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
- 14:40 kartik@deploy1003: Started scap sync-world: Backport for Revert "Remove links to Special:ContentTranslationStats from dashboards"
- 14:39 kartik@deploy1003: Finished scap sync-world: Backport for Growth: Remove GELevelingUpFeaturesEnabled and GEMentorDashboardEnabled feature flags (T379566) (duration: 19m 32s)
- 14:38 fabfur: upgrading haproxykafka to version 0.3.10 on A:cp (T393016)
- 14:29 kartik@deploy1003: cyndywikime, kartik: Continuing with sync
- 14:27 fabfur: enabled puppet and repooled cp7001 (T393016)
- 14:27 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp7001.magru.wmnet
- 14:25 kartik@deploy1003: cyndywikime, kartik: Backport for Growth: Remove GELevelingUpFeaturesEnabled and GEMentorDashboardEnabled feature flags (T379566) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 14:23 fabfur: uploading haproxykafka 0.3.10 to apt repo (T393016)
- 14:19 kartik@deploy1003: Started scap sync-world: Backport for Growth: Remove GELevelingUpFeaturesEnabled and GEMentorDashboardEnabled feature flags (T379566)
- 14:14 kartik@deploy1003: Sync cancelled.
- 14:10 kartik@deploy1003: kartik, abi: Backport for Remove links to Special:ContentTranslationStats from dashboards (T392839) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 13:52 kartik@deploy1003: Started scap sync-world: Backport for Remove links to Special:ContentTranslationStats from dashboards (T392839)
- 13:47 kartik@deploy1003: Finished scap sync-world: Backport for Disable APIs used in Special:ContentTranslationStats (T392839) (duration: 13m 23s)
- 13:46 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1027.eqiad.wmnet
- 13:46 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1027.eqiad.wmnet
- 13:43 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.sanitize-wiki (exit_code=0) Setting up permissions and view database sanitization for wikis nupwiki in section s5
- 13:43 fceratto@cumin1002: START - Cookbook sre.mysql.sanitize-wiki Setting up permissions and view database sanitization for wikis nupwiki in section s5
- 13:43 fceratto@cumin1002: END (ERROR) - Cookbook sre.mysql.sanitize-wiki (exit_code=1) Setting up permissions and view database sanitization for wikis nupwiki in section s5
- 13:43 klausman@cumin2002: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on A:ml-serve-worker-eqiad
- 13:43 fceratto@cumin1002: START - Cookbook sre.mysql.sanitize-wiki Setting up permissions and view database sanitization for wikis nupwiki in section s5
- 13:42 fceratto@cumin1002: END (ERROR) - Cookbook sre.mysql.sanitize-wiki (exit_code=1) Setting up permissions and view database sanitization for wikis nupwiki in section s5
- 13:42 fceratto@cumin1002: START - Cookbook sre.mysql.sanitize-wiki Setting up permissions and view database sanitization for wikis nupwiki in section s5
- 13:41 fceratto@cumin1002: END (FAIL) - Cookbook sre.mysql.sanitize-wiki (exit_code=99) Setting up permissions and view database sanitization for wikis nupwiki in section s5
- 13:41 fceratto@cumin1002: START - Cookbook sre.mysql.sanitize-wiki Setting up permissions and view database sanitization for wikis nupwiki in section s5
- 13:41 kartik@deploy1003: kartik, abi: Continuing with sync
- 13:40 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1027.eqiad.wmnet
- 13:39 kartik@deploy1003: kartik, abi: Backport for Disable APIs used in Special:ContentTranslationStats (T392839) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 13:34 kartik@deploy1003: Started scap sync-world: Backport for Disable APIs used in Special:ContentTranslationStats (T392839)
- 13:34 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1027.eqiad.wmnet
- 13:33 fceratto@cumin1002: END (FAIL) - Cookbook sre.mysql.sanitize-wiki (exit_code=99) Setting up permissions and view database sanitization for wikis nupwiki in section s5
- 13:33 fceratto@cumin1002: START - Cookbook sre.mysql.sanitize-wiki Setting up permissions and view database sanitization for wikis nupwiki in section s5
- 13:29 fceratto@cumin1002: END (FAIL) - Cookbook sre.mysql.sanitize-wiki (exit_code=99) Setting up permissions and view database sanitization for wikis nupwiki in section s5
- 13:27 fceratto@cumin1002: START - Cookbook sre.mysql.sanitize-wiki Setting up permissions and view database sanitization for wikis nupwiki in section s5
- 13:21 kartik@deploy1003: Finished scap sync-world: Backport for Disable Special:ContentTranslationStats page (T392839 T325790) (duration: 15m 29s)
- 13:20 fabfur: disabled puppet on cp7001 to test haproxykafka version (T393016)
- 13:19 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp7001.magru.wmnet
- 13:18 fabfur: depooling cp7001 to test new haproxykafka version (T393016)
- 13:14 kartik@deploy1003: kartik, abi: Continuing with sync
- 13:12 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1026.eqiad.wmnet
- 13:12 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1026.eqiad.wmnet
- 13:11 elukey@deploy1003: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
- 13:11 elukey@deploy1003: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
- 13:10 kartik@deploy1003: kartik, abi: Backport for Disable Special:ContentTranslationStats page (T392839 T325790) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 13:09 klausman@cumin2002: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:ml-serve-worker-eqiad
- 13:09 klausman@cumin2002: END (ERROR) - Cookbook sre.k8s.reboot-nodes (exit_code=97) rolling reboot on A:ml-serve-worker-eqiad
- 13:09 klausman@cumin2002: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:ml-serve-worker-eqiad
- 13:09 klausman@cumin2002: END (ERROR) - Cookbook sre.k8s.reboot-nodes (exit_code=97) rolling reboot on A:ml-serve-worker-eqiad
- 13:08 klausman@cumin2002: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:ml-serve-worker-eqiad
- 13:08 klausman@cumin2002: END (ERROR) - Cookbook sre.k8s.reboot-nodes (exit_code=97) rolling reboot on A:ml-serve-worker-eqiad
- 13:08 klausman@cumin2002: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:ml-serve-worker-eqiad
- 13:08 klausman@cumin2002: END (ERROR) - Cookbook sre.k8s.reboot-nodes (exit_code=97) rolling reboot on A:ml-serve-worker-eqiad
- 13:07 klausman@cumin2002: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:ml-serve-worker-eqiad
- 13:06 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1026.eqiad.wmnet
- 13:06 klausman@cumin2002: END (ERROR) - Cookbook sre.k8s.reboot-nodes (exit_code=97) rolling reboot on A:ml-serve-worker-eqiad
- 13:06 klausman@cumin2002: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:ml-serve-worker-eqiad
- 13:06 kartik@deploy1003: Started scap sync-world: Backport for Disable Special:ContentTranslationStats page (T392839 T325790)
- 13:04 tappof: rebooting centrallog1002 to roll back the kernel
- 13:00 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1026.eqiad.wmnet
- 12:59 klausman@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-serve-ctrl1001.eqiad.wmnet
- 12:56 klausman@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ml-serve-ctrl1001.eqiad.wmnet
- 12:52 klausman@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-serve-ctrl1002.eqiad.wmnet
- 12:47 klausman@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ml-serve-ctrl1002.eqiad.wmnet
- 12:47 klausman@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-serve-ctrl2002.codfw.wmnet
- 12:43 klausman@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ml-serve-ctrl2002.codfw.wmnet
- 12:42 klausman@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-serve-ctrl2001.codfw.wmnet
- 12:39 klausman@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ml-serve-ctrl2001.codfw.wmnet
- 12:34 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1025.eqiad.wmnet
- 12:34 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1025.eqiad.wmnet
- 12:28 tappof: Rolling reboot of Prometheus nodes in eqiad (1005, 1006, 1008) to rollback the kernel
- 12:27 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1025.eqiad.wmnet
- 12:22 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1025.eqiad.wmnet
- 12:10 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1024.eqiad.wmnet
- 12:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1024.eqiad.wmnet
- 12:06 aqu@deploy1003: Finished deploy [analytics/refinery@dbfa557] (thin): Deploying new refinery/source artifacts THIN [analytics/refinery@dbfa557d] (duration: 01m 07s)
- 12:04 aqu@deploy1003: Started deploy [analytics/refinery@dbfa557] (thin): Deploying new refinery/source artifacts THIN [analytics/refinery@dbfa557d]
- 12:04 aqu@deploy1003: Finished deploy [analytics/refinery@dbfa557]: Deploying new refinery/source artifacts [analytics/refinery@dbfa557d] (duration: 03m 17s)
- 12:04 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1024.eqiad.wmnet
- 12:01 aqu@deploy1003: Started deploy [analytics/refinery@dbfa557]: Deploying new refinery/source artifacts [analytics/refinery@dbfa557d]
- 12:00 aqu@deploy1003: Finished deploy [analytics/refinery@dbfa557] (hadoop-test): Deploying new refinery/source artifacts TEST [analytics/refinery@dbfa557d] (duration: 00m 53s)
- 11:59 aqu@deploy1003: Started deploy [analytics/refinery@dbfa557] (hadoop-test): Deploying new refinery/source artifacts TEST [analytics/refinery@dbfa557d]
- 11:58 filippo@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus2006.codfw.wmnet
- 11:58 filippo@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus2008.codfw.wmnet
- 11:56 filippo@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host centrallog2002.codfw.wmnet
- 11:53 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1024.eqiad.wmnet
- 11:49 filippo@cumin1002: START - Cookbook sre.hosts.reboot-single for host centrallog2002.codfw.wmnet
- 11:49 filippo@cumin1002: START - Cookbook sre.hosts.reboot-single for host prometheus2008.codfw.wmnet
- 11:49 filippo@cumin1002: START - Cookbook sre.hosts.reboot-single for host prometheus2006.codfw.wmnet
- 11:46 filippo@cumin1002: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host prometheus2006.codfw.wmnet
- 11:46 filippo@cumin1002: START - Cookbook sre.hosts.reboot-single for host prometheus2006.codfw.wmnet
- 11:45 filippo@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus2007.codfw.wmnet
- 11:44 filippo@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus2005.codfw.wmnet
- 11:38 filippo@cumin1002: START - Cookbook sre.hosts.reboot-single for host prometheus2007.codfw.wmnet
- 11:34 filippo@cumin1002: START - Cookbook sre.hosts.reboot-single for host prometheus2005.codfw.wmnet
- 11:12 jelto@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host vrts1003.eqiad.wmnet
- 11:05 jelto@cumin1002: START - Cookbook sre.hosts.reboot-single for host vrts1003.eqiad.wmnet
- 11:05 jynus@cumin1002: DONE (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 6:00:00 on backup[1010-1014].eqiad.wmnet with reason: Upgrade and restart
- 11:04 jelto@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host vrts2002.codfw.wmnet
- 10:57 jelto@cumin1002: START - Cookbook sre.hosts.reboot-single for host vrts2002.codfw.wmnet
- 10:57 jelto@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host vrts2002.codfw.wmnet
- 10:35 elukey@puppetserver1001: conftool action : set/pooled=true; selector: dnsdisc=inference,name=codfw
- 10:32 jelto@cumin1002: START - Cookbook sre.hosts.reboot-single for host vrts2002.codfw.wmnet
- 10:32 jelto@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host vrts2002.codfw.wmnet
- 10:24 tappof: rebooting prometheus1007 into linux-image-6.1.0-33-amd64
- 10:17 jelto@cumin1002: START - Cookbook sre.hosts.reboot-single for host vrts2002.codfw.wmnet
- 09:58 elukey@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'llm' for release 'main' .
- 09:39 elukey@deploy1003: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
- 09:39 elukey@deploy1003: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
- 09:38 elukey: depool inference/codfw from DNS discovery to safely apply new pod/container security settings - T369493
- 09:30 dreamyjazz@deploy1003: Finished scap sync-world: Backport for [plwiki] Add 'abusefilter-view-private' to sysop (T393353) (duration: 13m 04s)
- 09:23 dreamyjazz@deploy1003: dreamyjazz, msz2001: Continuing with sync
- 09:21 dreamyjazz@deploy1003: dreamyjazz, msz2001: Backport for [plwiki] Add 'abusefilter-view-private' to sysop (T393353) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 09:17 dreamyjazz@deploy1003: Started scap sync-world: Backport for [plwiki] Add 'abusefilter-view-private' to sysop (T393353)
- 09:03 godog: powercycle vrts1003 + vrts2002 - soft lockup T393357
- 08:56 godog: powercycle centrallog2002 - cannot log in via ssh or console
- 08:40 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs2015.codfw.wmnet with OS bullseye
- 08:32 tappof: rebooting prometheus2007 - no ssh, com2 via racadm hangs
- 08:32 godog: powercycle centrallog1002 - cannot log in via ssh or console
- 08:21 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2015.codfw.wmnet with reason: host reimage
- 08:17 ryankemper@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2015.codfw.wmnet with reason: host reimage
- 08:17 tappof: powercycle prometheus2008 - no ssh, mgmt console showing systemd units being deactivated, no root login
- 08:15 elukey: powercycle prometheus2005 - no ssh, mgmt console showing systemd units being deactivated, no root login
- 08:11 elukey: powercycle prometheus1008 - no ssh, mgmt console showing cpu soft lockup continuously
- 08:05 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
- 08:05 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
- 08:02 tappof: rebooting prometheus1005 prometheus1006 and prometheus2006
- 08:00 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wdqs2015
- 08:00 ryankemper@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wdqs2015
- 08:00 ryankemper@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wdqs2015
- 08:00 ryankemper@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wdqs2015.codfw.wmnet 209.48.192.10.in-addr.arpa 9.0.2.0.8.4.0.0.2.9.1.0.0.1.0.0.4.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
- 08:00 ryankemper@cumin2002: START - Cookbook sre.dns.wipe-cache wdqs2015.codfw.wmnet 209.48.192.10.in-addr.arpa 9.0.2.0.8.4.0.0.2.9.1.0.0.1.0.0.4.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
- 08:00 ryankemper@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 08:00 ryankemper@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wdqs2015 - ryankemper@cumin2002"
- 08:00 ryankemper@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wdqs2015 - ryankemper@cumin2002"
- 07:59 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
- 07:59 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
- 07:59 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
- 07:58 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mobileapps: apply
- 07:54 Dreamy_Jazz: UTC morning backport window finished
- 07:54 dreamyjazz@deploy1003: Finished scap sync-world: Backport for nnwiki: enable wgCiteResponsiveReferences (T393299), ruwikibooks: enable VisualEditorAvailableNamespaces for Рецепт (recipe) namespace (T392803), Add checkuserwiki favicon (T393246), nupwiki: add timezone (T390711) (duration: 14m 11s)
- 07:47 dreamyjazz@deploy1003: dreamyjazz, bunnypranav, anzx: Continuing with sync
- 07:44 dreamyjazz@deploy1003: dreamyjazz, bunnypranav, anzx: Backport for nnwiki: enable wgCiteResponsiveReferences (T393299), ruwikibooks: enable VisualEditorAvailableNamespaces for Рецепт (recipe) namespace (T392803), Add checkuserwiki favicon (T393246), nupwiki: add timezone (T390711) synced to the testservers (https://wikitech.wikimedia.org
- 07:40 dreamyjazz@deploy1003: Started scap sync-world: Backport for nnwiki: enable wgCiteResponsiveReferences (T393299), ruwikibooks: enable VisualEditorAvailableNamespaces for Рецепт (recipe) namespace (T392803), Add checkuserwiki favicon (T393246), nupwiki: add timezone (T390711)
- 07:31 kartik@deploy1003: Finished scap sync-world: Backport for Mobile frequent languages entrypoint: Add dependency to sitemapper (T393144 T386223) (duration: 17m 27s)
- 07:25 kartik@deploy1003: abi, kartik: Continuing with sync
- 07:21 ryankemper@cumin2002: START - Cookbook sre.dns.netbox
- 07:21 ryankemper@cumin2002: START - Cookbook sre.hosts.move-vlan for host wdqs2015
- 07:20 ryankemper@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2015.codfw.wmnet with OS bullseye
- 07:19 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs2014.codfw.wmnet with OS bullseye
- 07:19 kartik@deploy1003: abi, kartik: Backport for Mobile frequent languages entrypoint: Add dependency to sitemapper (T393144 T386223) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 07:15 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 07:15 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
- 07:14 kartik@deploy1003: Started scap sync-world: Backport for Mobile frequent languages entrypoint: Add dependency to sitemapper (T393144 T386223)
- 07:11 jgiannelos@deploy1003: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
- 07:11 jgiannelos@deploy1003: helmfile [staging] START helmfile.d/services/mobileapps: apply
- 07:02 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2014.codfw.wmnet with reason: host reimage
- 06:57 ryankemper@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2014.codfw.wmnet with reason: host reimage
- 06:39 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wdqs2014
- 06:39 ryankemper@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wdqs2014
- 06:37 ryankemper@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wdqs2014
- 06:37 ryankemper@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wdqs2014.codfw.wmnet 192.16.192.10.in-addr.arpa 2.9.1.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
- 06:37 ryankemper@cumin2002: START - Cookbook sre.dns.wipe-cache wdqs2014.codfw.wmnet 192.16.192.10.in-addr.arpa 2.9.1.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
- 06:37 ryankemper@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 06:37 ryankemper@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wdqs2014 - ryankemper@cumin2002"
- 06:37 ryankemper@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wdqs2014 - ryankemper@cumin2002"
- 06:30 ryankemper@cumin2002: START - Cookbook sre.dns.netbox
- 06:27 ryankemper@cumin2002: START - Cookbook sre.hosts.move-vlan for host wdqs2014
- 06:26 ryankemper@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2014.codfw.wmnet with OS bullseye
- 06:23 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1023.eqiad.wmnet
- 06:22 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1023.eqiad.wmnet
- 06:17 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1023.eqiad.wmnet
- 06:16 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1023.eqiad.wmnet
- 05:49 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs2008.codfw.wmnet with OS bullseye
- 05:32 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2008.codfw.wmnet with reason: host reimage
- 05:25 ryankemper@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2008.codfw.wmnet with reason: host reimage
- 05:06 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wdqs2008
- 05:06 ryankemper@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wdqs2008
- 05:06 ryankemper@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wdqs2008
- 05:06 ryankemper@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wdqs2008.codfw.wmnet 194.32.192.10.in-addr.arpa 4.9.1.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
- 05:06 ryankemper@cumin2002: START - Cookbook sre.dns.wipe-cache wdqs2008.codfw.wmnet 194.32.192.10.in-addr.arpa 4.9.1.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
- 05:05 ryankemper@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 05:05 ryankemper@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wdqs2008 - ryankemper@cumin2002"
- 05:05 ryankemper@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wdqs2008 - ryankemper@cumin2002"
- 05:04 ryankemper@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs1017.eqiad.wmnet with OS bullseye
- 05:00 ryankemper@cumin2002: START - Cookbook sre.dns.netbox
- 04:58 ryankemper@cumin2002: START - Cookbook sre.hosts.move-vlan for host wdqs2008
- 04:58 ryankemper@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2008.codfw.wmnet with OS bullseye
- 04:38 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudrabbit2002-dev.codfw.wmnet with OS bookworm
- 04:34 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudrabbit2003-dev.codfw.wmnet with OS bookworm
- 04:28 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudrabbit2001-dev.codfw.wmnet with OS bookworm
- 04:19 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudrabbit2002-dev.codfw.wmnet with reason: host reimage
- 04:15 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudrabbit2003-dev.codfw.wmnet with reason: host reimage
- 04:13 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudrabbit2002-dev.codfw.wmnet with reason: host reimage
- 04:12 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudrabbit2003-dev.codfw.wmnet with reason: host reimage
- 04:08 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudrabbit2001-dev.codfw.wmnet with reason: host reimage
- 04:05 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudrabbit2001-dev.codfw.wmnet with reason: host reimage
- 03:54 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudrabbit2003-dev.codfw.wmnet with OS bookworm
- 03:54 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudrabbit2002-dev.codfw.wmnet with OS bookworm
- 03:53 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from cloudcontrol2009-dev to cloudrabbit2003-dev
- 03:52 andrew@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudrabbit2003-dev
- 03:52 andrew@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host cloudrabbit2003-dev
- 03:52 andrew@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 03:50 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from cloudcontrol2008-dev to cloudrabbit2002-dev
- 03:49 andrew@cumin1002: START - Cookbook sre.dns.netbox
- 03:49 andrew@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudrabbit2002-dev
- 03:49 andrew@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host cloudrabbit2002-dev
- 03:49 andrew@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 03:49 andrew@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming cloudcontrol2008-dev to cloudrabbit2002-dev - andrew@cumin1002"
- 03:48 andrew@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming cloudcontrol2008-dev to cloudrabbit2002-dev - andrew@cumin1002"
- 03:46 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudrabbit2001-dev.codfw.wmnet with OS bookworm
- 03:44 andrew@cumin1002: START - Cookbook sre.dns.netbox
- 03:43 andrew@cumin1002: START - Cookbook sre.hosts.rename from cloudcontrol2009-dev to cloudrabbit2003-dev
- 03:43 andrew@cumin1002: START - Cookbook sre.hosts.rename from cloudcontrol2008-dev to cloudrabbit2002-dev
- 03:43 ryankemper@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs1017.eqiad.wmnet with OS bullseye
- 03:43 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from cloudcontrol2007-dev to cloudrabbit2001-dev
- 03:42 andrew@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudrabbit2001-dev
- 03:42 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs1016.eqiad.wmnet with OS bullseye
- 03:42 andrew@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host cloudrabbit2001-dev
- 03:42 andrew@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 03:42 andrew@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming cloudcontrol2007-dev to cloudrabbit2001-dev - andrew@cumin1002"
- 03:41 andrew@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming cloudcontrol2007-dev to cloudrabbit2001-dev - andrew@cumin1002"
- 03:37 andrew@cumin1002: START - Cookbook sre.dns.netbox
- 03:36 andrew@cumin1002: START - Cookbook sre.hosts.rename from cloudcontrol2007-dev to cloudrabbit2001-dev
- 03:26 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1016.eqiad.wmnet with reason: host reimage
- 03:24 ryankemper@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1016.eqiad.wmnet with reason: host reimage
- 02:59 ryankemper@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs1016.eqiad.wmnet with OS bullseye
- 01:55 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs1011.eqiad.wmnet with OS bullseye
- 01:39 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1011.eqiad.wmnet with reason: host reimage
- 01:36 ryankemper@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1011.eqiad.wmnet with reason: host reimage
- 01:19 ryankemper@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs1011.eqiad.wmnet with OS bullseye
2025-05-04
- 23:27 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudcephosd1003.eqiad.wmnet
- 23:27 andrew@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 23:27 andrew@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudcephosd1003.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1002"
- 23:27 andrew@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudcephosd1003.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1002"
- 23:22 andrew@cumin1002: START - Cookbook sre.dns.netbox
- 23:16 andrew@cumin1002: START - Cookbook sre.hosts.decommission for hosts cloudcephosd1003.eqiad.wmnet
- 23:15 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudcephosd1002.eqiad.wmnet
- 23:15 andrew@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 23:15 andrew@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudcephosd1002.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1002"
- 23:14 andrew@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudcephosd1002.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1002"
- 23:08 andrew@cumin1002: START - Cookbook sre.dns.netbox
- 23:02 andrew@cumin1002: START - Cookbook sre.hosts.decommission for hosts cloudcephosd1002.eqiad.wmnet
- 23:02 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudcephosd1001.eqiad.wmnet
- 23:02 andrew@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 23:02 andrew@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudcephosd1001.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1002"
- 23:01 andrew@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudcephosd1001.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1002"
- 22:57 andrew@cumin1002: START - Cookbook sre.dns.netbox
- 22:52 andrew@cumin1002: START - Cookbook sre.hosts.decommission for hosts cloudcephosd1001.eqiad.wmnet
- 20:29 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Unbanning all hosts in search_eqiad
- 20:29 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Unbanning all hosts in search_eqiad
- 20:07 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Banning hosts: elastic1056*,elastic1063* for host appears to have hot shards - bking@cumin2002
- 20:06 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Banning hosts: elastic1056*,elastic1063* for host appears to have hot shards - bking@cumin2002
- 19:43 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Banning hosts: elastic1063* for host appears to have hot shards - bking@cumin2002
- 19:43 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Banning hosts: elastic1063* for host appears to have hot shards - bking@cumin2002
- 19:35 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Banning hosts: elastic1062* for hosts appear to have hot shards - bking@cumin2002
- 19:35 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Banning hosts: elastic1062* for hosts appear to have hot shards - bking@cumin2002
- 19:10 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Banning hosts: elastic1057*,elastic1058* for hosts appear to have hot shards - bking@cumin2002
- 19:10 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Banning hosts: elastic1057*,elastic1058* for hosts appear to have hot shards - bking@cumin2002
- 19:04 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Banning hosts: elastic1057* for host appears to have hot shards - bking@cumin2002
- 19:04 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Banning hosts: elastic1057* for host appears to have hot shards - bking@cumin2002
- 19:04 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Banning hosts: elastic1064* for host appears to have hot shards - bking@cumin2002
- 19:03 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Banning hosts: elastic1064* for host appears to have hot shards - bking@cumin2002
- 10:36 krinkle@deploy1003: Finished scap sync-world: Backport for actions: Fix handling of redirects to known (non-existing) pages (duration: 30m 22s)
- 10:26 krinkle@deploy1003: krinkle: Continuing with sync
- 10:22 krinkle@deploy1003: krinkle: Backport for actions: Fix handling of redirects to known (non-existing) pages synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 10:06 krinkle@deploy1003: Started scap sync-world: Backport for actions: Fix handling of redirects to known (non-existing) pages
2025-05-03
- 20:09 taavi@cumin1002: dbctl commit (dc=all): 'depool db1246', diff saved to https://phabricator.wikimedia.org/P75739 and previous config saved to /var/cache/conftool/dbconfig/20250503-200910-taavi.json
- 18:35 hnowlan: delete a stuck thumbor pod in codfw
- 13:53 krinkle@deploy1003: Finished scap sync-world: Backport for multiversion: Remove getMWConfigForCacheing() as identical to getConfigGlobals() (T169821), tests: Move buildLogoHTML.php to tests/ alongside buildConfigCache.php, multiversion: Separate wmf-config reading from actual Multiversion (T169821) (duration: 16m 22s)
- 13:46 krinkle@deploy1003: krinkle: Continuing with sync
- 13:41 krinkle@deploy1003: krinkle: Backport for multiversion: Remove getMWConfigForCacheing() as identical to getConfigGlobals() (T169821), tests: Move buildLogoHTML.php to tests/ alongside buildConfigCache.php, multiversion: Separate wmf-config reading from actual Multiversion (T169821) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 13:36 krinkle@deploy1003: Started scap sync-world: Backport for multiversion: Remove getMWConfigForCacheing() as identical to getConfigGlobals() (T169821), tests: Move buildLogoHTML.php to tests/ alongside buildConfigCache.php, multiversion: Separate wmf-config reading from actual Multiversion (T169821)
- 12:19 reedy@deploy1003: Synchronized wmf-config/InitialiseSettings-labs.php: Allow all users to use 2FA on beta (duration: 11m 14s)
2025-05-02
- 21:38 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ganeti1054.eqiad.wmnet with OS bookworm
- 21:23 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ganeti1053.eqiad.wmnet with OS bookworm
- 20:34 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host ganeti1053.eqiad.wmnet with OS bookworm
- 20:31 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1053.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 20:29 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1053.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 20:27 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ganeti1053.eqiad.wmnet with OS bookworm
- 20:23 tzatziki: removed 3 files for legal compliance
- 20:18 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host ganeti1054.eqiad.wmnet with OS bookworm
- 20:16 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1054.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 20:15 tzatziki: removed 1 file for legal compliance
- 20:11 tzatziki: removed 1 file for legal compliance
- 20:09 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1054.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 20:09 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1054.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 19:57 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1054.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 19:41 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host ganeti1053.eqiad.wmnet with OS bookworm
- 19:38 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1053.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 19:36 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1053.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 17:35 stevemunene@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1168.eqiad.wmnet
- 17:27 stevemunene@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-worker1168.eqiad.wmnet
- 17:26 stevemunene@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1167.eqiad.wmnet
- 17:19 stevemunene@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-worker1167.eqiad.wmnet
- 17:17 stevemunene@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1166.eqiad.wmnet
- 17:09 stevemunene@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-worker1166.eqiad.wmnet
- 16:53 sukhe@dns1004: END - running authdns-update
- 16:51 sukhe@dns1004: START - running authdns-update
- 16:47 pt1979@cumin2002: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device lsw1-f1-codfw.mgmt.codfw.wmnet
- 16:28 mvernon@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 18:00:00 on ms-fe1016.eqiad.wmnet with reason: not yet in prod
- 16:28 mvernon@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 18:00:00 on ms-fe1015.eqiad.wmnet with reason: not yet in prod
- 16:26 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 16:24 pt1979@cumin2002: START - Cookbook sre.dns.netbox
- 16:24 pt1979@cumin2002: START - Cookbook sre.network.provision for device lsw1-f1-codfw.mgmt.codfw.wmnet
- 15:45 stevemunene@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-worker1166.eqiad.wmnet
- 15:11 herron: power cycling prometheus200[78] via rac
- 15:06 stevemunene@cumin1002: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker1168.eqiad.wmnet
- 15:05 jgleeson: SmashPig changed from 9b3c4587 to ddf64519
- 15:04 stevemunene@cumin1002: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker1168.eqiad.wmnet
- 15:03 stevemunene@cumin1002: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker1167.eqiad.wmnet
- 15:01 stevemunene@cumin1002: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker1167.eqiad.wmnet
- 15:01 bking@cumin2002: conftool action : set/pooled=yes:weight=10; selector: name=cirrussearch2076.codfw.wmnet|cirrussearch2080.codfw.wmnet|cirrussearch2081.codfw.wmnet|cirrussearch2083.codfw.wmnet|cirrussearch2084.codfw.wmnet|cirrussearch2092.codfw.wmnet|cirrussearch2093.codfw.wmnet|cirrussearch2100.codfw.wmnet|cirrussearch2106.codfw.wmnet|cirrussearch2108.codfw.wmnet
- 15:01 stevemunene@cumin1002: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker1166.eqiad.wmnet
- 14:55 stevemunene@cumin1002: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker1166.eqiad.wmnet
- 14:48 dancy@deploy1003: Installation of scap version "4.159.0" completed for 2 hosts
- 14:46 dancy@deploy1003: Installing scap version "4.159.0" for 2 host(s)
- 14:11 inflatador: bking@localhost set search_codfw num_concurrent_incoming_recoveries from 20 back down to 4 after migration T391350
- 13:49 moritzm: imported ruby-defaults 1:3.3~wmf13u1 to component/puppet7 for trixie-wikimedia T392790
- 13:40 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host testvm2008.wikimedia.org
- 13:37 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host testvm2008.wikimedia.org
- 13:25 urandom: invoked manual `garbagecollect`, Cassandra sessionstore — T390514
- 13:21 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host testvm2007.codfw.wmnet
- 13:17 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host testvm2007.codfw.wmnet
- 12:51 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host testvm2006.codfw.wmnet
- 12:47 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host testvm2006.codfw.wmnet
- 10:06 moritzm: imported ruby-concurrent 1.1.6+dfsg-5~wmf13u1 to component/puppet7 for trixie-wikimedia T392790
- 10:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest2001.codfw.wmnet
- 09:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest2001.codfw.wmnet
- 09:54 stevemunene@cumin1002: END (FAIL) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=99) for hosts an-worker1167.eqiad.wmnet
- 09:42 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1003.eqiad.wmnet
- 09:37 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1003.eqiad.wmnet
- 09:32 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1002.eqiad.wmnet
- 09:31 stevemunene@cumin1002: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker1167.eqiad.wmnet
- 09:26 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1002.eqiad.wmnet
- 08:34 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netbox-dev2003.codfw.wmnet
- 08:30 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netbox-dev2003.codfw.wmnet
- 08:29 XioNoX: update codfw pfw NAT - T392843
- 08:16 jmm@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast2003.wikimedia.org
- 08:13 XioNoX: push pfw policies - T393098
- 08:09 jmm@cumin1002: START - Cookbook sre.hosts.reboot-single for host bast2003.wikimedia.org
- 06:46 stevemunene@cumin1002: END (FAIL) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=99) for hosts an-worker1167.eqiad.wmnet
- 06:42 stevemunene@cumin1002: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker1167.eqiad.wmnet
- 06:30 slyngshede@cumin1002: DONE (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging MarkTraceur out of all services on: 2404 hosts
- 06:21 stevemunene@cumin1002: END (FAIL) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=99) for hosts an-worker1167.eqiad.wmnet
- 06:18 stevemunene@cumin1002: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker1167.eqiad.wmnet
- 06:14 stevemunene@cumin1002: END (FAIL) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=99) for hosts an-worker1166.eqiad.wmnet
- 06:09 stevemunene@cumin1002: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker1166.eqiad.wmnet
- 00:41 dwisehaupt: starting staging db refresh on frdb1006 with civicrm/drupal/fredge restores from 20250430
2025-05-01
- 22:27 thcipriani: mwscript-k8s -- resetAuthenticationThrottle.php --wiki=aawiki --signup --ip=<istanbul ips> (x17)
- 22:09 dzahn@deploy1003: Finished scap sync-world: Backport for Add another throttle rule for Istanbul Hackathon 2025 (T382309) (duration: 14m 32s)
- 22:02 dzahn@deploy1003: dzahn: Continuing with sync
- 22:00 dzahn@deploy1003: dzahn: Backport for Add another throttle rule for Istanbul Hackathon 2025 (T382309) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 21:54 dzahn@deploy1003: Started scap sync-world: Backport for Add another throttle rule for Istanbul Hackathon 2025 (T382309)
- 21:40 dzahn@deploy1003: Finished scap sync-world: Backport for Add throttle rule for Istanbul Hackathon 2025 (T382309) (duration: 25m 16s)
- 21:34 dzahn@deploy1003: dzahn: Continuing with sync
- 21:20 dzahn@deploy1003: dzahn: Backport for Add throttle rule for Istanbul Hackathon 2025 (T382309) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 21:15 dzahn@deploy1003: Started scap sync-world: Backport for Add throttle rule for Istanbul Hackathon 2025 (T382309)
- 21:03 ryankemper: T376151 [wdqs-internal lvs teardown] Declaring this officially done. No more irc log spam from me today :)
- 21:01 ryankemper@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 21:01 ryankemper@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove VIPs for wdqs-internal - ryankemper@cumin2002"
- 21:01 ryankemper@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove VIPs for wdqs-internal - ryankemper@cumin2002"
- 21:01 ryankemper: T376151 [wdqs-internal lvs teardown] `sudo etcdctl -C https://conf1007.eqiad.wmnet:4001 --username root rmdir /conftool/v1/pools/codfw/wdqs-internal/wdqs` && `sudo etcdctl -C https://conf1007.eqiad.wmnet:4001 --username root rmdir /conftool/v1/pools/codfw/wdqs-internal/`
- 21:01 ryankemper: T376151 [wdqs-internal lvs teardown] `sudo etcdctl -C https://conf1007.eqiad.wmnet:4001 --username root rmdir /conftool/v1/pools/eqiad/wdqs-internal/wdqs` && `sudo etcdctl -C https://conf1007.eqiad.wmnet:4001 --username root rmdir /conftool/v1/pools/eqiad/wdqs-internal/`
- 20:54 ryankemper: T376151 [wdqs-internal lvs teardown] `sudo rm -fv /srv/config-master/pybal/eqiad/wdqs-internal && sudo rm -fv /srv/config-master/pybal/codfw/wdqs-internal` on `config-master[1,2]001`
- 20:53 ryankemper@cumin2002: START - Cookbook sre.dns.netbox
- 20:50 ryankemper: T376151 [wdqs-internal lvs teardown] Surrendered `10.2.2.41/32` (eqiad wdqs-internal vip) and `10.2.1.41/32` (codfw wdqs-internal vip) from netbox interface
- 20:48 ryankemper@dns1004: END - running authdns-update
- 20:46 ryankemper@dns1004: START - running authdns-update
- 20:45 jhuneidi@deploy1003: Finished scap sync-world: Backport for Check for content validity before extracting license (T389125), Fix localization for validation errors checking tabular data (T389126) (duration: 30m 35s)
- 20:40 sukhe: restart pybal on lvs1020
- 20:35 jhuneidi@deploy1003: bvibber, jhuneidi: Continuing with sync
- 20:33 jhuneidi@deploy1003: bvibber, jhuneidi: Backport for Check for content validity before extracting license (T389125), Fix localization for validation errors checking tabular data (T389126) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 20:32 sukhe: sudo cumin 'O:config_master' 'run-puppet-agent'
- 20:14 jhuneidi@deploy1003: Started scap sync-world: Backport for Check for content validity before extracting license (T389125), Fix localization for validation errors checking tabular data (T389126)
- 19:37 sukhe: no pending Netbox changes
- 19:37 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 19:34 sukhe: [correction] running sre.dns.netbox to ensure no pending changes (NOT in dry-run)
- 19:34 sukhe: running sre.dns.netbox to ensure no pending changes
- 19:34 sukhe@cumin1002: START - Cookbook sre.dns.netbox
- 19:33 dduvall: re-ran scap sync to fix mw-jobrunner codfw deployments following failed helmfile apply and verified correct image ref manually (T386222)
- 19:30 dduvall@deploy1003: Finished scap sync-world: retrying sync-world following spurious helmfile apply error (mw-jobrunner codfw) (duration: 11m 24s)
- 19:20 sukhe: sukhe@netbox1003:~$ sudo systemctl start uwsgi-netbox.service: service was OOM'ed, restarting
- 19:18 dduvall@deploy1003: Started scap sync-world: retrying sync-world following spurious helmfile apply error (mw-jobrunner codfw)
- 19:16 jhathaway@dns1004: END - running authdns-update
- 19:14 jhathaway@dns1004: START - running authdns-update
- 19:09 ryankemper: T376151 [wdqs-internal lvs teardown] running puppet across `A:wdqs-internal` now that pybal has been restarted
- 19:09 dduvall: deployment of mw-jobrunner-main for codfw failed during scap train (group2) (T386222)
- 19:09 ryankemper: T376151 [wdqs-internal lvs teardown -> pybal rolling restart] all IPVS diff check alerts have recovered, rolling restart complete
- 19:06 dduvall: helm error during group2 deployment "Get "https://kubemaster.svc.codfw.wmnet:6443/api/v1/namespaces/mw-jobrunner/services/mediawiki-main-tls-service": dial tcp 10.2.1.8:6443: connect: no route to host - error from a previous attempt: read tcp 10.64.16.93:41894->10.2.1.8:6443: read: connection reset by peer"
- 19:04 ryankemper: T376151 [wdqs-internal lvs teardown -> pybal rolling restart] `ipvsadm --delete-service --tcp-service 10.2.2.41:80` on `lvs1019` and `lvs1020`
- 19:03 ryankemper: T376151 [wdqs-internal lvs teardown -> pybal rolling restart] `ipvsadm --delete-service --tcp-service 10.2.1.41:80` on `A:lvs-secondary-codfw OR A:lvs-low-traffic-codfw` (lvs2013, lvs2014)
- 18:59 ryankemper: T376151 [wdqs-internal lvs teardown -> pybal rolling restart] Restarted pybal on `A:lvs-low-traffic-codfw` (lvs2013)
- 18:58 ryankemper: T376151 [wdqs-internal lvs teardown -> pybal rolling restart] Restarted pybal on `A:lvs-secondary-codfw` (lvs2014), waiting 2 mins before proceeding
- 18:55 ryankemper: T376151 [wdqs-internal lvs teardown -> pybal rolling restart] Restarted pybal on `A:lvs-low-traffic-eqiad` (lvs1019), waiting few mins before proceeding
- 18:48 ryankemper: T376151 [wdqs-internal lvs teardown -> pybal rolling restart] Restarted pybal on `A:lvs-secondary-eqiad`, it only restarted on `lvs1020` but for some reason `lvs1013` doesn't have a pybal service running
- 18:44 ryankemper: T376151 [wdqs-internal lvs teardown -> pybal rolling restart] ran puppet on `O:Lvs::balancer` after merging https://gerrit.wikimedia.org/r/c/operations/puppet/+/1136747
- 18:32 eevans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/echostore: apply
- 18:31 eevans@deploy1003: helmfile [eqiad] START helmfile.d/services/echostore: apply
- 18:30 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/echostore: apply
- 18:29 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/echostore: apply
- 18:28 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/echostore: apply
- 18:27 eevans@deploy1003: helmfile [staging] START helmfile.d/services/echostore: apply
- 18:26 ryankemper: T376151 (wdqs-internal lvs teardown) Merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/1136744 to flip `wdqs-internal` service state to `lvs_setup` and running puppet across `A:dnsbox`
- 18:24 dduvall@deploy1003: rebuilt and synchronized wikiversions files: group1 to 1.44.0-wmf.27 refs T386222
- 18:23 ryankemper@dns1004: END - running authdns-update
- 18:21 ryankemper@dns1004: START - running authdns-update
- 17:31 jhathaway: testing sasl email relaying on mx-in{1001,2001}
- 16:40 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply
- 16:40 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply
- 16:39 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply
- 16:38 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply
- 16:04 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 16:02 jhancock@cumin2002: START - Cookbook sre.dns.netbox
- 16:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2045.codfw.wmnet with OS bookworm
- 16:01 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 15:58 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 15:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti2045.codfw.wmnet with reason: host reimage
- 15:40 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti2045.codfw.wmnet with reason: host reimage
- 15:34 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
- 15:34 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
- 15:29 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2048.codfw.wmnet with OS bookworm
- 15:29 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2047.codfw.wmnet with OS bookworm
- 15:28 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2045.codfw.wmnet with OS bookworm
- 15:02 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti2047.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 15:00 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti2048.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 15:00 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti2045.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 14:55 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2048.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 14:54 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2047.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 14:54 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2045.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 14:52 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 14:50 jhancock@cumin2002: START - Cookbook sre.dns.netbox
- 13:51 TheresNoTime: ran `[samtar@deploy1003 ~]$ mwscript-k8s --comment="T393093" --follow -- extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=knwikiquote --logwiki=metawiki '~aanzx' 'A826'` for T393093
- 13:49 samtar@deploy1003: Finished scap sync-world: Backport for mswikisource: add NamespacesToBeSearchedDefault (T392984) (duration: 12m 44s)
- 13:42 samtar@deploy1003: anzx, samtar: Continuing with sync
- 13:41 samtar@deploy1003: anzx, samtar: Backport for mswikisource: add NamespacesToBeSearchedDefault (T392984) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 13:39 urandom: invoking garbagecollect on sessionstore cluster — T390514
- 13:36 samtar@deploy1003: Started scap sync-world: Backport for mswikisource: add NamespacesToBeSearchedDefault (T392984)
- 13:34 urandom: lowering sessionstore gc_grace_seconds to 172800 (two days) — T390514
- 13:31 samtar@deploy1003: Finished scap sync-world: Backport for [arwiki] Change logo and tagline with sync wordmark (T392858) (duration: 21m 53s)
- 13:24 samtar@deploy1003: gergesshamon, samtar: Continuing with sync
- 13:17 samtar@deploy1003: gergesshamon, samtar: Backport for [arwiki] Change logo and tagline with sync wordmark (T392858) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 13:09 samtar@deploy1003: Started scap sync-world: Backport for [arwiki] Change logo and tagline with sync wordmark (T392858)
- 12:46 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply
- 12:46 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply
- 12:40 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply
- 12:40 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply
- 11:24 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply
- 11:24 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply
- 11:12 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply
- 11:12 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply
- 09:46 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
- 09:45 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
- 05:29 eileen: civicrm upgraded from 6c99f0c9 to 5a1f3e8e
- 05:14 eileen: config revision changed from b200409c to ddf64519
- 01:32 tstarling@deploy1003: Finished scap sync-world: Backport for testwiki: enable wgUseCodexSpecialBlock and wgEnableMultiBlocks (T377121) (duration: 13m 52s)
- 01:25 tstarling@deploy1003: tstarling, musikanimal: Continuing with sync
- 01:25 tstarling@deploy1003: tstarling, musikanimal: Backport for testwiki: enable wgUseCodexSpecialBlock and wgEnableMultiBlocks (T377121) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 01:18 tstarling@deploy1003: Started scap sync-world: Backport for testwiki: enable wgUseCodexSpecialBlock and wgEnableMultiBlocks (T377121)