Server Admin Log/Archive 103
2026-03-31
- 23:51 krinkle@deploy1003: Finished scap sync-world: Backport for Enable $wgTrackMediaRequestProvenance on group0 wikis (T414338) (duration: 40m 21s)
- 23:43 krinkle@deploy1003: krinkle: Continuing with sync
- 23:12 krinkle@deploy1003: krinkle: Backport for Enable $wgTrackMediaRequestProvenance on group0 wikis (T414338) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 23:10 krinkle@deploy1003: Started scap sync-world: Backport for Enable $wgTrackMediaRequestProvenance on group0 wikis (T414338)
- 22:58 urbanecm@deploy1003: Finished scap sync-world: T420154 (duration: 34m 31s)
- 22:24 urbanecm@deploy1003: Started scap sync-world: T420154
- 22:13 jdlrobson@deploy1003: Finished scap sync-world: Backport for Enable parser survey for all wikis (T414852) (duration: 11m 25s)
- 22:06 jdlrobson@deploy1003: jdlrobson: Continuing with sync
- 22:04 jdlrobson@deploy1003: jdlrobson: Backport for Enable parser survey for all wikis (T414852) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 22:02 jdlrobson@deploy1003: Started scap sync-world: Backport for Enable parser survey for all wikis (T414852)
- 21:56 urbanecm@deploy1003: Finished scap sync-world: Email confirmation banner: Add Test Kitchen A/B gating (T421366) (duration: 31m 33s)
- 21:25 urbanecm@deploy1003: Started scap sync-world: Email confirmation banner: Add Test Kitchen A/B gating (T421366)
- 21:05 dancy@deploy1003: Installation of scap version "4.243.0" completed for 2 hosts
- 21:03 dancy@deploy1003: Installing scap version "4.243.0" for 2 host(s)
- 20:47 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
- 20:46 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
- 20:33 urbanecm@deploy1003: Finished scap sync-world: Backport for Add instrumentation for email confirmation lifecycle events (T420007) (duration: 12m 46s)
- 20:29 urbanecm@deploy1003: urbanecm, catrope: Continuing with sync
- 20:23 urbanecm@deploy1003: urbanecm, catrope: Backport for Add instrumentation for email confirmation lifecycle events (T420007) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 20:21 urbanecm@deploy1003: Started scap sync-world: Backport for Add instrumentation for email confirmation lifecycle events (T420007)
- 20:15 brett@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp1114.*
- 20:13 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1114.eqiad.wmnet with OS trixie
- 20:12 aaron@deploy1003: Finished scap sync-world: Backport for Move all analytics API sandbox entries to testwiki (T419429) (duration: 07m 05s)
- 20:08 aaron@deploy1003: aaron: Continuing with sync
- 20:07 aaron@deploy1003: aaron: Backport for Move all analytics API sandbox entries to testwiki (T419429) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 20:05 aaron@deploy1003: Started scap sync-world: Backport for Move all analytics API sandbox entries to testwiki (T419429)
- 20:03 cdobbins@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P{lvs6001.drmrs.wmnet} and A:liberica
- 19:59 cdobbins@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P{lvs6001.drmrs.wmnet} and A:liberica
- 19:59 cdobbins@cumin2002: END (ERROR) - Cookbook sre.loadbalancer.admin (exit_code=97) rebooting P{lvs6003.drmrs.wmnet} and A:liberica
- 19:58 cdobbins@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P{lvs6003.drmrs.wmnet} and A:liberica
- 19:48 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1114.eqiad.wmnet with reason: host reimage
- 19:46 cdobbins@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P{lvs6002.drmrs.wmnet} and A:liberica
- 19:44 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1114.eqiad.wmnet with reason: host reimage
- 19:43 cdobbins@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P{lvs6002.drmrs.wmnet} and A:liberica
- 19:38 cdobbins@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) pooling P{lvs6002.drmrs.wmnet} and A:liberica
- 19:38 brett@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts hcaptcha[1001-1002,2001-2002].wikimedia.org
- 19:38 brett@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 19:38 brett@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: hcaptcha[1001-1002,2001-2002].wikimedia.org decommissioned, removing all IPs except the asset tag one - brett@cumin2002"
- 19:38 cdobbins@cumin2002: START - Cookbook sre.loadbalancer.admin pooling P{lvs6002.drmrs.wmnet} and A:liberica
- 19:37 cdobbins@cumin2002: END (FAIL) - Cookbook sre.loadbalancer.admin (exit_code=1) rebooting P{lvs6002.drmrs.wmnet} and A:liberica
- 19:37 cdobbins@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P{lvs6002.drmrs.wmnet} and A:liberica
- 19:36 cdobbins@cumin2002: END (FAIL) - Cookbook sre.loadbalancer.admin (exit_code=1) rebooting P{lvs6002.drmrs.wmnet} and A:liberica
- 19:36 cdobbins@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P{lvs6002.drmrs.wmnet} and A:liberica
- 19:35 brett@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: hcaptcha[1001-1002,2001-2002].wikimedia.org decommissioned, removing all IPs except the asset tag one - brett@cumin2002"
- 19:30 brett@cumin2002: START - Cookbook sre.dns.netbox
- 19:30 dancy@deploy1003: Finished scap sync-world: (no justification provided) (duration: 06m 59s)
- 19:27 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp1114.eqiad.wmnet with OS trixie
- 19:26 brett@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp1112.*
- 19:26 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1112.eqiad.wmnet with OS trixie
- 19:26 cdobbins@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) depooling P{lvs6002.drmrs.wmnet} and A:liberica
- 19:25 cdobbins@cumin2002: START - Cookbook sre.loadbalancer.admin depooling P{lvs6002.drmrs.wmnet} and A:liberica
- 19:23 dancy@deploy1003: Started scap sync-world: (no justification provided)
- 19:21 brett@cumin2002: START - Cookbook sre.hosts.decommission for hosts hcaptcha[1001-1002,2001-2002].wikimedia.org
- 19:20 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply
- 19:19 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply
- 19:17 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/media-analytics: apply
- 19:16 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/media-analytics: apply
- 19:15 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply
- 19:15 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply
- 19:13 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply
- 19:13 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply
- 19:12 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply
- 19:12 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply
- 19:11 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply
- 19:11 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply
- 19:07 brett@cumin2002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-low-traffic-eqiad and A:lvs (T411097)
- 19:05 brett@cumin2002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-low-traffic-eqiad and A:lvs (T411097)
- 19:05 brett@cumin2002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-secondary-eqiad and A:lvs (T411097)
- 19:04 brett@cumin2002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-secondary-eqiad and A:lvs (T411097)
- 19:04 brett@cumin2002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-low-traffic-codfw and A:lvs (T411097)
- 19:03 brett@cumin2002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-low-traffic-codfw and A:lvs (T411097)
- 19:03 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1112.eqiad.wmnet with reason: host reimage
- 19:03 brett@cumin2002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-secondary-codfw and A:lvs (T411097)
- 19:02 brett@cumin2002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-secondary-codfw and A:lvs (T411097)
- 18:57 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1112.eqiad.wmnet with reason: host reimage
- 18:40 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp1112.eqiad.wmnet with OS trixie
- 18:35 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1112.eqiad.wmnet with OS trixie
- 18:34 brett: sudo -i cumin 'A:lvs-secondary-eqiad or A:lvs-low-traffic-eqiad' 'ipvsadm --delete-service --tcp-service 10.2.2.12:4260' - T411097
- 18:33 brett: sudo -i cumin 'A:lvs-secondary-codfw or A:lvs-low-traffic-codfw' 'ipvsadm --delete-service --tcp-service 10.2.1.12:4260' - T411097
- 18:30 brett@cumin2002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-low-traffic-eqiad and A:lvs (T411097)
- 18:29 brett@cumin2002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-low-traffic-eqiad and A:lvs (T411097)
- 18:28 brett@cumin2002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-secondary-eqiad and A:lvs (T411097)
- 18:28 brett@cumin2002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-secondary-eqiad and A:lvs (T411097)
- 18:26 brett@cumin2002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-low-traffic-codfw and A:lvs (T411097)
- 18:26 brett@cumin2002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-low-traffic-codfw and A:lvs (T411097)
- 18:24 brett@cumin2002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-secondary-codfw and A:lvs (T411097)
- 18:24 brett@cumin2002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-secondary-codfw and A:lvs (T411097)
- 18:19 eevans@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching restbase[1031,2034]*,aqs[1010,2001]*: Actually upgrade Cassandra to 4.1.11 — T418417 - eevans@cumin1003
- 18:16 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest2003.codfw.wmnet
- 18:10 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest2003.codfw.wmnet
- 18:07 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp1112.eqiad.wmnet with OS trixie
- 18:07 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1112.eqiad.wmnet with OS trixie
- 18:00 cmooney@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 216302
- 17:59 cmooney@cumin1003: START - Cookbook sre.network.peering with action 'configure' for AS: 216302
- 17:53 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp1112.eqiad.wmnet with OS trixie
- 17:52 brett@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp1110.*
- 17:52 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1110.eqiad.wmnet with OS trixie
- 17:46 eevans@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching restbase[1031,2034]*,aqs[1010,2001]*: Actually upgrade Cassandra to 4.1.11 — T418417 - eevans@cumin1003
- 17:46 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dse-k8s-worker1004.eqiad.wmnet
- 17:39 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host dse-k8s-worker1004.eqiad.wmnet
- 17:39 brett@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on hcaptcha[1001-1002,2001-2002].wikimedia.org with reason: Decommissioning
- 17:37 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-edit-type-enrich-next: apply
- 17:37 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-edit-type-enrich-next: apply
- 17:36 brett@dns1006: END - running authdns-update
- 17:35 eevans@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching restbase[1031,2034]*: Upgrade Cassandra to 4.1.11 — T418417 - eevans@cumin1003
- 17:34 brett@dns1006: START - running authdns-update
- 17:29 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1110.eqiad.wmnet with reason: host reimage
- 17:25 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1110.eqiad.wmnet with reason: host reimage
- 17:18 eevans@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching restbase[1031,2034]*: Upgrade Cassandra to 4.1.11 — T418417 - eevans@cumin1003
- 17:16 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dse-k8s-worker1003.eqiad.wmnet
- 17:13 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply
- 17:12 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply
- 17:10 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host dse-k8s-worker1003.eqiad.wmnet
- 17:10 joal@deploy1003: Finished deploy [analytics/refinery@8d91f24] (thin): Regular analytics weekly train THIN [analytics/refinery@8d91f242] (duration: 01m 56s)
- 17:08 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp1110.eqiad.wmnet with OS trixie
- 17:08 joal@deploy1003: Started deploy [analytics/refinery@8d91f24] (thin): Regular analytics weekly train THIN [analytics/refinery@8d91f242]
- 17:08 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1110.eqiad.wmnet with OS trixie
- 17:08 joal@deploy1003: Finished deploy [analytics/refinery@8d91f24]: Regular analytics weekly train [analytics/refinery@8d91f242] (duration: 07m 47s)
- 17:07 otto@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
- 17:07 otto@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
- 17:07 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
- 17:07 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
- 17:07 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
- 17:06 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
- 17:00 joal@deploy1003: Started deploy [analytics/refinery@8d91f24]: Regular analytics weekly train [analytics/refinery@8d91f242]
- 16:59 joal@deploy1003: Finished deploy [analytics/refinery@8d91f24] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@8d91f242] (duration: 01m 53s)
- 16:58 joal@deploy1003: Started deploy [analytics/refinery@8d91f24] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@8d91f242]
- 16:56 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp1110.eqiad.wmnet with OS trixie
- 16:55 eevans@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching aqs2001.codfw.wmnet: Upgrade Cassandra to 4.1.11 — T418417 - eevans@cumin1003
- 16:54 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
- 16:54 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
- 16:51 moritzm: installing Bind security updates (client-side tools and libs)
- 16:48 eevans@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching aqs2001.codfw.wmnet: Upgrade Cassandra to 4.1.11 — T418417 - eevans@cumin1003
- 16:45 eevans@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching aqs1010.eqiad.wmnet: Upgrade Cassandra to 4.1.11 — T418417 - eevans@cumin1003
- 16:45 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1365.eqiad.wmnet with OS trixie
- 16:40 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2238 (T419635)', diff saved to https://phabricator.wikimedia.org/P90094 and previous config saved to /var/cache/conftool/dbconfig/20260331-164004-fceratto.json
- 16:37 eevans@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching aqs1010.eqiad.wmnet: Upgrade Cassandra to 4.1.11 — T418417 - eevans@cumin1003
- 16:31 eevans@cumin1003: END (PASS) - Cookbook sre.misc-clusters.roll-restart-restbase (exit_code=0) rolling restart_daemons on A:restbase
- 16:29 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2238', diff saved to https://phabricator.wikimedia.org/P90092 and previous config saved to /var/cache/conftool/dbconfig/20260331-162956-fceratto.json
- 16:29 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1365.eqiad.wmnet with reason: host reimage
- 16:28 cdobbins@cumin2002: conftool action : set/pooled=yes; selector: name=cp1114.eqiad.wmnet [reason: trixie reimaging]
- 16:22 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1365.eqiad.wmnet with reason: host reimage
- 16:22 akhatun@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-edit-type-enrich-next: apply
- 16:22 akhatun@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-edit-type-enrich-next: apply
- 16:19 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2238', diff saved to https://phabricator.wikimedia.org/P90091 and previous config saved to /var/cache/conftool/dbconfig/20260331-161947-fceratto.json
- 16:15 eevans@cumin1003: START - Cookbook sre.misc-clusters.roll-restart-restbase rolling restart_daemons on A:restbase
- 16:14 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/media-analytics: apply
- 16:12 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-experimental: apply
- 16:12 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/mw-experimental: apply
- 16:11 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply
- 16:10 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply
- 16:10 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1365
- 16:10 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1365
- 16:10 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1365
- 16:10 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1365.eqiad.wmnet 206.48.64.10.in-addr.arpa 6.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
- 16:10 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1365.eqiad.wmnet 206.48.64.10.in-addr.arpa 6.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
- 16:10 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 16:10 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1365 - ayounsi@cumin1003"
- 16:09 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2238 (T419635)', diff saved to https://phabricator.wikimedia.org/P90090 and previous config saved to /var/cache/conftool/dbconfig/20260331-160939-fceratto.json
- 16:08 javiermonton@deploy1003: Finished scap sync-world: Backport for stream: mediawiki.page_edit_type_simple.dev1 (T421005) (duration: 08m 40s)
- 16:07 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 16:06 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
- 16:04 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/media-analytics: apply
- 16:04 javiermonton@deploy1003: akhatun, javiermonton: Continuing with sync
- 16:02 javiermonton@deploy1003: akhatun, javiermonton: Backport for stream: mediawiki.page_edit_type_simple.dev1 (T421005) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 15:59 javiermonton@deploy1003: Started scap sync-world: Backport for stream: mediawiki.page_edit_type_simple.dev1 (T421005)
- 15:55 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2238 (T419635)', diff saved to https://phabricator.wikimedia.org/P90089 and previous config saved to /var/cache/conftool/dbconfig/20260331-155538-fceratto.json
- 15:55 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2238.codfw.wmnet with reason: Maintenance
- 15:55 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2226 (T419635)', diff saved to https://phabricator.wikimedia.org/P90088 and previous config saved to /var/cache/conftool/dbconfig/20260331-155512-fceratto.json
- 15:45 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1365 - ayounsi@cumin1003"
- 15:45 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2226', diff saved to https://phabricator.wikimedia.org/P90087 and previous config saved to /var/cache/conftool/dbconfig/20260331-154504-fceratto.json
- 15:34 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2226', diff saved to https://phabricator.wikimedia.org/P90086 and previous config saved to /var/cache/conftool/dbconfig/20260331-153456-fceratto.json
- 15:25 ayounsi@cumin1003: START - Cookbook sre.dns.netbox
- 15:25 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1365
- 15:24 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2226 (T419635)', diff saved to https://phabricator.wikimedia.org/P90084 and previous config saved to /var/cache/conftool/dbconfig/20260331-152448-fceratto.json
- 15:24 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1365.eqiad.wmnet with OS trixie
- 15:23 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2226 (T419635)', diff saved to https://phabricator.wikimedia.org/P90083 and previous config saved to /var/cache/conftool/dbconfig/20260331-152323-fceratto.json
- 15:23 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2226.codfw.wmnet with reason: Maintenance
- 15:23 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2225 (T419635)', diff saved to https://phabricator.wikimedia.org/P90082 and previous config saved to /var/cache/conftool/dbconfig/20260331-152258-fceratto.json
- 15:22 ladsgroup@deploy1003: Finished scap sync-world: Backport for Enable links db split on testcommonswiki (T421914) (duration: 09m 15s)
- 15:17 ladsgroup@deploy1003: ladsgroup: Continuing with sync
- 15:14 ladsgroup@deploy1003: ladsgroup: Backport for Enable links db split on testcommonswiki (T421914) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 15:12 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2225', diff saved to https://phabricator.wikimedia.org/P90080 and previous config saved to /var/cache/conftool/dbconfig/20260331-151250-fceratto.json
- 15:12 ladsgroup@deploy1003: Started scap sync-world: Backport for Enable links db split on testcommonswiki (T421914)
- 15:11 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dse-k8s-worker1004.eqiad.wmnet with OS bookworm
- 15:05 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1364.eqiad.wmnet with OS trixie
- 15:02 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2225', diff saved to https://phabricator.wikimedia.org/P90079 and previous config saved to /var/cache/conftool/dbconfig/20260331-150241-fceratto.json
- 15:00 jnuche@deploy1003: rebuilt and synchronized wikiversions files: group0 to 1.46.0-wmf.22 refs T420480
- 14:53 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dse-k8s-worker1004.eqiad.wmnet with reason: host reimage
- 14:53 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
- 14:52 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2225 (T419635)', diff saved to https://phabricator.wikimedia.org/P90078 and previous config saved to /var/cache/conftool/dbconfig/20260331-145233-fceratto.json
- 14:51 Amir1: creating links tables on x1 for testcommonswiki (T421914)
- 14:51 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
- 14:50 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
- 14:49 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1364.eqiad.wmnet with reason: host reimage
- 14:48 btullis@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on dse-k8s-worker1004.eqiad.wmnet with reason: host reimage
- 14:48 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/media-analytics: apply
- 14:46 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
- 14:46 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1364.eqiad.wmnet with reason: host reimage
- 14:42 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
- 14:41 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_drmrs - 3.2 upgrade (T421402)
- 14:38 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/media-analytics: apply
- 14:38 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2225 (T419635)', diff saved to https://phabricator.wikimedia.org/P90076 and previous config saved to /var/cache/conftool/dbconfig/20260331-143816-fceratto.json
- 14:38 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2225.codfw.wmnet with reason: Maintenance
- 14:37 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2204 (T419635)', diff saved to https://phabricator.wikimedia.org/P90075 and previous config saved to /var/cache/conftool/dbconfig/20260331-143751-fceratto.json
- 14:36 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
- 14:36 cjming@deploy1003: Finished scap sync-world: Backport for ConfigsFetcher: Increasing the cache version (T421828) (duration: 08m 12s)
- 14:33 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1364
- 14:33 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1364
- 14:33 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host dse-k8s-worker1004
- 14:33 btullis@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dse-k8s-worker1004
- 14:32 cjming@deploy1003: cjming: Continuing with sync
- 14:31 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1364
- 14:31 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1364.eqiad.wmnet 147.32.64.10.in-addr.arpa 7.4.1.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
- 14:31 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1364.eqiad.wmnet 147.32.64.10.in-addr.arpa 7.4.1.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
- 14:31 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 14:31 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1364 - ayounsi@cumin1003"
- 14:30 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1364 - ayounsi@cumin1003"
- 14:30 cjming@deploy1003: cjming: Backport for ConfigsFetcher: Increasing the cache version (T421828) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 14:28 cjming@deploy1003: Started scap sync-world: Backport for ConfigsFetcher: Increasing the cache version (T421828)
- 14:27 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2204', diff saved to https://phabricator.wikimedia.org/P90074 and previous config saved to /var/cache/conftool/dbconfig/20260331-142743-fceratto.json
- 14:23 ayounsi@cumin1003: START - Cookbook sre.dns.netbox
- 14:22 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1364
- 14:22 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1364.eqiad.wmnet with OS trixie
- 14:20 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1363.eqiad.wmnet with OS trixie
- 14:17 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2204', diff saved to https://phabricator.wikimedia.org/P90072 and previous config saved to /var/cache/conftool/dbconfig/20260331-141735-fceratto.json
- 14:16 btullis@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host dse-k8s-worker1004
- 14:16 btullis@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) dse-k8s-worker1004.eqiad.wmnet 52.48.64.10.in-addr.arpa 2.5.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
- 14:16 btullis@cumin1003: START - Cookbook sre.dns.wipe-cache dse-k8s-worker1004.eqiad.wmnet 52.48.64.10.in-addr.arpa 2.5.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
- 14:16 btullis@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 14:16 btullis@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host dse-k8s-worker1004 - btullis@cumin1003"
- 14:16 btullis@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host dse-k8s-worker1004 - btullis@cumin1003"
- 14:13 ladsgroup@deploy1003: Finished scap sync-world: Backport for maintenance: Introduce reconcileTables (T410145 T408137) (duration: 12m 33s)
- 14:12 ebysans@deploy1003: helmfile [staging] DONE helmfile.d/services/media-analytics: apply
- 14:07 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2204 (T419635)', diff saved to https://phabricator.wikimedia.org/P90071 and previous config saved to /var/cache/conftool/dbconfig/20260331-140727-fceratto.json
- 14:06 ladsgroup@deploy1003: ladsgroup: Continuing with sync
- 14:06 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2204 (T419635)', diff saved to https://phabricator.wikimedia.org/P90070 and previous config saved to /var/cache/conftool/dbconfig/20260331-140602-fceratto.json
- 14:05 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2204.codfw.wmnet with reason: Maintenance
- 14:05 ladsgroup@deploy1003: ladsgroup: Backport for maintenance: Introduce reconcileTables (T410145 T408137) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 14:04 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1363.eqiad.wmnet with reason: host reimage
- 14:02 ebysans@deploy1003: helmfile [staging] START helmfile.d/services/media-analytics: apply
- 14:00 ladsgroup@deploy1003: Started scap sync-world: Backport for maintenance: Introduce reconcileTables (T410145 T408137)
- 13:59 btullis@cumin1003: START - Cookbook sre.dns.netbox
- 13:58 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1363.eqiad.wmnet with reason: host reimage
- 13:56 btullis@cumin1003: START - Cookbook sre.hosts.move-vlan for host dse-k8s-worker1004
- 13:55 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1004.eqiad.wmnet with OS bookworm
- 13:52 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2197.codfw.wmnet with reason: Maintenance
- 13:51 fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_drmrs - 3.2 upgrade (T421402)
- 13:51 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2189 (T419635)', diff saved to https://phabricator.wikimedia.org/P90068 and previous config saved to /var/cache/conftool/dbconfig/20260331-135145-fceratto.json
- 13:47 klausman@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-build1001.eqiad.wmnet
- 13:46 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1363
- 13:46 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1363
- 13:46 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1363
- 13:46 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1363.eqiad.wmnet 135.32.64.10.in-addr.arpa 5.3.1.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
- 13:46 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1363.eqiad.wmnet 135.32.64.10.in-addr.arpa 5.3.1.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
- 13:46 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 13:45 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1363 - ayounsi@cumin1003"
- 13:45 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1363 - ayounsi@cumin1003"
- 13:44 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_drmrs - 3.2 upgrade (T421402)
- 13:43 derick@deploy1003: Finished scap sync-world: Backport for Set a JWT cookie for OAuth 1 and OAuth 2 owner-only requests (T417833), tests: OAuth1 and OAuth2 owner-only JWT support (T417833 T415281), tests: Add test for asserting JWT cookie not set for OAuth2 consumers (T417833 T415281), [[gerrit:1260006|Enable JWTs for OAuth1 consumers and OAuth2 owner-only consum
- 13:42 XioNoX: push pfw policies - T421895
- 13:41 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2189', diff saved to https://phabricator.wikimedia.org/P90067 and previous config saved to /var/cache/conftool/dbconfig/20260331-134137-fceratto.json
- 13:41 klausman@cumin1003: START - Cookbook sre.hosts.reboot-single for host ml-build1001.eqiad.wmnet
- 13:38 ayounsi@cumin1003: START - Cookbook sre.dns.netbox
- 13:31 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2189', diff saved to https://phabricator.wikimedia.org/P90066 and previous config saved to /var/cache/conftool/dbconfig/20260331-133129-fceratto.json
- 13:31 derick@deploy1003: derick, d3r1ck01: Continuing with sync
- 13:28 derick@deploy1003: derick, d3r1ck01: Backport for Set a JWT cookie for OAuth 1 and OAuth 2 owner-only requests (T417833), tests: OAuth1 and OAuth2 owner-only JWT support (T417833 T415281), tests: Add test for asserting JWT cookie not set for OAuth2 consumers (T417833 T415281), [[gerrit:1260006|Enable JWTs for OAuth1 consumers and OAuth2 owner-only consumers (T41
- 13:23 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1363
- 13:23 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1363.eqiad.wmnet with OS trixie
- 13:21 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2189 (T419635)', diff saved to https://phabricator.wikimedia.org/P90065 and previous config saved to /var/cache/conftool/dbconfig/20260331-132121-fceratto.json
- 13:10 cdanis: 💙cdanis@cumin1003.eqiad.wmnet ~ 🕘☕ sudo cumin A:cp-eqiad 'apt install lua5.4-ciderbloom-dbgsym'
- 13:10 derick@deploy1003: Started scap sync-world: Backport for Set a JWT cookie for OAuth 1 and OAuth 2 owner-only requests (T417833), tests: OAuth1 and OAuth2 owner-only JWT support (T417833 T415281), tests: Add test for asserting JWT cookie not set for OAuth2 consumers (T417833 T415281), [[gerrit:1260006|Enable JWTs for OAuth1 consumers and OAuth2 owner-only consume
- 13:07 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2189 (T419635)', diff saved to https://phabricator.wikimedia.org/P90064 and previous config saved to /var/cache/conftool/dbconfig/20260331-130731-fceratto.json
- 13:07 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2189.codfw.wmnet with reason: Maintenance
- 13:07 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2175 (T419635)', diff saved to https://phabricator.wikimedia.org/P90063 and previous config saved to /var/cache/conftool/dbconfig/20260331-130717-fceratto.json
- 12:57 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2175', diff saved to https://phabricator.wikimedia.org/P90058 and previous config saved to /var/cache/conftool/dbconfig/20260331-125709-fceratto.json
- 12:51 fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_drmrs - 3.2 upgrade (T421402)
- 12:47 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2175', diff saved to https://phabricator.wikimedia.org/P90055 and previous config saved to /var/cache/conftool/dbconfig/20260331-124701-fceratto.json
- 12:36 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2175 (T419635)', diff saved to https://phabricator.wikimedia.org/P90054 and previous config saved to /var/cache/conftool/dbconfig/20260331-123653-fceratto.json
- 12:34 fabfur: upgrading drmrs to haproxy 3.2 (T421402)
- 12:21 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2175 (T419635)', diff saved to https://phabricator.wikimedia.org/P90050 and previous config saved to /var/cache/conftool/dbconfig/20260331-122106-fceratto.json
- 12:20 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2175.codfw.wmnet with reason: Maintenance
- 12:20 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2148 (T419635)', diff saved to https://phabricator.wikimedia.org/P90049 and previous config saved to /var/cache/conftool/dbconfig/20260331-122041-fceratto.json
- 12:17 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on clouddb1023.eqiad.wmnet with reason: Downgrade to 10.11.13
- 12:12 ladsgroup@deploy1003: Finished scap sync-world: Backport for Remove VP8 from transcoding (T413031) (duration: 10m 31s)
- 12:10 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2148', diff saved to https://phabricator.wikimedia.org/P90048 and previous config saved to /var/cache/conftool/dbconfig/20260331-121032-fceratto.json
- 12:07 ladsgroup@deploy1003: ladsgroup: Continuing with sync
- 12:03 ladsgroup@deploy1003: ladsgroup: Backport for Remove VP8 from transcoding (T413031) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 12:01 ladsgroup@deploy1003: Started scap sync-world: Backport for Remove VP8 from transcoding (T413031)
- 12:00 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2148', diff saved to https://phabricator.wikimedia.org/P90046 and previous config saved to /var/cache/conftool/dbconfig/20260331-120024-fceratto.json
- 11:58 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dse-k8s-worker1003.eqiad.wmnet with OS bookworm
- 11:57 ladsgroup@deploy1003: Finished scap sync-world: Backport for Switch from InterwikiSortingPrepend to the ULS config (duration: 13m 19s)
- 11:50 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2148 (T419635)', diff saved to https://phabricator.wikimedia.org/P90044 and previous config saved to /var/cache/conftool/dbconfig/20260331-115016-fceratto.json
- 11:50 ladsgroup@deploy1003: ladsgroup: Continuing with sync
- 11:49 ladsgroup@deploy1003: ladsgroup: Backport for Switch from InterwikiSortingPrepend to the ULS config synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 11:44 ladsgroup@deploy1003: Started scap sync-world: Backport for Switch from InterwikiSortingPrepend to the ULS config
- 11:38 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dse-k8s-worker1003.eqiad.wmnet with reason: host reimage
- 11:37 moritzm: upgrade Envoy on IDM to 1.35.9 T419637
- 11:34 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2148 (T419635)', diff saved to https://phabricator.wikimedia.org/P90042 and previous config saved to /var/cache/conftool/dbconfig/20260331-113407-fceratto.json
- 11:34 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2148.codfw.wmnet with reason: Maintenance
- 11:32 btullis@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on dse-k8s-worker1003.eqiad.wmnet with reason: host reimage
- 11:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1362.eqiad.wmnet with OS trixie
- 11:22 moritzm: installing gnupg2 security updates
- 11:16 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host dse-k8s-worker1003
- 11:16 btullis@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dse-k8s-worker1003
- 11:13 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
- 11:13 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
- 11:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
- 11:12 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
- 11:10 btullis@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host dse-k8s-worker1003
- 11:10 btullis@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) dse-k8s-worker1003.eqiad.wmnet 178.32.64.10.in-addr.arpa 8.7.1.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
- 11:10 btullis@cumin1003: START - Cookbook sre.dns.wipe-cache dse-k8s-worker1003.eqiad.wmnet 178.32.64.10.in-addr.arpa 8.7.1.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
- 11:10 btullis@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 11:10 btullis@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host dse-k8s-worker1003 - btullis@cumin1003"
- 11:10 btullis@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host dse-k8s-worker1003 - btullis@cumin1003"
- 11:08 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1362.eqiad.wmnet with reason: host reimage
- 11:04 tappof@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host titan2001.codfw.wmnet
- 11:01 btullis@cumin1003: START - Cookbook sre.dns.netbox
- 11:01 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1362.eqiad.wmnet with reason: host reimage
- 10:58 btullis@cumin1003: START - Cookbook sre.hosts.move-vlan for host dse-k8s-worker1003
- 10:58 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1003.eqiad.wmnet with OS bookworm
- 10:56 tappof@cumin1003: START - Cookbook sre.hosts.reboot-single for host titan2001.codfw.wmnet
- 10:49 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1362
- 10:49 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1362
- 10:48 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1362
- 10:48 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1362.eqiad.wmnet 134.32.64.10.in-addr.arpa 4.3.1.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
- 10:48 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1362.eqiad.wmnet 134.32.64.10.in-addr.arpa 4.3.1.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
- 10:48 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 10:48 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1362 - ayounsi@cumin1003"
- 10:48 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1362 - ayounsi@cumin1003"
- 10:48 marostegui: rename table global_block_whitelist on s3 and s5 for closed.dblist wikis T420525
- 10:44 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
- 10:44 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
- 10:38 ayounsi@cumin1003: START - Cookbook sre.dns.netbox
- 10:38 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1362
- 10:38 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1362.eqiad.wmnet with OS trixie
- 10:36 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1334.eqiad.wmnet with OS trixie
- 10:21 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1159 (T419635)', diff saved to https://phabricator.wikimedia.org/P90036 and previous config saved to /var/cache/conftool/dbconfig/20260331-102112-fceratto.json
- 10:21 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1159.eqiad.wmnet with reason: Maintenance
- 10:19 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1334.eqiad.wmnet with reason: host reimage
- 10:15 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1334.eqiad.wmnet with reason: host reimage
- 10:03 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1334
- 10:03 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1334
- 10:03 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1334
- 10:03 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1334.eqiad.wmnet 192.48.64.10.in-addr.arpa 2.9.1.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
- 10:03 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1334.eqiad.wmnet 192.48.64.10.in-addr.arpa 2.9.1.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
- 10:03 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 10:02 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1334 - ayounsi@cumin1003"
- 10:02 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1334 - ayounsi@cumin1003"
- 09:59 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance
- 09:59 ayounsi@cumin1003: START - Cookbook sre.dns.netbox
- 09:58 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1334
- 09:58 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1334.eqiad.wmnet with OS trixie
- 09:58 jayme@cumin1003: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on P{wikikube-worker1347.eqiad.wmnet} and (A:wikikube-master-eqiad or A:wikikube-worker-eqiad)
- 09:58 jayme@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker1347.eqiad.wmnet
- 09:58 jayme@cumin1003: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker1347.eqiad.wmnet
- 09:52 jayme@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker1347.eqiad.wmnet
- 09:52 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1225.eqiad.wmnet with reason: Maintenance
- 09:52 jayme@cumin1003: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker1347.eqiad.wmnet
- 09:52 jayme@cumin1003: START - Cookbook sre.k8s.reboot-nodes rolling reboot on P{wikikube-worker1347.eqiad.wmnet} and (A:wikikube-master-eqiad or A:wikikube-worker-eqiad)
- 09:52 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1201 (T419635)', diff saved to https://phabricator.wikimedia.org/P90035 and previous config saved to /var/cache/conftool/dbconfig/20260331-095213-fceratto.json
- 09:51 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1333.eqiad.wmnet with OS trixie
- 09:49 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host irc2003.wikimedia.org
- 09:45 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host irc2003.wikimedia.org
- 09:42 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1201', diff saved to https://phabricator.wikimedia.org/P90034 and previous config saved to /var/cache/conftool/dbconfig/20260331-094205-fceratto.json
- 09:35 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1333.eqiad.wmnet with reason: host reimage
- 09:31 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1201', diff saved to https://phabricator.wikimedia.org/P90033 and previous config saved to /var/cache/conftool/dbconfig/20260331-093156-fceratto.json
- 09:27 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1333.eqiad.wmnet with reason: host reimage
- 09:21 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1201 (T419635)', diff saved to https://phabricator.wikimedia.org/P90032 and previous config saved to /var/cache/conftool/dbconfig/20260331-092148-fceratto.json
- 09:21 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps-test2001.codfw.wmnet
- 09:20 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1201 (T419635)', diff saved to https://phabricator.wikimedia.org/P90031 and previous config saved to /var/cache/conftool/dbconfig/20260331-092038-fceratto.json
- 09:20 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1201.eqiad.wmnet with reason: Maintenance
- 09:20 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1187 (T419635)', diff saved to https://phabricator.wikimedia.org/P90030 and previous config saved to /var/cache/conftool/dbconfig/20260331-092014-fceratto.json
- 09:15 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1333
- 09:15 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1333
- 09:14 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1333
- 09:14 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1333.eqiad.wmnet 191.48.64.10.in-addr.arpa 1.9.1.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
- 09:14 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1333.eqiad.wmnet 191.48.64.10.in-addr.arpa 1.9.1.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
- 09:14 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 09:14 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1333 - ayounsi@cumin1003"
- 09:14 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1333 - ayounsi@cumin1003"
- 09:13 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps-test2001.codfw.wmnet
- 09:10 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P90029 and previous config saved to /var/cache/conftool/dbconfig/20260331-091007-fceratto.json
- 09:07 XioNoX: pfw1-eqiad - add NAT rule - T421750
- 09:04 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "remove body from patterns - oblivian@cumin1003"
- 09:04 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: remove body from patterns - oblivian@cumin1003
- 09:03 oblivian@cumin1003: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: remove body from patterns - oblivian@cumin1003
- 09:03 oblivian@cumin1003: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "remove body from patterns - oblivian@cumin1003"
- 08:59 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P90027 and previous config saved to /var/cache/conftool/dbconfig/20260331-085958-fceratto.json
- 08:59 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps2014.codfw.wmnet
- 08:57 ayounsi@cumin1003: START - Cookbook sre.dns.netbox
- 08:57 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1333
- 08:56 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1333.eqiad.wmnet with OS trixie
- 08:53 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps2014.codfw.wmnet
- 08:49 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1187 (T419635)', diff saved to https://phabricator.wikimedia.org/P90026 and previous config saved to /var/cache/conftool/dbconfig/20260331-084951-fceratto.json
- 08:47 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1187 (T419635)', diff saved to https://phabricator.wikimedia.org/P90025 and previous config saved to /var/cache/conftool/dbconfig/20260331-084742-fceratto.json
- 08:47 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1187.eqiad.wmnet with reason: Maintenance
- 08:47 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T419635)', diff saved to https://phabricator.wikimedia.org/P90024 and previous config saved to /var/cache/conftool/dbconfig/20260331-084714-fceratto.json
- 08:45 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1332.eqiad.wmnet with OS trixie
- 08:44 jayme@cumin1003: END (PASS) - Cookbook sre.loadbalancer.check-ipip (exit_code=0)
- 08:44 jayme@cumin1003: START - Cookbook sre.loadbalancer.check-ipip
- 08:37 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P90023 and previous config saved to /var/cache/conftool/dbconfig/20260331-083707-fceratto.json
- 08:29 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1332.eqiad.wmnet with reason: host reimage
- 08:27 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P90022 and previous config saved to /var/cache/conftool/dbconfig/20260331-082700-fceratto.json
- 08:24 elukey: upgrade spicerack on cumin1003
- 08:23 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1332.eqiad.wmnet with reason: host reimage
- 08:19 brouberol@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts an-tool1011.eqiad.wmnet
- 08:19 brouberol@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 08:17 tappof@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host titan2002.codfw.wmnet
- 08:16 brouberol@cumin1003: START - Cookbook sre.dns.netbox
- 08:16 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T419635)', diff saved to https://phabricator.wikimedia.org/P90021 and previous config saved to /var/cache/conftool/dbconfig/20260331-081651-fceratto.json
- 08:16 brouberol@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts an-tool1007.eqiad.wmnet
- 08:16 brouberol@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 08:16 brouberol@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-tool1007.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - brouberol@cumin1003"
- 08:16 brouberol@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-tool1007.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - brouberol@cumin1003"
- 08:15 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1180 (T419635)', diff saved to https://phabricator.wikimedia.org/P90020 and previous config saved to /var/cache/conftool/dbconfig/20260331-081541-fceratto.json
- 08:15 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1180.eqiad.wmnet with reason: Maintenance
- 08:15 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T419635)', diff saved to Unable to send diff to phaste and previous config saved to /var/cache/conftool/dbconfig/20260331-081511-fceratto.json
- 08:12 brouberol@cumin1003: START - Cookbook sre.dns.netbox
- 08:11 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1332
- 08:11 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1332
- 08:10 tappof@cumin1003: START - Cookbook sre.hosts.reboot-single for host titan2002.codfw.wmnet
- 08:07 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1332
- 08:07 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1332.eqiad.wmnet 190.48.64.10.in-addr.arpa 0.9.1.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
- 08:07 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1332.eqiad.wmnet 190.48.64.10.in-addr.arpa 0.9.1.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
- 08:07 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 08:07 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1332 - ayounsi@cumin1003"
- 08:07 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1332 - ayounsi@cumin1003"
- 08:07 brouberol@cumin1003: START - Cookbook sre.hosts.decommission for hosts an-tool1011.eqiad.wmnet
- 08:06 brouberol@cumin1003: START - Cookbook sre.hosts.decommission for hosts an-tool1007.eqiad.wmnet
- 08:05 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to Unable to send diff to phaste and previous config saved to /var/cache/conftool/dbconfig/20260331-080459-fceratto.json
- 08:03 ayounsi@cumin1003: START - Cookbook sre.dns.netbox
- 08:02 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps2013.codfw.wmnet
- 08:02 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1332
- 08:01 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1332.eqiad.wmnet with OS trixie
- 07:56 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps2013.codfw.wmnet
- 07:55 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1331.eqiad.wmnet with OS trixie
- 07:54 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P90019 and previous config saved to /var/cache/conftool/dbconfig/20260331-075450-fceratto.json
- 07:52 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps2012.codfw.wmnet
- 07:46 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps2012.codfw.wmnet
- 07:44 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T419635)', diff saved to https://phabricator.wikimedia.org/P90018 and previous config saved to /var/cache/conftool/dbconfig/20260331-074442-fceratto.json
- 07:42 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1168 (T419635)', diff saved to Unable to send diff to phaste and previous config saved to /var/cache/conftool/dbconfig/20260331-074227-fceratto.json
- 07:42 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1168.eqiad.wmnet with reason: Maintenance
- 07:42 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T419635)', diff saved to https://phabricator.wikimedia.org/P90016 and previous config saved to /var/cache/conftool/dbconfig/20260331-074202-fceratto.json
- 07:38 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1331.eqiad.wmnet with reason: host reimage
- 07:37 fabfur@cumin1003: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "New deploy for MR 152 - fabfur@cumin1003"
- 07:37 fabfur@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: New deploy for MR 152 - fabfur@cumin1003
- 07:36 jayme@cumin1003: END (FAIL) - Cookbook sre.loadbalancer.check-ipip (exit_code=99)
- 07:36 jayme@cumin1003: START - Cookbook sre.loadbalancer.check-ipip
- 07:36 fabfur@cumin1003: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: New deploy for MR 152 - fabfur@cumin1003
- 07:36 fabfur@cumin1003: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "New deploy for MR 152 - fabfur@cumin1003"
- 07:36 fabfur@cumin1003: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "New deploy for MR 152 - fabfur@cumin1003"
- 07:36 fabfur@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: New deploy for MR 152 - fabfur@cumin1003
- 07:35 fabfur@cumin1003: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: New deploy for MR 152 - fabfur@cumin1003
- 07:35 fabfur@cumin1003: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "New deploy for MR 152 - fabfur@cumin1003"
- 07:35 jayme@cumin1003: END (FAIL) - Cookbook sre.loadbalancer.check-ipip (exit_code=99)
- 07:35 jayme@cumin1003: START - Cookbook sre.loadbalancer.check-ipip
- 07:34 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1331.eqiad.wmnet with reason: host reimage
- 07:31 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P90015 and previous config saved to /var/cache/conftool/dbconfig/20260331-073155-fceratto.json
- 07:23 ryankemper: T410577 ^ cookbook did its job, ctrl+c'd after one host was rebooted. new spicerack upgrade confirmed working
- 07:23 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1331
- 07:23 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1331
- 07:22 ryankemper@cumin2002: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.REBOOT (1 nodes at a time) for ElasticSearch cluster search_codfw: test reboot boottime check T410577 - ryankemper@cumin2002 - T410577
- 07:21 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P90014 and previous config saved to /var/cache/conftool/dbconfig/20260331-072147-fceratto.json
- 07:21 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1331
- 07:21 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1331.eqiad.wmnet 172.48.64.10.in-addr.arpa 2.7.1.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
- 07:21 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1331.eqiad.wmnet 172.48.64.10.in-addr.arpa 2.7.1.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
- 07:21 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 07:21 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1331 - ayounsi@cumin1003"
- 07:21 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1331 - ayounsi@cumin1003"
- 07:16 ayounsi@cumin1003: START - Cookbook sre.dns.netbox
- 07:16 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1331
- 07:16 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1331.eqiad.wmnet with OS trixie
- 07:14 moritzm: installing mongo-c-driver security updates
- 07:11 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T419635)', diff saved to https://phabricator.wikimedia.org/P90013 and previous config saved to /var/cache/conftool/dbconfig/20260331-071140-fceratto.json
- 07:06 tappof@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host titan1002.eqiad.wmnet
- 07:02 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps2011.codfw.wmnet
- 06:58 tappof@cumin1003: START - Cookbook sre.hosts.reboot-single for host titan1002.eqiad.wmnet
- 06:56 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps2011.codfw.wmnet
- 06:55 ryankemper@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.REBOOT (1 nodes at a time) for ElasticSearch cluster search_codfw: test reboot boottime check T410577 - ryankemper@cumin2002 - T410577
- 06:55 moritzm: installing postgresql-15 security updates
- 06:55 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 12 hosts with reason: Downgrade to 10.11.13
- 06:50 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db1165 (T419635)', diff saved to https://phabricator.wikimedia.org/P90012 and previous config saved to /var/cache/conftool/dbconfig/20260331-065027-fceratto.json
- 06:50 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1015,1019].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
- 06:50 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1165.eqiad.wmnet with reason: Maintenance
- 06:46 tappof@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host titan1001.eqiad.wmnet
- 06:38 tappof@cumin1003: START - Cookbook sre.hosts.reboot-single for host titan1001.eqiad.wmnet
- 06:38 tappof@cumin1003: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host titan1001.eqiad.wmnet
- 06:38 tappof@cumin1003: START - Cookbook sre.hosts.reboot-single for host titan1001.eqiad.wmnet
- 06:37 tappof@cumin1003: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host titan1001.eqiad.wmnet
- 06:37 tappof@cumin1003: START - Cookbook sre.hosts.reboot-single for host titan1001.eqiad.wmnet
- 06:36 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2157 (T419635)', diff saved to https://phabricator.wikimedia.org/P90011 and previous config saved to /var/cache/conftool/dbconfig/20260331-063611-fceratto.json
- 06:36 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2157.codfw.wmnet with reason: Maintenance
- 05:30 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast3007.wikimedia.org
- 05:28 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host bast3007.wikimedia.org
- 05:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast6003.wikimedia.org
- 05:19 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host bast6003.wikimedia.org
- 04:02 mwpresync@deploy1003: Pruned MediaWiki: 1.46.0-wmf.19 (duration: 02m 29s)
- 03:41 mwpresync@deploy1003: Finished scap sync-world: testwikis to 1.46.0-wmf.22 refs T420480 (duration: 37m 41s)
- 03:03 mwpresync@deploy1003: Started scap sync-world: testwikis to 1.46.0-wmf.22 refs T420480
- 02:45 swfrench@deploy1003: Finished scap sync-world: Backport for Revert "Enable $wgTempCategoryCollations for s3 wikis." (T419274) (duration: 07m 27s)
- 02:41 swfrench@deploy1003: swfrench: Continuing with sync
- 02:40 swfrench@deploy1003: swfrench: Backport for Revert "Enable $wgTempCategoryCollations for s3 wikis." (T419274) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 02:38 swfrench@deploy1003: Started scap sync-world: Backport for Revert "Enable $wgTempCategoryCollations for s3 wikis." (T419274)
- 02:08 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 07m 13s)
- 02:01 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image
- 00:48 apine@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
- 00:38 apine@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply
- 00:36 apine@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
- 00:26 apine@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply
- 00:25 apine@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
- 00:15 apine@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply
2026-03-30
- 23:19 eevans@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs1027.eqiad.wmnet with OS bullseye
- 23:04 eevans@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs1027.eqiad.wmnet with reason: host reimage
- 23:00 brett@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp1105.*
- 22:59 eevans@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs1027.eqiad.wmnet with reason: host reimage
- 22:55 cdobbins@cumin2002: conftool action : set/pooled=no; selector: name=cp1114.eqiad.wmnet [reason: trixie reimaging]
- 22:55 cdobbins@cumin2002: conftool action : set/pooled=yes; selector: name=cp1112.eqiad.wmnet [reason: trixie reimaging]
- 22:54 cdobbins@cumin2002: conftool action : set/pooled=no; selector: name=cp1112.eqiad.wmnet [reason: trixie reimaging]
- 22:54 cdobbins@cumin2002: conftool action : set/pooled=yes; selector: name=cp1110.eqiad.wmnet [reason: trixie reimaging]
- 22:49 cdobbins@cumin2002: conftool action : set/pooled=no; selector: name=cp1110.eqiad.wmnet [reason: trixie reimaging]
- 22:48 cdobbins@cumin2002: conftool action : set/pooled=yes; selector: name=cp1108.eqiad.wmnet [reason: trixie reimaging]
- 22:48 eevans@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host aqs1027
- 22:48 eevans@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host aqs1027
- 22:47 cdobbins@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1108.eqiad.wmnet with OS trixie
- 22:47 cdobbins@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs6002.drmrs.wmnet
- 22:47 cdobbins@cumin2002: START - Cookbook sre.hosts.remove-downtime for lvs6002.drmrs.wmnet
- 22:33 cdobbins@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on lvs6002.drmrs.wmnet with reason: planned reboot
- 22:25 eevans@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host aqs1027
- 22:25 eevans@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) aqs1027.eqiad.wmnet 26.32.64.10.in-addr.arpa 6.2.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
- 22:25 eevans@cumin1003: START - Cookbook sre.dns.wipe-cache aqs1027.eqiad.wmnet 26.32.64.10.in-addr.arpa 6.2.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
- 22:25 eevans@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 22:25 eevans@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host aqs1027 - eevans@cumin1003"
- 22:24 eevans@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host aqs1027 - eevans@cumin1003"
- 22:24 cdobbins@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1108.eqiad.wmnet with reason: host reimage
- 22:19 cdobbins@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1108.eqiad.wmnet with reason: host reimage
- 22:03 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 22:02 cdobbins@cumin2002: START - Cookbook sre.hosts.reimage for host cp1108.eqiad.wmnet with OS trixie
- 22:02 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host kafka-logging2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 21:59 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 21:57 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host kafka-logging2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 21:57 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-logging2008
- 21:57 cdobbins@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp1108.eqiad.wmnet with OS trixie
- 21:56 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host kafka-logging2008
- 21:56 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-logging2007
- 21:56 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host kafka-logging2007
- 21:56 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kafka-logging2006
- 21:56 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host kafka-logging2006
- 21:44 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1105.eqiad.wmnet with OS trixie
- 21:35 eevans@cumin1003: START - Cookbook sre.dns.netbox
- 21:23 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1105.eqiad.wmnet with reason: host reimage
- 21:19 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1105.eqiad.wmnet with reason: host reimage
- 21:16 eevans@cumin1003: START - Cookbook sre.hosts.move-vlan for host aqs1027
- 21:15 eevans@cumin1003: START - Cookbook sre.hosts.reimage for host aqs1027.eqiad.wmnet with OS bullseye
- 21:14 eevans@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs1026.eqiad.wmnet with OS bullseye
- 21:07 cjming: end of UTC late backport window
- 21:06 cjming@deploy1003: Finished scap sync-world: Backport for Add event stream for logged-in reader retention experiment (T420490) (duration: 07m 04s)
- 21:02 cjming@deploy1003: cjming, annet: Continuing with sync
- 21:02 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp1105.eqiad.wmnet with OS trixie
- 21:02 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1105.eqiad.wmnet with OS trixie
- 21:02 cjming@deploy1003: cjming, annet: Backport for Add event stream for logged-in reader retention experiment (T420490) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 21:00 eevans@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs1026.eqiad.wmnet with reason: host reimage
- 20:59 cjming@deploy1003: Started scap sync-world: Backport for Add event stream for logged-in reader retention experiment (T420490)
- 20:53 eevans@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs1026.eqiad.wmnet with reason: host reimage
- 20:51 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp1105.eqiad.wmnet with OS trixie
- 20:49 cjming@deploy1003: Finished scap sync-world: Backport for Add delete-redirect to filemovers on Wikimedia Commons (T421373), Add TestKitchenExposureResetEpoch config variable (T414738) (duration: 06m 49s)
- 20:48 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2224 (T419635)', diff saved to https://phabricator.wikimedia.org/P90008 and previous config saved to /var/cache/conftool/dbconfig/20260330-204856-fceratto.json
- 20:45 brett@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp1107.*
- 20:45 cjming@deploy1003: cjming, nmw03: Continuing with sync
- 20:44 cjming@deploy1003: cjming, nmw03: Backport for Add delete-redirect to filemovers on Wikimedia Commons (T421373), Add TestKitchenExposureResetEpoch config variable (T414738) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 20:44 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1107.eqiad.wmnet with OS trixie
- 20:42 cjming@deploy1003: Started scap sync-world: Backport for Add delete-redirect to filemovers on Wikimedia Commons (T421373), Add TestKitchenExposureResetEpoch config variable (T414738)
- 20:41 eevans@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host aqs1026
- 20:41 eevans@cumin1003: START - Cookbook sre.hosts.move-vlan for host aqs1026
- 20:41 eevans@cumin1003: START - Cookbook sre.hosts.reimage for host aqs1026.eqiad.wmnet with OS bullseye
- 20:40 cjming@deploy1003: Finished scap sync-world: Backport for [EventStreamConfig] Add product_metrics.web_base.active_reader_baseline stream and product_metrics.web_base.attribution_research stream (T420621) (duration: 08m 15s)
- 20:38 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2224', diff saved to https://phabricator.wikimedia.org/P90007 and previous config saved to /var/cache/conftool/dbconfig/20260330-203848-fceratto.json
- 20:36 cdobbins@cumin2002: START - Cookbook sre.hosts.reimage for host cp1108.eqiad.wmnet with OS trixie
- 20:35 cjming@deploy1003: tchin, cjming: Continuing with sync
- 20:35 cdobbins@cumin2002: conftool action : set/pooled=no; selector: name=cp1108.eqiad.wmnet [reason: trixie reimaging]
- 20:35 cdobbins@cumin2002: conftool action : set/pooled=yes; selector: name=cp1106.eqiad.wmnet [reason: trixie reimaging]
- 20:34 cdobbins@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1106.eqiad.wmnet with OS trixie
- 20:33 cjming@deploy1003: tchin, cjming: Backport for [EventStreamConfig] Add product_metrics.web_base.active_reader_baseline stream and product_metrics.web_base.attribution_research stream (T420621) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 20:31 eevans@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs1025.eqiad.wmnet with OS bullseye
- 20:31 cjming@deploy1003: Started scap sync-world: Backport for [EventStreamConfig] Add product_metrics.web_base.active_reader_baseline stream and product_metrics.web_base.attribution_research stream (T420621)
- 20:30 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 20:30 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding kafka-logging2006 to codfw - jhancock@cumin2002"
- 20:30 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding kafka-logging2006 to codfw - jhancock@cumin2002"
- 20:29 cjming@deploy1003: Finished scap sync-world: Backport for config: Enable EmailConfirmationBanner on testwiki (T421366) (duration: 19m 16s)
- 20:28 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2224', diff saved to https://phabricator.wikimedia.org/P90006 and previous config saved to /var/cache/conftool/dbconfig/20260330-202840-fceratto.json
- 20:26 jhancock@cumin2002: START - Cookbook sre.dns.netbox
- 20:25 cjming@deploy1003: cjming, mmartorana: Continuing with sync
- 20:22 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1107.eqiad.wmnet with reason: host reimage
- 20:19 eevans@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching cassandra-dev200[2-3].codfw.wmnet: Applying upgrade to Cassandra 4.1.11 — T418417 - eevans@cumin1003
- 20:18 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2224 (T419635)', diff saved to https://phabricator.wikimedia.org/P90005 and previous config saved to /var/cache/conftool/dbconfig/20260330-201831-fceratto.json
- 20:18 eevans@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs1025.eqiad.wmnet with reason: host reimage
- 20:17 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1107.eqiad.wmnet with reason: host reimage
- 20:14 eevans@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs1025.eqiad.wmnet with reason: host reimage
- 20:12 cjming@deploy1003: cjming, mmartorana: Backport for config: Enable EmailConfirmationBanner on testwiki (T421366) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 20:11 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2224 (T419635)', diff saved to https://phabricator.wikimedia.org/P90004 and previous config saved to /var/cache/conftool/dbconfig/20260330-201116-fceratto.json
- 20:11 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2224.codfw.wmnet with reason: Maintenance
- 20:10 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2217 (T419635)', diff saved to https://phabricator.wikimedia.org/P90003 and previous config saved to /var/cache/conftool/dbconfig/20260330-201050-fceratto.json
- 20:10 cjming@deploy1003: Started scap sync-world: Backport for config: Enable EmailConfirmationBanner on testwiki (T421366)
- 20:06 eevans@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching cassandra-dev200[2-3].codfw.wmnet: Applying upgrade to Cassandra 4.1.11 — T418417 - eevans@cumin1003
- 20:06 cdobbins@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1106.eqiad.wmnet with reason: host reimage
- 20:02 eevans@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host aqs1025
- 20:02 eevans@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host aqs1025
- 20:02 eevans@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host aqs1025
- 20:02 eevans@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) aqs1025.eqiad.wmnet 24.32.64.10.in-addr.arpa 4.2.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
- 20:02 eevans@cumin1003: START - Cookbook sre.dns.wipe-cache aqs1025.eqiad.wmnet 24.32.64.10.in-addr.arpa 4.2.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
- 20:02 eevans@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 20:02 eevans@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host aqs1025 - eevans@cumin1003"
- 20:02 eevans@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host aqs1025 - eevans@cumin1003"
- 20:01 cdobbins@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1106.eqiad.wmnet with reason: host reimage
- 20:00 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2217', diff saved to https://phabricator.wikimedia.org/P90002 and previous config saved to /var/cache/conftool/dbconfig/20260330-200042-fceratto.json
- 20:00 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp1107.eqiad.wmnet with OS trixie
- 20:00 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1107.eqiad.wmnet with OS trixie
- 19:50 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2217', diff saved to https://phabricator.wikimedia.org/P90001 and previous config saved to /var/cache/conftool/dbconfig/20260330-195033-fceratto.json
- 19:49 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host restbase2039.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 19:48 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host restbase2039.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 19:48 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp1107.eqiad.wmnet with OS trixie
- 19:46 eevans@cumin1003: START - Cookbook sre.dns.netbox
- 19:45 eevans@cumin1003: START - Cookbook sre.hosts.move-vlan for host aqs1025
- 19:44 eevans@cumin1003: START - Cookbook sre.hosts.reimage for host aqs1025.eqiad.wmnet with OS bullseye
- 19:44 cdobbins@cumin2002: START - Cookbook sre.hosts.reimage for host cp1106.eqiad.wmnet with OS trixie
- 19:40 cdobbins@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp1106.eqiad.wmnet with OS trixie
- 19:40 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2217 (T419635)', diff saved to https://phabricator.wikimedia.org/P90000 and previous config saved to /var/cache/conftool/dbconfig/20260330-194025-fceratto.json
- 19:18 brett: Delete mw-parsoid lvs service via sudo -i cumin A:lvs-low-traffic-eqiad 'ipvsadm --delete-service --tcp-service 10.2.2.92:4452' - T420468
- 19:18 brett: Delete mw-parsoid lvs service via sudo -i cumin A:lvs-secondary-eqiad 'ipvsadm --delete-service --tcp-service 10.2.2.92:4452' - T420468
- 19:16 brett: Delete mw-parsoid lvs service via sudo -i cumin A:lvs-secondary-codfw 'ipvsadm --delete-service --tcp-service 10.2.1.92:4452' - T420468
- 19:15 brett: Delete mw-parsoid lvs service via sudo -i cumin A:lvs-low-traffic-codfw 'ipvsadm --delete-service --tcp-service 10.2.1.92:4452' - T420468
- 19:12 brett@cumin2002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-low-traffic-eqiad and A:lvs (T420468)
- 19:12 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2214', diff saved to https://phabricator.wikimedia.org/P89996 and previous config saved to /var/cache/conftool/dbconfig/20260330-191216-fceratto.json
- 19:12 brett@cumin2002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-low-traffic-eqiad and A:lvs (T420468)
- 19:12 brett@cumin2002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-low-traffic-codfw and A:lvs (T420468)
- 19:11 brett@cumin2002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-low-traffic-codfw and A:lvs (T420468)
- 19:10 brett@cumin2002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-secondary-eqiad and A:lvs (T420468)
- 19:10 eevans@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs1024.eqiad.wmnet with reason: host reimage
- 19:08 brett@cumin2002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-secondary-eqiad and A:lvs (T420468)
- 19:06 brett@cumin2002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-secondary-codfw and A:lvs (T420468)
- 19:06 brett@cumin2002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-secondary-codfw and A:lvs (T420468)
- 19:03 eevans@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs1024.eqiad.wmnet with reason: host reimage
- 19:02 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2214 (T419635)', diff saved to https://phabricator.wikimedia.org/P89995 and previous config saved to /var/cache/conftool/dbconfig/20260330-190208-fceratto.json
- 19:00 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1109.eqiad.wmnet with OS trixie
- 18:54 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2214 (T419635)', diff saved to https://phabricator.wikimedia.org/P89994 and previous config saved to /var/cache/conftool/dbconfig/20260330-185422-fceratto.json
- 18:54 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2214.codfw.wmnet with reason: Maintenance
- 18:52 eevans@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host aqs1024
- 18:52 eevans@cumin1003: START - Cookbook sre.hosts.move-vlan for host aqs1024
- 18:52 eevans@cumin1003: START - Cookbook sre.hosts.reimage for host aqs1024.eqiad.wmnet with OS bullseye
- 18:51 brett@dns1006: END - running authdns-update
- 18:50 brett@dns1006: START - running authdns-update
- 18:46 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2197.codfw.wmnet with reason: Maintenance
- 18:46 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2193 (T419635)', diff saved to https://phabricator.wikimedia.org/P89993 and previous config saved to /var/cache/conftool/dbconfig/20260330-184635-fceratto.json
- 18:39 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1109.eqiad.wmnet with reason: host reimage
- 18:36 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2193', diff saved to https://phabricator.wikimedia.org/P89992 and previous config saved to /var/cache/conftool/dbconfig/20260330-183626-fceratto.json
- 18:35 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1109.eqiad.wmnet with reason: host reimage
- 18:31 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-restart-reboot-hcaptcha-proxy (exit_code=0) rolling reboot on A:hcaptcha-proxy and not A:hcaptcha-proxy-ulsfo and not P{hcaptcha-proxy7002.wikimedia.org} and A:hcaptcha-proxy
- 18:26 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2193', diff saved to https://phabricator.wikimedia.org/P89991 and previous config saved to /var/cache/conftool/dbconfig/20260330-182618-fceratto.json
- 18:19 cdobbins@cumin2002: START - Cookbook sre.hosts.reimage for host cp1106.eqiad.wmnet with OS trixie
- 18:19 cdobbins@cumin2002: conftool action : set/pooled=no; selector: name=cp1106.eqiad.wmnet [reason: trixie reimaging]
- 18:18 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp1109.eqiad.wmnet with OS trixie
- 18:18 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1109.eqiad.wmnet with OS trixie
- 18:16 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2193 (T419635)', diff saved to https://phabricator.wikimedia.org/P89990 and previous config saved to /var/cache/conftool/dbconfig/20260330-181611-fceratto.json
- 18:15 eevans@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aqs1023.eqiad.wmnet with OS bullseye
- 18:15 eevans@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - eevans@cumin1003"
- 18:15 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2193 (T419635)', diff saved to https://phabricator.wikimedia.org/P89989 and previous config saved to /var/cache/conftool/dbconfig/20260330-181457-fceratto.json
- 18:14 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2193.codfw.wmnet with reason: Maintenance
- 18:14 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2180 (T419635)', diff saved to https://phabricator.wikimedia.org/P89988 and previous config saved to /var/cache/conftool/dbconfig/20260330-181443-fceratto.json
- 18:07 jforrester@deploy2002: mwscript-k8s job started: maintenance/namespaceDupes.php --wiki=frwiki --source-pseudo-namespace=Abstract_ --fix --move-talk # T420654 abstract: is now an interwiki
- 18:04 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P89987 and previous config saved to /var/cache/conftool/dbconfig/20260330-180434-fceratto.json
- 17:56 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp1109.eqiad.wmnet with OS trixie
- 17:54 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P89986 and previous config saved to /var/cache/conftool/dbconfig/20260330-175427-fceratto.json
- 17:44 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2180 (T419635)', diff saved to https://phabricator.wikimedia.org/P89985 and previous config saved to /var/cache/conftool/dbconfig/20260330-174419-fceratto.json
- 17:44 eevans@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - eevans@cumin1003"
- 17:43 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2180 (T419635)', diff saved to https://phabricator.wikimedia.org/P89984 and previous config saved to /var/cache/conftool/dbconfig/20260330-174305-fceratto.json
- 17:42 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2180.codfw.wmnet with reason: Maintenance
- 17:42 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2169 (T419635)', diff saved to https://phabricator.wikimedia.org/P89983 and previous config saved to /var/cache/conftool/dbconfig/20260330-174240-fceratto.json
- 17:37 brett@cumin2002: START - Cookbook sre.cdn.roll-restart-reboot-hcaptcha-proxy rolling reboot on A:hcaptcha-proxy and not A:hcaptcha-proxy-ulsfo and not P{hcaptcha-proxy7002.wikimedia.org} and A:hcaptcha-proxy
- 17:32 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2169', diff saved to https://phabricator.wikimedia.org/P89982 and previous config saved to /var/cache/conftool/dbconfig/20260330-173232-fceratto.json
- 17:29 brett@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp1111.*
- 17:29 eevans@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aqs1023.eqiad.wmnet with reason: host reimage
- 17:22 eevans@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on aqs1023.eqiad.wmnet with reason: host reimage
- 17:22 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2169', diff saved to https://phabricator.wikimedia.org/P89981 and previous config saved to /var/cache/conftool/dbconfig/20260330-172224-fceratto.json
- 17:21 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1111.eqiad.wmnet with OS trixie
- 17:18 kamila@deploy1003: Finished scap sync-world: Backport for Enable $wgTempCategoryCollations for s3 wikis. (T419274 T419049) (duration: 12m 36s)
- 17:14 kamila@deploy1003: kamila: Continuing with sync
- 17:13 jgreen@dns1004: END - running authdns-update
- 17:12 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2169 (T419635)', diff saved to https://phabricator.wikimedia.org/P89980 and previous config saved to /var/cache/conftool/dbconfig/20260330-171215-fceratto.json
- 17:12 jgreen@dns1004: START - running authdns-update
- 17:12 eevans@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host aqs1023
- 17:12 eevans@cumin1003: START - Cookbook sre.hosts.move-vlan for host aqs1023
- 17:11 eevans@cumin1003: START - Cookbook sre.hosts.reimage for host aqs1023.eqiad.wmnet with OS bullseye
- 17:07 kamila@deploy1003: kamila: Backport for Enable $wgTempCategoryCollations for s3 wikis. (T419274 T419049) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 17:05 kamila@deploy1003: Started scap sync-world: Backport for Enable $wgTempCategoryCollations for s3 wikis. (T419274 T419049)
- 17:03 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2169 (T419635)', diff saved to https://phabricator.wikimedia.org/P89979 and previous config saved to /var/cache/conftool/dbconfig/20260330-170329-fceratto.json
- 17:03 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2169.codfw.wmnet with reason: Maintenance
- 17:03 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2158 (T419635)', diff saved to https://phabricator.wikimedia.org/P89978 and previous config saved to /var/cache/conftool/dbconfig/20260330-170305-fceratto.json
- 17:00 jayme@cumin1003: END (FAIL) - Cookbook sre.loadbalancer.check-ipip (exit_code=99)
- 17:00 jayme@cumin1003: START - Cookbook sre.loadbalancer.check-ipip
- 16:58 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1111.eqiad.wmnet with reason: host reimage
- 16:54 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1111.eqiad.wmnet with reason: host reimage
- 16:52 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P89977 and previous config saved to /var/cache/conftool/dbconfig/20260330-165256-fceratto.json
- 16:42 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P89976 and previous config saved to /var/cache/conftool/dbconfig/20260330-164248-fceratto.json
- 16:38 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp1111.eqiad.wmnet with OS trixie
- 16:37 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1111.eqiad.wmnet with OS trixie
- 16:32 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2158 (T419635)', diff saved to https://phabricator.wikimedia.org/P89975 and previous config saved to /var/cache/conftool/dbconfig/20260330-163239-fceratto.json
- 16:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1330.eqiad.wmnet with OS trixie
- 16:23 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2158 (T419635)', diff saved to https://phabricator.wikimedia.org/P89974 and previous config saved to /var/cache/conftool/dbconfig/20260330-162331-fceratto.json
- 16:23 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2158.codfw.wmnet with reason: Maintenance
- 16:23 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2151 (T419635)', diff saved to https://phabricator.wikimedia.org/P89973 and previous config saved to /var/cache/conftool/dbconfig/20260330-162307-fceratto.json
- 16:13 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2151', diff saved to https://phabricator.wikimedia.org/P89972 and previous config saved to /var/cache/conftool/dbconfig/20260330-161259-fceratto.json
- 16:11 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp1111.eqiad.wmnet with OS trixie
- 16:09 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1330.eqiad.wmnet with reason: host reimage
- 16:03 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1330.eqiad.wmnet with reason: host reimage
- 16:02 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2151', diff saved to https://phabricator.wikimedia.org/P89971 and previous config saved to /var/cache/conftool/dbconfig/20260330-160251-fceratto.json
- 15:59 moritzm: rearmed keyholder on netmon* hosts following reboots
- 15:52 fceratto@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2151 (T419635)', diff saved to https://phabricator.wikimedia.org/P89970 and previous config saved to /var/cache/conftool/dbconfig/20260330-155242-fceratto.json
- 15:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1330
- 15:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1330
- 15:52 fabfur@cumin1003: conftool action : set/pooled=yes; selector: name=cp6001.*
- 15:52 fabfur@cumin1003: conftool action : set/pooled=yes; selector: name=cp6009.*
- 15:52 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1330
- 15:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1330.eqiad.wmnet 163.48.64.10.in-addr.arpa 3.6.1.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
- 15:52 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1330.eqiad.wmnet 163.48.64.10.in-addr.arpa 3.6.1.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
- 15:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 15:51 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1330 - ayounsi@cumin1003"
- 15:51 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1330 - ayounsi@cumin1003"
- 15:51 fabfur: repooling cp6001 and cp6009 (T421402)
- 15:47 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on P{cp6009*} and A:cp - 3.2 test upgrade ()
- 15:44 elukey: upgrade spicerack on cumin2002
- 15:43 fceratto@cumin1003: dbctl commit (dc=all): 'Depooling db2151 (T419635)', diff saved to https://phabricator.wikimedia.org/P89969 and previous config saved to /var/cache/conftool/dbconfig/20260330-154352-fceratto.json
- 15:43 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2151.codfw.wmnet with reason: Maintenance
- 15:42 fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on P{cp6009*} and A:cp - 3.2 test upgrade ()
- 15:42 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netmon1003.wikimedia.org
- 15:42 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on P{cp6001*} and A:cp - 3.2 test upgrade ()
- 15:42 elukey: uploaded spicerack_12.3.0 to apt.wikimedia.org bookworm-wikimedia
- 15:40 ayounsi@cumin1003: START - Cookbook sre.dns.netbox
- 15:39 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1330
- 15:38 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1330.eqiad.wmnet with OS trixie
- 15:37 fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on P{cp6001*} and A:cp - 3.2 test upgrade ()
- 15:36 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netmon1003.wikimedia.org
- 15:32 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netmon2002.wikimedia.org
- 15:31 urandom: upgrade cassandra-dev2001 to Cassandra 4.1.11 — T418417
- 15:26 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netmon2002.wikimedia.org
- 15:12 dancy@deploy1003: Finished deploy [releng/jenkins-deploy@6f6a192] (releasing): Grant Overall/Administer to Arnaudb (duration: 01m 01s)
- 15:11 dancy@deploy1003: Started deploy [releng/jenkins-deploy@6f6a192] (releasing): Grant Overall/Administer to Arnaudb
- 15:09 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1329.eqiad.wmnet with OS trixie
- 15:09 otto@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventstreams-internal: apply
- 15:08 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host debmonitor1003.eqiad.wmnet
- 15:08 otto@deploy1003: helmfile [eqiad] START helmfile.d/services/eventstreams-internal: apply
- 15:07 otto@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventstreams-internal: apply
- 15:06 otto@deploy1003: helmfile [codfw] START helmfile.d/services/eventstreams-internal: apply
- 15:06 otto@deploy1003: helmfile [staging] DONE helmfile.d/services/eventstreams-internal: apply
- 15:06 otto@deploy1003: helmfile [staging] START helmfile.d/services/eventstreams-internal: apply
- 15:04 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host debmonitor1003.eqiad.wmnet
- 15:03 fabfur@cumin1003: conftool action : set/pooled=no; selector: name=cp6009.*
- 15:02 fabfur@cumin1003: conftool action : set/pooled=no; selector: name=cp6001.*
- 15:01 fabfur: depooling cp6001 and cp6009 to upgrade haproxy to v 3.2 (T421402)
- 14:58 kamila@deploy1003: Finished scap sync-world: Backport for Revert "Enable $wgTempCategoryCollations for s3 wikis." (duration: 06m 42s)
- 14:54 kamila@deploy1003: kamila: Continuing with sync
- 14:54 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1329.eqiad.wmnet with reason: host reimage
- 14:54 kamila@deploy1003: kamila: Backport for Revert "Enable $wgTempCategoryCollations for s3 wikis." synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 14:52 kamila@deploy1003: Started scap sync-world: Backport for Revert "Enable $wgTempCategoryCollations for s3 wikis."
- 14:51 jayme@cumin1003: END (FAIL) - Cookbook sre.loadbalancer.check-ipip (exit_code=99)
- 14:50 cdanis: CIDERGRINDER 🍎 now deployed globally 🚀🌍
- 14:49 jayme@cumin1003: START - Cookbook sre.loadbalancer.check-ipip
- 14:49 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1329.eqiad.wmnet with reason: host reimage
- 14:40 jayme@cumin1003: END (FAIL) - Cookbook sre.loadbalancer.check-ipip (exit_code=99)
- 14:39 jayme@cumin1003: START - Cookbook sre.loadbalancer.check-ipip
- 14:37 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1329
- 14:37 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1329
- 14:36 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1329
- 14:36 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1329.eqiad.wmnet 132.32.64.10.in-addr.arpa 2.3.1.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
- 14:36 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1329.eqiad.wmnet 132.32.64.10.in-addr.arpa 2.3.1.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
- 14:36 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 14:36 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1329 - ayounsi@cumin1003"
- 14:36 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1329 - ayounsi@cumin1003"
- 14:32 jclark@cumin1003: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
- 14:32 ayounsi@cumin1003: START - Cookbook sre.dns.netbox
- 14:32 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1329
- 14:32 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1329.eqiad.wmnet with OS trixie
- 14:31 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1328.eqiad.wmnet with OS trixie
- 14:30 kamila@deploy1003: Finished scap sync-world: Backport for Enable $wgTempCategoryCollations for s3 wikis. (T419274 T419049) (duration: 09m 59s)
- 14:26 kamila@deploy1003: kamila: Continuing with sync
- 14:24 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti-jumbo1001.eqiad.wmnet with OS trixie
- 14:23 cdanis: 💙cdanis@cumin1003.eqiad.wmnet ~ 🕥☕ sudo cumin 'A:cp' 'disable-puppet "cdanis CIDER 🍎"'
- 14:22 kamila@deploy1003: kamila: Backport for Enable $wgTempCategoryCollations for s3 wikis. (T419274 T419049) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 14:20 kamila@deploy1003: Started scap sync-world: Backport for Enable $wgTempCategoryCollations for s3 wikis. (T419274 T419049)
- 14:18 fceratto@cumin1003: END (PASS) - Cookbook sre.switchdc.databases.finalize (exit_code=0) for the switch from codfw to eqiad for section es7
- 14:17 fceratto@cumin1003: START - Cookbook sre.switchdc.databases.finalize for the switch from codfw to eqiad for section es7
- 14:16 jforrester@deploy1003: Finished scap sync-world: Backport for Instrumentation: Track clicks for user account menu experiment (T418053), Display create account button in main menu when user is logged out. (T418053 T415647) (duration: 16m 57s)
- 14:12 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1328.eqiad.wmnet with reason: host reimage
- 14:12 jforrester@deploy1003: emc-wmf, jforrester: Continuing with sync
- 14:08 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti-jumbo1001.eqiad.wmnet with reason: host reimage
- 14:08 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1328.eqiad.wmnet with reason: host reimage
- 14:02 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti-jumbo1001.eqiad.wmnet with reason: host reimage
- 14:01 jforrester@deploy1003: emc-wmf, jforrester: Backport for Instrumentation: Track clicks for user account menu experiment (T418053), Display create account button in main menu when user is logged out. (T418053 T415647) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 13:59 jforrester@deploy1003: Started scap sync-world: Backport for Instrumentation: Track clicks for user account menu experiment (T418053), Display create account button in main menu when user is logged out. (T418053 T415647)
- 13:59 jayme: enabling puppet on A:wikikube-worker-eqiad for T420436
- 13:56 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1328
- 13:56 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1328
- 13:54 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1328
- 13:54 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1328.eqiad.wmnet 129.32.64.10.in-addr.arpa 9.2.1.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
- 13:54 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1328.eqiad.wmnet 129.32.64.10.in-addr.arpa 9.2.1.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
- 13:54 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 13:54 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1328 - ayounsi@cumin1003"
- 13:54 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1328 - ayounsi@cumin1003"
- 13:54 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host apt2002.wikimedia.org
- 13:54 jayme@cumin1003: END (FAIL) - Cookbook sre.loadbalancer.check-ipip (exit_code=99)
- 13:52 jayme@cumin1003: START - Cookbook sre.loadbalancer.check-ipip
- 13:51 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host ganeti-jumbo1001
- 13:51 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti-jumbo1001
- 13:49 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti-jumbo1001
- 13:49 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ganeti-jumbo1001.eqiad.wmnet 140.48.64.10.in-addr.arpa 0.4.1.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
- 13:49 bking@cumin2002: START - Cookbook sre.dns.wipe-cache ganeti-jumbo1001.eqiad.wmnet 140.48.64.10.in-addr.arpa 0.4.1.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
- 13:49 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 13:49 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host ganeti-jumbo1001 - bking@cumin2002"
- 13:49 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host ganeti-jumbo1001 - bking@cumin2002"
- 13:48 ayounsi@cumin1003: START - Cookbook sre.dns.netbox
- 13:48 jayme@cumin1003: END (FAIL) - Cookbook sre.loadbalancer.check-ipip (exit_code=99)
- 13:48 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1328
- 13:48 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host apt2002.wikimedia.org
- 13:48 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast7002.wikimedia.org
- 13:47 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1328.eqiad.wmnet with OS trixie
- 13:47 jayme@cumin1003: START - Cookbook sre.loadbalancer.check-ipip
- 13:44 jforrester@deploy1003: Sync cancelled.
- 13:43 jforrester@deploy1003: jforrester: Backport for Wikifunctions: Switch cache from mcrouter-wikifunctions to special access (T419666) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 13:42 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 13:42 jforrester@deploy1003: Started scap sync-world: Backport for Wikifunctions: Switch cache from mcrouter-wikifunctions to special access (T419666)
- 13:41 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host bast7002.wikimedia.org
- 13:40 jforrester@deploy1003: Finished scap sync-world: Backport for instrument(ReviseTone): record start of copyedit session (T419181), Replace WANObjectCache with new MemcachedWrapper concept (T419666), Fix match case for setting minute, week or month TTL on OrchestratorRequest (T421475) (duration: 09m 33s)
- 13:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host aux-k8s-etcd1005.eqiad.wmnet
- 13:39 jclark@cumin1003: START - Cookbook sre.dns.netbox
- 13:38 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 13:36 jforrester@deploy1003: jforrester, migr: Continuing with sync
- 13:35 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host aux-k8s-etcd1005.eqiad.wmnet
- 13:34 bking@cumin2002: START - Cookbook sre.dns.netbox
- 13:34 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host aux-k8s-etcd1004.eqiad.wmnet
- 13:34 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host ganeti-jumbo1001
- 13:33 bking@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti-jumbo1001.eqiad.wmnet with OS trixie
- 13:32 jforrester@deploy1003: jforrester, migr: Backport for instrument(ReviseTone): record start of copyedit session (T419181), Replace WANObjectCache with new MemcachedWrapper concept (T419666), Fix match case for setting minute, week or month TTL on OrchestratorRequest (T421475) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 13:31 moritzm: rebalance Ganeti cluster in ulsfo following the completion of the migration to routed Ganeti T421044
- 13:30 jforrester@deploy1003: Started scap sync-world: Backport for instrument(ReviseTone): record start of copyedit session (T419181), Replace WANObjectCache with new MemcachedWrapper concept (T419666), Fix match case for setting minute, week or month TTL on OrchestratorRequest (T421475)
- 13:30 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host aux-k8s-etcd1004.eqiad.wmnet
- 13:19 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti4005.ulsfo.wmnet
- 13:19 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host aux-k8s-etcd1003.eqiad.wmnet
- 13:18 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4005.ulsfo.wmnet
- 13:17 kharlan@deploy1003: Finished scap sync-world: Backport for hCaptcha: Add APCu cache layer to health checker (T421204 T412947) (duration: 11m 56s)
- 13:15 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host aux-k8s-etcd1003.eqiad.wmnet
- 13:12 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti4005.ulsfo.wmnet
- 13:12 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host pki-root1002.eqiad.wmnet
- 13:10 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti4005.ulsfo.wmnet
- 13:09 kharlan@deploy1003: kharlan: Continuing with sync
- 13:07 kharlan@deploy1003: kharlan: Backport for hCaptcha: Add APCu cache layer to health checker (T421204 T412947) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 13:05 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host pki-root1002.eqiad.wmnet
- 13:05 kharlan@deploy1003: Started scap sync-world: Backport for hCaptcha: Add APCu cache layer to health checker (T421204 T412947)
- 13:05 jayme: disabling puppet on A:wikikube-worker-eqiad for T420436
- 12:34 moritzm: failover Ganeti master in ulsfo to ganeti4008
- 12:03 topranks: apply transport-in policy to core router transport peerings to prefer local anycast routes
- 12:01 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti4006.ulsfo.wmnet
- 12:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4006.ulsfo.wmnet
- 11:54 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti4006.ulsfo.wmnet
- 11:52 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti4006.ulsfo.wmnet
- 11:51 godog: bounce neutron-l3-agent on cloudnet1005 - T421054
- 11:06 btullis@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'.
- 11:05 btullis@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'.
- 11:05 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 11:04 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
- 10:37 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host bast4006.wikimedia.org with OS bookworm
- 10:15 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on bast4006.wikimedia.org with reason: host reimage
- 10:09 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on bast4006.wikimedia.org with reason: host reimage
- 09:46 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host bast4006.wikimedia.org with OS bookworm
- 09:42 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host bast4006.wikimedia.org with OS trixie
- 09:19 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 42
- 09:17 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'email' for AS: 42
- 09:15 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 12200
- 09:14 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'email' for AS: 12200
- 09:11 tappof: prometheus[12]008: reboot (T419960)
- 09:10 tappof: prometheus[12]006: reboot (T419960)
- 08:56 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host bast4006.wikimedia.org with OS trixie
- 08:52 XioNoX: push pfw policy - T421556
- 08:51 tappof: prometheus[12]007: reboot (T419960)
- 08:38 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
- 08:38 javiermonton@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
- 08:37 tappof: prometheus[12]005: reboot (T419960)
- 08:34 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host bast4006.wikimedia.org with OS trixie
- 08:17 javiermonton@deploy1003: Finished scap sync-world: Backport for stream: mediawiki.page_html_content_change (T421341) (duration: 35m 10s)
- 08:03 javiermonton@deploy1003: javiermonton: Continuing with sync
- 08:00 javiermonton@deploy1003: javiermonton: Backport for stream: mediawiki.page_html_content_change (T421341) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 07:54 godog: deploy rabbitmq changes to allow cli communication - T420923
- 07:48 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host bast4006.wikimedia.org with OS trixie
- 07:48 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host bast4006.wikimedia.org with OS trixie
- 07:48 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host bast4006.wikimedia.org with OS trixie
- 07:45 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host bast4006.wikimedia.org with OS trixie
- 07:45 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host bast4006.wikimedia.org with OS trixie
- 07:42 javiermonton@deploy1003: Started scap sync-world: Backport for stream: mediawiki.page_html_content_change (T421341)
- 07:38 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host bast4006.wikimedia.org with OS trixie
- 07:24 tappof: prometheus7002: switch to nftables and reboot (T419960)
- 07:18 tappof: prometheus6002: switch to nftables and reboot (T419960)
- 07:11 tappof: prometheus5002: switch to nftables and reboot (T419960)
- 07:08 tappof: prometheus4003: reboot (T419960)
- 07:05 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host bast4006.wikimedia.org with OS trixie
- 07:04 tappof: prometheus3004: switch to nftables and reboot (T419960)
- 05:16 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb1022.eqiad.wmnet with reason: Downgrade clouddb1022 to 10.11.13
- 05:14 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on clouddb1022.eqiad.wmnet with reason: Downgrade clouddb1022 to 10.11.13
- 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 50s)
- 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image
2026-03-29
- 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 13s)
- 02:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image
2026-03-28
- 14:48 dzahn@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on zuul2002.codfw.wmnet with reason: T421398
- 14:48 dzahn@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on zuul1002.eqiad.wmnet with reason: T421398
- 14:16 mutante: releases1003 - re-enabled puppet, which had been disabled due to T418109 but should not have been disabled during the switch of the deployment server, leading to T421532
2026-03-27
- 18:11 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 18:00 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 17:50 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 17:40 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 17:39 tchin@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: apply
- 17:39 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 17:39 tchin@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-analytics-external: apply
- 17:38 tchin@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: apply
- 17:37 tchin@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: apply
- 17:37 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-analytics-external: apply
- 17:35 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' .
- 17:34 tchin@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-analytics-external: apply
- 17:34 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 17:30 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 17:30 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 17:24 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 17:19 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 17:15 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 17:04 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 16:55 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 16:50 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 16:47 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 16:42 dancy@deploy1003: Finished deploy [releng/jenkins-deploy@31ace7e] (releasing): (no justification provided) (duration: 01m 18s)
- 16:41 dancy@deploy1003: Started deploy [releng/jenkins-deploy@31ace7e] (releasing): (no justification provided)
- 16:37 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 16:36 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.provision (exit_code=97) for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 16:27 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 16:22 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 16:13 brouberol@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'.
- 16:12 brouberol@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'.
- 16:12 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 16:11 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 16:10 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 15:07 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 15:00 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 14:09 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 14:09 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: change ips for frack servers - cmooney@cumin1003"
- 14:08 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: change ips for frack servers - cmooney@cumin1003"
- 14:02 cmooney@cumin1003: START - Cookbook sre.dns.netbox
- 13:52 dpogorzelski@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'edit-check' for release 'main' .
- 13:51 dpogorzelski@deploy1003: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
- 13:49 dpogorzelski@deploy1003: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
- 13:49 dpogorzelski@deploy1003: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
- 13:48 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 13:47 dpogorzelski@deploy1003: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
- 12:47 dpogorzelski@deploy1003: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
- 12:45 dpogorzelski@deploy1003: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
- 12:41 dpogorzelski@deploy1003: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
- 12:40 dpogorzelski@deploy1003: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
- 12:14 dpogorzelski@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'edit-check' for release 'main' .
- 12:11 dpogorzelski@deploy1003: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
- 12:10 dpogorzelski@deploy1003: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
- 12:08 dpogorzelski@deploy1003: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
- 12:06 dpogorzelski@deploy1003: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
- 11:53 dpogorzelski@deploy1003: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
- 11:51 dpogorzelski@deploy1003: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
- 11:50 dpogorzelski@deploy1003: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
- 11:48 dpogorzelski@deploy1003: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
- 11:30 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 11:27 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 11:15 elukey@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host kafka-test1006.eqiad.wmnet with OS trixie
- 11:15 taavi@cumin1003: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0) for database abstractwiki (T420637)
- 11:02 blake@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host deploy2002.codfw.wmnet
- 10:54 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host poolcounter2006.codfw.wmnet
- 10:51 blake@cumin1003: START - Cookbook sre.hosts.reboot-single for host deploy2002.codfw.wmnet
- 10:50 jayme@cumin1003: START - Cookbook sre.hosts.reboot-single for host poolcounter2006.codfw.wmnet
- 10:46 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host poolcounter2005.codfw.wmnet
- 10:43 jayme@cumin1003: START - Cookbook sre.hosts.reboot-single for host poolcounter2005.codfw.wmnet
- 10:33 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1006.eqiad.wmnet with reason: host reimage
- 10:27 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1006.eqiad.wmnet with reason: host reimage
- 10:18 taavi@cumin1003: START - Cookbook sre.wikireplicas.add-wiki for database abstractwiki (T420637)
- 10:12 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host kafka-test1006.eqiad.wmnet with OS trixie
- 10:04 dpogorzelski@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'edit-check' for release 'main' .
- 10:03 dpogorzelski@deploy1003: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
- 10:01 dpogorzelski@deploy1003: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
- 10:00 dpogorzelski@deploy1003: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
- 09:58 dpogorzelski@deploy1003: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
- 09:57 elukey@cumin1003: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-test-eqiad
- 09:37 elukey@cumin1003: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-test-eqiad
- 09:06 elukey@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'.
- 09:05 elukey@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'.
- 09:04 elukey@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
- 09:03 elukey@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'.
- 08:32 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/turnilo: apply
- 08:31 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/turnilo: apply
- 08:30 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 08:28 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
- 08:05 elukey@deploy1003: helmfile [eqiad] DONE helmfile.d/services/kartotherian: sync
- 08:04 elukey@deploy1003: helmfile [eqiad] START helmfile.d/services/kartotherian: sync
- 08:02 elukey@deploy1003: helmfile [eqiad] DONE helmfile.d/services/kartotherian: sync
- 07:46 elukey@deploy1003: helmfile [eqiad] START helmfile.d/services/kartotherian: sync
- 03:06 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' .
- 02:32 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' .
- 02:12 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' .
- 02:07 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 06m 07s)
- 02:01 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image
- 01:30 dzahn@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on zuul1002.eqiad.wmnet with reason: T421330
- 01:30 dzahn@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on zuul1001.eqiad.wmnet with reason: T421330
- 01:30 dzahn@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on zuul2001.codfw.wmnet with reason: T421330
- 01:29 dzahn@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on zuul2002.codfw.wmnet with reason: T421330
- 01:12 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' .
2026-03-26
- 21:35 reedy@deploy1003: Finished scap sync-world: Backport for Add Logstash logging for successful passwordless logins, InitialiseSettings: Remove apiportalwiki from $wmgCentralAuthAutoLoginWikis (T421413) (duration: 06m 58s)
- 21:31 reedy@deploy1003: catrope, reedy: Continuing with sync
- 21:30 reedy@deploy1003: catrope, reedy: Backport for Add Logstash logging for successful passwordless logins, InitialiseSettings: Remove apiportalwiki from $wmgCentralAuthAutoLoginWikis (T421413) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 21:28 reedy@deploy1003: Started scap sync-world: Backport for Add Logstash logging for successful passwordless logins, InitialiseSettings: Remove apiportalwiki from $wmgCentralAuthAutoLoginWikis (T421413)
- 21:00 suecarmol@deploy1003: Finished scap sync-world: Backport for PersonalDashboard: Add config for Active Discussions (T420785) (duration: 13m 53s)
- 20:54 suecarmol@deploy1003: suecarmol: Continuing with sync
- 20:51 suecarmol@deploy1003: suecarmol: Backport for PersonalDashboard: Add config for Active Discussions (T420785) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 20:46 suecarmol@deploy1003: Started scap sync-world: Backport for PersonalDashboard: Add config for Active Discussions (T420785)
- 20:44 kamila@deploy1003: Finished scap sync-world: Backport for Wrap 'centralauthtoken' in a JWT (T420280), Enable $wgTempCategoryCollations for testwiki. (T419274 T419049) (duration: 37m 32s)
- 20:30 kamila@deploy1003: matmarex, kamila: Continuing with sync
- 20:25 kamila@deploy1003: matmarex, kamila: Backport for Wrap 'centralauthtoken' in a JWT (T420280), Enable $wgTempCategoryCollations for testwiki. (T419274 T419049) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 20:11 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host restbase2039.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 20:09 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host restbase2039.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 20:08 eevans@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-reboot (exit_code=0) rolling reboot on P{aqs1015.eqiad.wmnet} and P{P:Cassandra}
- 20:08 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host restbase2039
- 20:07 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host restbase2039
- 20:06 kamila@deploy1003: Started scap sync-world: Backport for Wrap 'centralauthtoken' in a JWT (T420280), Enable $wgTempCategoryCollations for testwiki. (T419274 T419049)
- 20:05 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 20:05 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding restbase2039 to codfw - jhancock@cumin2002"
- 20:05 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding restbase2039 to codfw - jhancock@cumin2002"
- 20:02 jhancock@cumin2002: START - Cookbook sre.dns.netbox
- 20:00 eevans@cumin1003: START - Cookbook sre.cassandra.roll-reboot rolling reboot on P{aqs1015.eqiad.wmnet} and P{P:Cassandra}
- 19:47 cdobbins@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P{lvs6003.drmrs.wmnet} and A:liberica
- 19:44 cdobbins@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P{lvs6003.drmrs.wmnet} and A:liberica
- 18:51 cdobbins@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P{lvs4008.ulsfo.wmnet} and A:liberica
- 18:48 cdobbins@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P{lvs4008.ulsfo.wmnet} and A:liberica
- 18:42 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/zotero: apply
- 18:42 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/zotero: apply
- 18:42 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikidata-query-gui: apply
- 18:41 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/wikidata-query-gui: apply
- 18:41 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/toolhub: apply
- 18:40 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/toolhub: apply
- 18:40 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/termbox: apply
- 18:40 cdobbins@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P{lvs4009.ulsfo.wmnet} and A:liberica
- 18:39 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/termbox: apply
- 18:39 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply
- 18:38 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-video: apply
- 18:38 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-timeline: apply
- 18:38 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-timeline: apply
- 18:38 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
- 18:38 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-syntaxhighlight: apply
- 18:38 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply
- 18:37 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply
- 18:37 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: apply
- 18:37 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply
- 18:36 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox: apply
- 18:36 cdobbins@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P{lvs4009.ulsfo.wmnet} and A:liberica
- 18:36 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox: apply
- 18:36 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/sessionstore: apply
- 18:35 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/sessionstore: apply
- 18:35 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/recommendation-api: apply
- 18:35 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/recommendation-api: apply
- 18:34 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rdf-streaming-updater: apply
- 18:34 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/rdf-streaming-updater: apply
- 18:33 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/push-notifications: apply
- 18:33 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/push-notifications: apply
- 18:32 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/proton: apply
- 18:31 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/proton: apply
- 18:31 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/page-analytics: apply
- 18:30 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/page-analytics: apply
- 18:30 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
- 18:30 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
- 18:30 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
- 18:28 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply
- 18:28 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/media-analytics: apply
- 18:28 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/media-analytics: apply
- 18:27 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply
- 18:25 eevans@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-reboot (exit_code=0) rolling reboot on P{sessionstore1006.eqiad.wmnet} and P{P:Cassandra}
- 18:21 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply
- 18:21 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/linkrecommendation: apply
- 18:20 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/linkrecommendation: apply
- 18:20 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/ipoid: apply
- 18:19 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/ipoid: apply
- 18:19 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/image-suggestion: apply
- 18:19 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/image-suggestion: apply
- 18:18 eevans@cumin1003: START - Cookbook sre.cassandra.roll-reboot rolling reboot on P{sessionstore1006.eqiad.wmnet} and P{P:Cassandra}
- 18:18 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/geo-analytics: apply
- 18:17 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/geo-analytics: apply
- 18:17 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventstreams-internal: apply
- 18:16 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/eventstreams-internal: apply
- 18:15 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventstreams: apply
- 18:14 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/eventstreams: apply
- 18:13 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: apply
- 18:13 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-main: apply
- 18:13 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-logging-external: apply
- 18:12 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-logging-external: apply
- 18:12 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: apply
- 18:12 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: apply
- 18:11 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics: apply
- 18:11 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-analytics: apply
- 18:11 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/editor-analytics: apply
- 18:11 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/editor-analytics: apply
- 18:10 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/edit-analytics: apply
- 18:09 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/edit-analytics: apply
- 18:09 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/echostore: apply
- 18:08 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/echostore: apply
- 18:08 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/device-analytics: apply
- 18:08 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/device-analytics: apply
- 18:08 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
- 18:07 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
- 18:07 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/data-gateway: apply
- 18:07 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/data-gateway: apply
- 18:07 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
- 18:06 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/cxserver: apply
- 18:06 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/commons-impact-analytics: apply
- 18:06 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/commons-impact-analytics: apply
- 18:04 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/citoid: apply
- 18:04 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/citoid: apply
- 18:03 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 18:03 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
- 18:03 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/chart-renderer: apply
- 18:02 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/chart-renderer: apply
- 17:59 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/apertium: apply
- 17:58 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/apertium: apply
- 17:55 swfrench@deploy1003: Finished scap sync-world: helmfile-only deployment to enable envoy drain on remaining services - T364245 (duration: 05m 31s)
- 17:52 swfrench@deploy1003: Started scap sync-world: helmfile-only deployment to enable envoy drain on remaining services - T364245
- 17:35 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' .
- 16:39 rzl@deploy1003: Finished scap sync-world: https://gerrit.wikimedia.org/r/1256396 T420666 (duration: 11m 21s)
- 16:35 rzl@deploy1003: rzl: Continuing with sync
- 16:34 rzl@deploy1003: rzl: https://gerrit.wikimedia.org/r/1256396 T420666 synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 16:31 rzl@deploy1003: Started scap sync-world: https://gerrit.wikimedia.org/r/1256396 T420666
- 16:27 elukey@deploy2002: helmfile [eqiad] DONE helmfile.d/services/kartotherian: sync
- 16:17 elukey@deploy2002: helmfile [eqiad] START helmfile.d/services/kartotherian: sync
- 16:17 blake@deploy1003: Finished scap sync-world: Test deployment to validate deployment server switchover - T413974 (duration: 31m 09s)
- 16:16 elukey@deploy2002: helmfile [eqiad] DONE helmfile.d/services/kartotherian: sync
- 16:05 elukey@deploy2002: helmfile [eqiad] START helmfile.d/services/kartotherian: sync
- 15:47 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db1202.eqiad.wmnet onto db1253.eqiad.wmnet
- 15:47 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1253: Pool db1253.eqiad.wmnet in after cloning
- 15:46 blake@deploy1003: Started scap sync-world: Test deployment to validate deployment server switchover - T413974
- 15:44 elukey@deploy2002: helmfile [eqiad] DONE helmfile.d/services/kartotherian: sync
- 15:43 dzahn@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on zuul2002.codfw.wmnet with reason: T421330
- 15:43 dzahn@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on zuul1002.eqiad.wmnet with reason: T421330
- 15:33 elukey@deploy2002: helmfile [eqiad] START helmfile.d/services/kartotherian: sync
- 15:30 elukey@deploy2002: helmfile [eqiad] DONE helmfile.d/services/tegola-vector-tiles: sync
- 15:30 elukey@deploy2002: helmfile [eqiad] START helmfile.d/services/tegola-vector-tiles: sync
- 15:23 blake@dns1004: END - running authdns-update
- 15:22 bjensen: updating dns for the deployment host switchover
- 15:21 blake@dns1004: START - running authdns-update
- 15:19 blake@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on releases2003.codfw.wmnet,releases1003.eqiad.wmnet with reason: Deployment server switchover
- 15:03 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 15:03 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 15:01 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool db1253: Pool db1253.eqiad.wmnet in after cloning
- 14:39 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 14:30 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 14:28 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 14:23 bking@deploy2002: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-test: apply
- 14:22 bking@deploy2002: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-test: apply
- 14:21 bking@deploy2002: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-test: apply
- 14:21 bking@deploy2002: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-test: apply
- 14:21 elukey@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 14:20 jgiannelos@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
- 14:20 jgiannelos@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
- 14:20 jgiannelos@deploy2002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
- 14:19 jgiannelos@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
- 14:18 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
- 14:18 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/mobileapps: apply
- 14:17 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
- 14:17 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/mobileapps: apply
- 14:17 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
- 14:17 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/mobileapps: apply
- 14:01 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1202: Pool db1202.eqiad.wmnet in after cloning
- 13:57 jynus: dropping ms-backup[12]00[12] grants from backup1-* dbs T420464
- 13:56 jiji@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker1070.eqiad.wmnet
- 13:55 jiji@cumin1003: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker1070.eqiad.wmnet
- 13:55 jiji@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker1097.eqiad.wmnet
- 13:55 jiji@cumin1003: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker1097.eqiad.wmnet
- 13:54 jiji@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker1055.eqiad.wmnet
- 13:53 jiji@cumin1003: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker1055.eqiad.wmnet
- 13:46 bking@deploy2002: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-test: apply
- 13:45 bking@deploy2002: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-test: apply
- 13:40 sergi0: UTC afternoon backport window done
- 13:39 sgimeno@deploy2002: Finished scap sync-world: Backport for GrowthExperiments: scale edit and thanks query limit to more wikis (T341599) (duration: 09m 17s)
- 13:35 sgimeno@deploy2002: sgimeno: Continuing with sync
- 13:32 sgimeno@deploy2002: sgimeno: Backport for GrowthExperiments: scale edit and thanks query limit to more wikis (T341599) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 13:30 sgimeno@deploy2002: Started scap sync-world: Backport for GrowthExperiments: scale edit and thanks query limit to more wikis (T341599)
- 13:26 jforrester@deploy2002: Finished deploy [integration/docroot@f021d3f]: Ia936ec (duration: 00m 11s)
- 13:26 jforrester@deploy2002: Started deploy [integration/docroot@f021d3f]: Ia936ec
- 13:24 kamila@deploy2002: Finished scap sync-world: Backport for Temporarily add shellbox-icu to $wgShellboxUrls (T419049 T419242 T419274) (duration: 07m 16s)
- 13:20 kamila@deploy2002: kamila: Continuing with sync
- 13:19 kamila@deploy2002: kamila: Backport for Temporarily add shellbox-icu to $wgShellboxUrls (T419049 T419242 T419274) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 13:17 kamila@deploy2002: Started scap sync-world: Backport for Temporarily add shellbox-icu to $wgShellboxUrls (T419049 T419242 T419274)
- 13:15 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool db1202: Pool db1202.eqiad.wmnet in after cloning
- 13:15 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply
- 13:15 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply
- 13:15 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
- 13:15 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
- 13:14 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-edit-type-enrich-next: apply
- 13:14 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-edit-type-enrich-next: apply
- 13:14 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-content-history-reconcile-enrich-next: apply
- 13:14 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-content-history-reconcile-enrich-next: apply
- 13:14 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-content-history-reconcile-enrich: apply
- 13:14 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-content-history-reconcile-enrich: apply
- 13:13 kamila@deploy2002: Finished scap sync-world: Backport for cswiki: lift IP cap for editathon (T421305) (duration: 07m 22s)
- 13:12 btullis@puppetserver1001: conftool action : set/pooled=true; selector: dnsdisc=k8s-ingress-dse-aa,name=eqiad
- 13:09 kamila@deploy2002: kamila, anzx: Continuing with sync
- 13:08 jynus: deploying new grants for new ms-backup hosts and removing old ones T420464
- 13:08 kamila@deploy2002: kamila, anzx: Backport for cswiki: lift IP cap for editathon (T421305) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 13:06 kamila@deploy2002: Started scap sync-world: Backport for cswiki: lift IP cap for editathon (T421305)
- 13:03 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 12:52 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
- 12:51 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 12:50 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
- 12:43 cdanis: puppet reenabled on drmrs, CIDERGRINDER deployed
- 12:39 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 12:39 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
- 12:23 kevinbazira@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' .
- 12:12 cdanis: 💔cdanis@cumin1003.eqiad.wmnet ~ 🕗☕ sudo cumin 'A:cp-drmrs' 'disable-puppet "cdanis CIDER"'
- 12:03 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dse-k8s-worker1004.eqiad.wmnet
- 12:03 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dse-k8s-worker1006.eqiad.wmnet
- 12:03 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dse-k8s-worker1003.eqiad.wmnet
- 12:02 elukey@cumin1003: END (PASS) - Cookbook sre.postgresql.postgres-init (exit_code=0)
- 12:01 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dse-k8s-worker1002.eqiad.wmnet
- 12:00 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dse-k8s-worker1005.eqiad.wmnet
- 11:51 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host dse-k8s-worker1006.eqiad.wmnet
- 11:51 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host dse-k8s-worker1005.eqiad.wmnet
- 11:51 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host dse-k8s-worker1004.eqiad.wmnet
- 11:51 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host dse-k8s-worker1003.eqiad.wmnet
- 11:51 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host dse-k8s-worker1002.eqiad.wmnet
- 11:47 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dse-k8s-worker1001.eqiad.wmnet
- 11:44 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 11:43 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 11:43 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-logging1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 11:42 jclark@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 11:42 jclark@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 11:42 jclark@cumin1003: START - Cookbook sre.hosts.provision for host kafka-logging1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 11:41 jclark@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 11:41 jclark@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update network and mgmt - jclark@cumin1003"
- 11:41 jclark@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update network and mgmt - jclark@cumin1003"
- 11:41 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host dse-k8s-worker1001.eqiad.wmnet
- 11:38 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dse-k8s-ctrl1002.eqiad.wmnet
- 11:37 jclark@cumin1003: START - Cookbook sre.dns.netbox
- 11:33 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host dse-k8s-ctrl1002.eqiad.wmnet
- 11:32 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1202: Depool db1202.eqiad.wmnet to then clone it to db1253.eqiad.wmnet - fceratto@cumin1003
- 11:31 cmooney@cumin1003: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox
- 11:31 fceratto@cumin1003: START - Cookbook sre.mysql.depool depool db1202: Depool db1202.eqiad.wmnet to then clone it to db1253.eqiad.wmnet - fceratto@cumin1003
- 11:31 fceratto@cumin1003: START - Cookbook sre.mysql.clone of db1202.eqiad.wmnet onto db1253.eqiad.wmnet
- 11:31 cmooney@cumin1003: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
- 11:24 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dse-k8s-ctrl1001.eqiad.wmnet
- 11:22 elukey@cumin1003: START - Cookbook sre.postgresql.postgres-init
- 11:22 elukey@cumin1003: END (FAIL) - Cookbook sre.postgresql.postgres-init (exit_code=99)
- 11:22 elukey@cumin1003: START - Cookbook sre.postgresql.postgres-init
- 11:19 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host dse-k8s-ctrl1001.eqiad.wmnet
- 11:15 elukey@deploy2002: helmfile [eqiad] DONE helmfile.d/services/tegola-vector-tiles: sync
- 11:14 elukey@deploy2002: helmfile [eqiad] START helmfile.d/services/tegola-vector-tiles: sync
- 11:14 elukey@deploy2002: helmfile [eqiad] DONE helmfile.d/services/kartotherian: sync
- 11:13 elukey@deploy2002: helmfile [eqiad] START helmfile.d/services/kartotherian: sync
- 11:07 jiji@cumin1003: END (PASS) - Cookbook sre.memcached.roll-reboot-restart (exit_code=0) rolling reboot on A:memcached-codfw
- 11:04 dreamyjazz@deploy2002: Finished scap sync-world: Backport for SI: Enable on bnwiki, itwiki, simplewiki, and plwiki (T415529) (duration: 09m 23s)
- 10:59 dreamyjazz@deploy2002: dreamyjazz: Continuing with sync
- 10:56 dreamyjazz@deploy2002: dreamyjazz: Backport for SI: Enable on bnwiki, itwiki, simplewiki, and plwiki (T415529) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 10:54 dreamyjazz@deploy2002: Started scap sync-world: Backport for SI: Enable on bnwiki, itwiki, simplewiki, and plwiki (T415529)
- 10:33 oblivian@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifeeds: sync
- 10:32 oblivian@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifeeds: sync
- 10:32 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-edit-type-enrich-next: apply
- 10:32 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-edit-type-enrich-next: apply
- 10:31 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-content-history-reconcile-enrich: apply
- 10:31 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-content-history-reconcile-enrich: apply
- 10:31 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-content-history-reconcile-enrich-next: apply
- 10:31 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-content-history-reconcile-enrich-next: apply
- 10:29 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
- 10:29 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
- 10:29 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply
- 10:29 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply
- 10:23 tappof@cumin1003: END (PASS) - Cookbook sre.o11y.thanos-compact-restart (exit_code=0)
- 10:23 tappof@cumin1003: START - Cookbook sre.o11y.thanos-compact-restart
- 10:22 tappof@cumin1003: END (PASS) - Cookbook sre.o11y.thanos-compact-restart (exit_code=0)
- 10:22 tappof@cumin1003: START - Cookbook sre.o11y.thanos-compact-restart
- 10:12 fceratto@cumin1003: END (PASS) - Cookbook sre.switchdc.databases.finalize (exit_code=0) for the switch from codfw to eqiad for section s1
- 10:11 fceratto@cumin1003: START - Cookbook sre.switchdc.databases.finalize for the switch from codfw to eqiad for section s1
- 10:05 fceratto@cumin1003: END (PASS) - Cookbook sre.switchdc.databases.finalize (exit_code=0) for the switch from codfw to eqiad for section s4
- 10:05 fceratto@cumin1003: START - Cookbook sre.switchdc.databases.finalize for the switch from codfw to eqiad for section s4
- 09:58 fceratto@cumin1003: END (PASS) - Cookbook sre.switchdc.databases.finalize (exit_code=0) for the switch from codfw to eqiad for section s8
- 09:58 fceratto@cumin1003: START - Cookbook sre.switchdc.databases.finalize for the switch from codfw to eqiad for section s8
- 09:53 jiji@cumin1003: START - Cookbook sre.memcached.roll-reboot-restart rolling reboot on A:memcached-codfw
- 09:53 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
- 09:52 hashar: Starting Gerrit on the replica / gerrit1003
- 09:51 hashar: Stopping Gerrit on the replica / gerrit1003 to clear web sessions
- 09:51 fceratto@cumin1003: END (PASS) - Cookbook sre.switchdc.databases.finalize (exit_code=0) for the switch from codfw to eqiad for section s7
- 09:50 fceratto@cumin1003: START - Cookbook sre.switchdc.databases.finalize for the switch from codfw to eqiad for section s7
- 09:50 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
- 09:49 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
- 09:47 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
- 09:46 aokoth@cumin1003: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1004.wikimedia.org with reason: Security Release - T421278
- 09:45 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
- 09:44 fceratto@cumin1003: END (PASS) - Cookbook sre.switchdc.databases.finalize (exit_code=0) for the switch from codfw to eqiad for section s3
- 09:43 fceratto@cumin1003: START - Cookbook sre.switchdc.databases.finalize for the switch from codfw to eqiad for section s3
- 09:42 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
- 09:36 fceratto@cumin1003: END (PASS) - Cookbook sre.switchdc.databases.finalize (exit_code=0) for the switch from codfw to eqiad for section s2
- 09:36 fceratto@cumin1003: START - Cookbook sre.switchdc.databases.finalize for the switch from codfw to eqiad for section s2
- 09:31 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
- 09:29 fceratto@cumin1003: END (PASS) - Cookbook sre.switchdc.databases.finalize (exit_code=0) for the switch from codfw to eqiad for section s5
- 09:29 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
- 09:29 fceratto@cumin1003: START - Cookbook sre.switchdc.databases.finalize for the switch from codfw to eqiad for section s5
- 09:26 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revision-models' for release 'main' .
- 09:22 fceratto@cumin1003: END (PASS) - Cookbook sre.switchdc.databases.finalize (exit_code=0) for the switch from codfw to eqiad for section s6
- 09:22 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revise-tone-task-generator' for release 'main' .
- 09:22 fceratto@cumin1003: START - Cookbook sre.switchdc.databases.finalize for the switch from codfw to eqiad for section s6
- 09:18 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
- 09:16 fceratto@cumin1003: END (PASS) - Cookbook sre.switchdc.databases.finalize (exit_code=0) for the switch from codfw to eqiad for section es6
- 09:15 fceratto@cumin1003: START - Cookbook sre.switchdc.databases.finalize for the switch from codfw to eqiad for section es6
- 09:13 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'readability' for release 'main' .
- 09:11 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'logo-detection' for release 'main' .
- 09:10 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'llm' for release 'main' .
- 09:08 fceratto@cumin1003: END (PASS) - Cookbook sre.switchdc.databases.finalize (exit_code=0) for the switch from codfw to eqiad for section x3
- 09:07 fceratto@cumin1003: START - Cookbook sre.switchdc.databases.finalize for the switch from codfw to eqiad for section x3
- 09:05 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
- 09:02 fceratto@cumin1003: END (PASS) - Cookbook sre.switchdc.databases.finalize (exit_code=0) for the switch from codfw to eqiad for section x1
- 09:01 fceratto@cumin1003: START - Cookbook sre.switchdc.databases.finalize for the switch from codfw to eqiad for section x1
- 09:01 federico3: starting T416708 - disabling circular replication on core dbs
- 08:56 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'article-models' for release 'main' .
- 08:51 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'article-descriptions' for release 'main' .
- 08:50 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'edit-check' for release 'main' .
- 08:43 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
- 08:41 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
- 08:32 aokoth@cumin1003: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: Security Release - T421278
- 08:27 aokoth@cumin1003: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1003.wikimedia.org with reason: Security Release - T421278
- 08:18 aokoth@cumin1003: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: Security Release - T421278
- 08:11 hashar@deploy2002: rebuilt and synchronized wikiversions files: group2 to 1.46.0-wmf.21 refs T420479
- 05:24 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on clouddb1022.eqiad.wmnet with reason: Downgrade clouddb1022 to 10.11.13
2026-03-25
- 23:59 dzahn@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on zuul2001.codfw.wmnet with reason: T421330
- 23:58 dzahn@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on zuul1001.eqiad.wmnet with reason: T421330
- 23:29 mutante: zuul1001 - installed mariadb-client - connected once to zuul db on m1-master; mysql> truncate "alembic_version"; - systemctl restart zuul-web - This fixed the zuul-web service; finally no error in systemctl status. (T405119)
- 21:38 ryankemper: [opensearch-k8s] T414484 Depooled eqiad; change verified working (now when I do `host k8s-ingress-dse-aa.discovery.wmnet` from `cumin1003`, and then reverse-lookup the resulting IP, I get a codfw address); so traffic is now routing to dse-k8s-codfw
- 21:35 ryankemper@cumin2002: conftool action : set/pooled=false; selector: dnsdisc=k8s-ingress-dse-aa,name=eqiad
- 21:30 Dreamy_Jazz: Created cusi_case, cusi_user, and cusi_signal on bnwiki, itwiki, simplewiki, plwiki for T415529
- 21:27 ryankemper: [opensearch-k8s] T414484 Getting ready to depool `dnsdisc=k8s-ingress-dse-aa,name=eqiad`, leaving codfw pooled. This will get us ready for a full rolling-upgrade of the dse-k8s-eqiad cluster tomorrow.
- 21:23 Dreamy_Jazz: Evening UTC backport window done
- 21:08 kharlan@deploy2002: Finished scap sync-world: Backport for SuggestedInvestigations: Import session into signal matching job (T421062) (duration: 10m 26s)
- 21:04 kharlan@deploy2002: kharlan: Continuing with sync
- 21:01 kharlan@deploy2002: kharlan: Backport for SuggestedInvestigations: Import session into signal matching job (T421062) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 20:58 kharlan@deploy2002: Started scap sync-world: Backport for SuggestedInvestigations: Import session into signal matching job (T421062)
- 20:51 eevans@cumin1003: END (ERROR) - Cookbook sre.cassandra.roll-reboot (exit_code=97) rolling reboot on P{sessionstore[1004-1006].eqiad.wmnet} and P{P:Cassandra}
- 20:43 aaron@deploy2002: Finished scap sync-world: Backport for Add Analytics APIs to the RestSandbox (T419429) (duration: 08m 33s)
- 20:38 aaron@deploy2002: aaron: Continuing with sync
- 20:36 aaron@deploy2002: aaron: Backport for Add Analytics APIs to the RestSandbox (T419429) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 20:34 aaron@deploy2002: Started scap sync-world: Backport for Add Analytics APIs to the RestSandbox (T419429)
- 20:30 jdlrobson@deploy2002: Finished scap sync-world: Backport for Deploy temporary accounts to ruwiki (T413771) (duration: 11m 04s)
- 20:25 jdlrobson@deploy2002: stran, jdlrobson: Continuing with sync
- 20:21 jdlrobson@deploy2002: stran, jdlrobson: Backport for Deploy temporary accounts to ruwiki (T413771) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 20:19 jdlrobson@deploy2002: Started scap sync-world: Backport for Deploy temporary accounts to ruwiki (T413771)
- 20:17 jdlrobson@deploy2002: Finished scap sync-world: Backport for Close the legacy-vector dblist (T421289) (duration: 07m 42s)
- 20:14 aokoth@cumin1003: END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99) on GitLab host gitlab1003.wikimedia.org with reason: Security Release - T421278
- 20:14 aokoth@cumin1003: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: Security Release - T421278
- 20:13 aokoth@cumin1003: END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99) on GitLab host gitlab1003.wikimedia.org with reason: Security Release - T421278
- 20:13 aokoth@cumin1003: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: Security Release - T421278
- 20:12 jdlrobson@deploy2002: jdlrobson: Continuing with sync
- 20:12 aokoth@cumin1003: END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99) on GitLab host gitlab1003.wikimedia.org with reason: Security Release - T421278
- 20:12 jdlrobson@deploy2002: jdlrobson: Backport for Close the legacy-vector dblist (T421289) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 20:09 jdlrobson@deploy2002: Started scap sync-world: Backport for Close the legacy-vector dblist (T421289)
- 20:05 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-restart-reboot-hcaptcha-proxy (exit_code=0) rolling reboot on P{hcaptcha-proxy7002.wikimedia.org} and A:hcaptcha-proxy
- 20:01 brett@cumin2002: START - Cookbook sre.cdn.roll-restart-reboot-hcaptcha-proxy rolling reboot on P{hcaptcha-proxy7002.wikimedia.org} and A:hcaptcha-proxy
- 20:00 eevans@cumin1003: START - Cookbook sre.cassandra.roll-reboot rolling reboot on P{sessionstore[1004-1006].eqiad.wmnet} and P{P:Cassandra}
- 19:34 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host pybal-test2003.codfw.wmnet
- 19:30 brett@cumin2002: START - Cookbook sre.hosts.reboot-single for host pybal-test2003.codfw.wmnet
- 19:26 aokoth@cumin1003: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab2002.wikimedia.org with reason: Security Release - T421278
- 19:24 eevans@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-reboot (exit_code=0) rolling reboot on P{sessionstore[2004-2006].codfw.wmnet} and P{P:Cassandra}
- 19:20 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs2011.codfw.wmnet
- 19:17 brett@cumin2002: START - Cookbook sre.hosts.reboot-single for host lvs2011.codfw.wmnet
- 19:17 aokoth@cumin1003: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: Security Release - T421278
- 19:14 brett@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs2011.codfw.wmnet with reason: Planned reboot
- 19:11 aokoth@cumin1003: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: Security Release - T421278
- 19:11 cdobbins@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P{lvs4010.ulsfo.wmnet} and A:liberica
- 19:07 cdobbins@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P{lvs4010.ulsfo.wmnet} and A:liberica
- 19:00 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs2012.codfw.wmnet
- 18:57 brett@cumin2002: START - Cookbook sre.hosts.reboot-single for host lvs2012.codfw.wmnet
- 18:53 eevans@cumin1003: START - Cookbook sre.cassandra.roll-reboot rolling reboot on P{sessionstore[2004-2006].codfw.wmnet} and P{P:Cassandra}
- 18:51 rzl@deploy2002: helmfile [codfw] DONE helmfile.d/services/zotero: apply
- 18:51 rzl@deploy2002: helmfile [codfw] START helmfile.d/services/zotero: apply
- 18:50 rzl@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifeeds: apply
- 18:50 rzl@deploy2002: helmfile [codfw] START helmfile.d/services/wikifeeds: apply
- 18:49 rzl@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikidata-query-gui: apply
- 18:49 rzl@deploy2002: helmfile [codfw] START helmfile.d/services/wikidata-query-gui: apply
- 18:49 rzl@deploy2002: helmfile [codfw] DONE helmfile.d/services/toolhub: apply
- 18:48 rzl@deploy2002: helmfile [codfw] START helmfile.d/services/toolhub: apply
- 18:48 rzl@deploy2002: helmfile [codfw] DONE helmfile.d/services/termbox: apply
- 18:47 rzl@deploy2002: helmfile [codfw] START helmfile.d/services/termbox: apply
- 18:47 rzl@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-video: apply
- 18:47 rzl@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-video: apply
- 18:47 rzl@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-timeline: apply
- 18:46 rzl@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-timeline: apply
- 18:46 rzl@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
- 18:46 rzl@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply
- 18:46 rzl@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply
- 18:46 brett@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs2012.codfw.wmnet with reason: Planned reboot
- 18:46 rzl@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-media: apply
- 18:46 rzl@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: apply
- 18:45 rzl@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-constraints: apply
- 18:45 rzl@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox: apply
- 18:44 rzl@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox: apply
- 18:44 rzl@deploy2002: helmfile [codfw] DONE helmfile.d/services/sessionstore: apply
- 18:44 rzl@deploy2002: helmfile [codfw] START helmfile.d/services/sessionstore: apply
- 18:44 rzl@deploy2002: helmfile [codfw] DONE helmfile.d/services/recommendation-api: apply
- 18:44 rzl@deploy2002: helmfile [codfw] START helmfile.d/services/recommendation-api: apply
- 18:43 rzl@deploy2002: helmfile [codfw] DONE helmfile.d/services/rdf-streaming-updater: apply
- 18:43 rzl@deploy2002: helmfile [codfw] START helmfile.d/services/rdf-streaming-updater: apply
- 18:43 rzl@deploy2002: helmfile [codfw] DONE helmfile.d/services/push-notifications: apply
- 18:42 rzl@deploy2002: helmfile [codfw] START helmfile.d/services/push-notifications: apply
- 18:42 rzl@deploy2002: helmfile [codfw] DONE helmfile.d/services/proton: apply
- 18:41 rzl@deploy2002: helmfile [codfw] START helmfile.d/services/proton: apply
- 18:41 brett@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P{lvs7001.magru.wmnet} and A:liberica
- 18:40 rzl@deploy2002: helmfile [codfw] DONE helmfile.d/services/page-analytics: apply
- 18:40 rzl@deploy2002: helmfile [codfw] START helmfile.d/services/page-analytics: apply
- 18:39 rzl@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
- 18:39 rzl@deploy2002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
- 18:37 rzl@deploy2002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
- 18:37 rzl@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
- 18:37 brett@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P{lvs7001.magru.wmnet} and A:liberica
- 18:37 rzl@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
- 18:35 rzl@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply
- 18:34 rzl@deploy2002: helmfile [codfw] DONE helmfile.d/services/media-analytics: apply
- 18:34 rzl@deploy2002: helmfile [codfw] START helmfile.d/services/media-analytics: apply
- 18:33 rzl@deploy2002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply
- 18:29 brett@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P{lvs7002.magru.wmnet} and A:liberica
- 18:28 rzl@deploy2002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply
- 18:28 rzl@deploy2002: helmfile [codfw] DONE helmfile.d/services/linkrecommendation: apply
- 18:26 rzl@deploy2002: helmfile [codfw] START helmfile.d/services/linkrecommendation: apply
- 18:26 rzl@deploy2002: helmfile [codfw] DONE helmfile.d/services/ipoid: apply
- 18:26 rzl@deploy2002: helmfile [codfw] START helmfile.d/services/ipoid: apply
- 18:25 dzahn@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on releases2003.codfw.wmnet with reason: debug java install
- 18:25 brett@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P{lvs7002.magru.wmnet} and A:liberica
- 18:25 dzahn@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on releases1003.eqiad.wmnet with reason: debug java install
- 18:25 rzl@deploy2002: helmfile [codfw] DONE helmfile.d/services/image-suggestion: apply
- 18:24 rzl@deploy2002: helmfile [codfw] START helmfile.d/services/image-suggestion: apply
- 18:23 rzl@deploy2002: helmfile [codfw] DONE helmfile.d/services/geo-analytics: apply
- 18:23 rzl@deploy2002: helmfile [codfw] START helmfile.d/services/geo-analytics: apply
- 18:23 rzl@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventstreams-internal: apply
- 18:22 rzl@deploy2002: helmfile [codfw] START helmfile.d/services/eventstreams-internal: apply
- 18:22 rzl@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventstreams: apply
- 18:22 rzl@deploy2002: helmfile [codfw] START helmfile.d/services/eventstreams: apply
- 18:21 rzl@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventgate-main: apply
- 18:21 rzl@deploy2002: helmfile [codfw] START helmfile.d/services/eventgate-main: apply
- 18:20 rzl@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventgate-logging-external: apply
- 18:20 mutante: releases1003 - apt-get upgrade - envoyproxy, python3-wmflib
- 18:20 rzl@deploy2002: helmfile [codfw] START helmfile.d/services/eventgate-logging-external: apply
- 18:20 rzl@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: apply
- 18:19 rzl@deploy2002: helmfile [codfw] START helmfile.d/services/eventgate-analytics-external: apply
- 18:19 rzl@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics: apply
- 18:18 rzl@deploy2002: helmfile [codfw] START helmfile.d/services/eventgate-analytics: apply
- 18:18 rzl@deploy2002: helmfile [codfw] DONE helmfile.d/services/editor-analytics: apply
- 18:18 rzl@deploy2002: helmfile [codfw] START helmfile.d/services/editor-analytics: apply
- 18:17 rzl@deploy2002: helmfile [codfw] DONE helmfile.d/services/edit-analytics: apply
- 18:17 rzl@deploy2002: helmfile [codfw] START helmfile.d/services/edit-analytics: apply
- 18:16 rzl@deploy2002: helmfile [codfw] DONE helmfile.d/services/echostore: apply
- 18:16 rzl@deploy2002: helmfile [codfw] START helmfile.d/services/echostore: apply
- 18:15 rzl@deploy2002: helmfile [codfw] DONE helmfile.d/services/device-analytics: apply
- 18:15 rzl@deploy2002: helmfile [codfw] START helmfile.d/services/device-analytics: apply
- 18:15 rzl@deploy2002: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
- 18:14 rzl@deploy2002: helmfile [codfw] START helmfile.d/services/developer-portal: apply
- 18:14 rzl@deploy2002: helmfile [codfw] DONE helmfile.d/services/data-gateway: apply
- 18:14 rzl@deploy2002: helmfile [codfw] START helmfile.d/services/data-gateway: apply
- 18:14 rzl@deploy2002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
- 18:13 rzl@deploy2002: helmfile [codfw] START helmfile.d/services/cxserver: apply
- 18:13 rzl@deploy2002: helmfile [codfw] DONE helmfile.d/services/commons-impact-analytics: apply
- 18:13 rzl@deploy2002: helmfile [codfw] START helmfile.d/services/commons-impact-analytics: apply
- 18:12 rzl@deploy2002: helmfile [codfw] DONE helmfile.d/services/citoid: apply
- 18:12 rzl@deploy2002: helmfile [codfw] START helmfile.d/services/citoid: apply
- 18:11 rzl@deploy2002: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 18:11 rzl@deploy2002: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
- 18:11 rzl@deploy2002: helmfile [codfw] DONE helmfile.d/services/chart-renderer: apply
- 18:11 rzl@deploy2002: helmfile [codfw] START helmfile.d/services/chart-renderer: apply
- 18:09 rzl@deploy2002: helmfile [codfw] DONE helmfile.d/services/apertium: apply
- 18:09 rzl@deploy2002: helmfile [codfw] START helmfile.d/services/apertium: apply
- 17:29 sfaci@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/test-kitchen: apply
- 17:29 sfaci@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/test-kitchen: apply
- 17:23 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-fr-tech: apply
- 17:23 dzahn@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on releases1003.eqiad.wmnet with reason: debug java install
- 17:22 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-fr-tech: apply
- 16:44 ebysans@deploy2002: Finished deploy [analytics/refinery@80c527b] (thin): Regular analytics weekly train THIN [analytics/refinery@80c527b6] (duration: 01m 59s)
- 16:42 ebysans@deploy2002: Started deploy [analytics/refinery@80c527b] (thin): Regular analytics weekly train THIN [analytics/refinery@80c527b6]
- 16:42 SandraEbele_: Deploying Refinery as part of weekly deployment train
- 16:41 ebysans@deploy2002: Finished deploy [analytics/refinery@80c527b]: Regular analytics weekly train [analytics/refinery@80c527b6] (duration: 04m 32s)
- 16:37 ebysans@deploy2002: Started deploy [analytics/refinery@80c527b]: Regular analytics weekly train [analytics/refinery@80c527b6]
- 16:22 ebysans@deploy2002: Finished deploy [analytics/refinery@80c527b] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@80c527b6] (duration: 01m 58s)
- 16:22 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
- 16:21 kevinbazira@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' .
- 16:21 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
- 16:20 ebysans@deploy2002: Started deploy [analytics/refinery@80c527b] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@80c527b6]
- 16:20 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
- 16:19 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
- 16:18 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
- 16:18 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
- 16:06 jforrester@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
- 16:05 jforrester@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
- 16:05 jforrester@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
- 16:04 jforrester@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
- 16:03 kevinbazira@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' .
- 16:02 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
- 16:02 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
- 16:01 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
- 15:51 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
- 15:51 brouberol@deploy2002: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'.
- 15:50 brouberol@deploy2002: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'.
- 15:50 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
- 15:49 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
- 15:49 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
- 15:48 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
- 15:42 blake@deploy2002: Finished scap sync-world: Backport for debug: reorder debug backends for eqiad switchover (T413974) (duration: 07m 41s)
- 15:41 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
- 15:37 blake@deploy2002: blake: Continuing with sync
- 15:37 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
- 15:37 blake@deploy2002: blake: Backport for debug: reorder debug backends for eqiad switchover (T413974) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 15:34 blake@deploy2002: Started scap sync-world: Backport for debug: reorder debug backends for eqiad switchover (T413974)
- 15:34 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
- 15:33 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
- 15:32 blake@cumin1003: END (PASS) - Cookbook sre.switchdc.mediawiki.09-unlock-scap (exit_code=0) for datacenter switchover from codfw to eqiad
- 15:32 root@deploy2002: Unlocked for deployment [ALL REPOSITORIES]: Datacenter switchover from codfw to eqiad - (duration: 91m 45s)
- 15:32 root@deploy2002: Forcefully removing global lock: Datacenter switchover from codfw to eqiad -
- 15:32 blake@cumin1003: START - Cookbook sre.switchdc.mediawiki.09-unlock-scap for datacenter switchover from codfw to eqiad
- 15:31 blake@cumin1003: END (PASS) - Cookbook sre.switchdc.mediawiki.09-run-puppet-on-db-masters (exit_code=0) for datacenter switchover from codfw to eqiad
- 15:26 blake@dns1004: END - running authdns-update
- 15:24 blake@dns1004: START - running authdns-update
- 15:24 elukey@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'.
- 15:23 elukey@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'.
- 15:18 blake@cumin1003: START - Cookbook sre.switchdc.mediawiki.09-run-puppet-on-db-masters for datacenter switchover from codfw to eqiad
- 15:18 blake@dns1004: END - running authdns-update
- 15:16 blake@dns1004: START - running authdns-update
- 15:14 blake@cumin1003: END (PASS) - Cookbook sre.switchdc.mediawiki.09-restore-ttl (exit_code=0) for datacenter switchover from codfw to eqiad
- 15:13 blake@cumin1003: START - Cookbook sre.switchdc.mediawiki.09-restore-ttl for datacenter switchover from codfw to eqiad
- 15:11 blake@cumin1003: END (PASS) - Cookbook sre.switchdc.mediawiki.08-start-maintenance (exit_code=0) for datacenter switchover from codfw to eqiad
- 15:10 root@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
- 15:09 root@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
- 15:08 root@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-cron: apply
- 15:07 root@deploy2002: helmfile [codfw] START helmfile.d/services/mw-cron: apply
- 15:07 blake@cumin1003: START - Cookbook sre.switchdc.mediawiki.08-start-maintenance for datacenter switchover from codfw to eqiad
- 15:07 blake@cumin1003: END (PASS) - Cookbook sre.switchdc.mediawiki.08-restart-mw-jobrunner (exit_code=0) for datacenter switchover from codfw to eqiad
- 15:07 root@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-jobrunner: sync
- 15:07 root@deploy2002: helmfile [codfw] START helmfile.d/services/mw-jobrunner: sync
- 15:07 blake@cumin1003: START - Cookbook sre.switchdc.mediawiki.08-restart-mw-jobrunner for datacenter switchover from codfw to eqiad
- 15:02 blake@cumin1003: END (PASS) - Cookbook sre.switchdc.mediawiki.07-set-readwrite (exit_code=0) for datacenter switchover from codfw to eqiad
- 15:02 blake@cumin1003: MediaWiki read-only period ends at: 2026-03-25 15:02:52.921926
- 14:55 blake@cumin1003: END (PASS) - Cookbook sre.switchdc.mediawiki.01-stop-maintenance (exit_code=0) for datacenter switchover from codfw to eqiad
- 14:53 blake@cumin1003: START - Cookbook sre.switchdc.mediawiki.01-stop-maintenance for datacenter switchover from codfw to eqiad
- 14:52 blake@cumin1003: END (PASS) - Cookbook sre.switchdc.mediawiki.00-downtime-db-readonly-checks (exit_code=0) for datacenter switchover from codfw to eqiad
- 14:52 blake@cumin1003: START - Cookbook sre.switchdc.mediawiki.00-downtime-db-readonly-checks for datacenter switchover from codfw to eqiad
- 14:51 blake@cumin1003: END (PASS) - Cookbook sre.switchdc.mediawiki.00-reduce-ttl (exit_code=0) for datacenter switchover from codfw to eqiad
- 14:46 blake@cumin1003: START - Cookbook sre.switchdc.mediawiki.00-reduce-ttl for datacenter switchover from codfw to eqiad
- 14:28 cdanis: 💙cdanis@apt1002.wikimedia.org ~ 🕥☕ sudo -i reprepro --component main --restrict cidergrinder update trixie-wikimedia
- 14:28 cdanis: 💙cdanis@apt1002.wikimedia.org ~ 🕥☕ sudo -i reprepro --component main --restrict cidergrinder update bullseye-wikimedia
- 14:24 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['phab2002']
- 14:24 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 14:23 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
- 14:17 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['phab2002']
- 14:14 gengh@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
- 14:13 gengh@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
- 14:13 gengh@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
- 14:12 gengh@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
- 14:11 gengh@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
- 14:11 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
- 14:11 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
- 14:11 gengh@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
- 14:10 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
- 14:10 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
- 14:08 gengh@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
- 14:07 gengh@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
- 14:07 gengh@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
- 14:07 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
- 14:06 gengh@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
- 14:06 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
- 14:06 gengh@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
- 14:05 gengh@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
- 14:00 blake@cumin1003: END (PASS) - Cookbook sre.switchdc.mediawiki.00-lock-scap (exit_code=0) for datacenter switchover from codfw to eqiad
- 14:00 root@deploy2002: Locking from deployment [ALL REPOSITORIES]: Datacenter switchover from codfw to eqiad -
- 14:00 blake@cumin1003: START - Cookbook sre.switchdc.mediawiki.00-lock-scap for datacenter switchover from codfw to eqiad
- 13:49 otto@deploy2002: Finished scap sync-world: Backport for EventStreamConfig - Increase spark_job_ingestion_scale for larger event streams (T360794 T351225) (duration: 07m 48s)
- 13:45 otto@deploy2002: otto: Continuing with sync
- 13:45 kevinbazira@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' .
- 13:44 otto@deploy2002: otto: Backport for EventStreamConfig - Increase spark_job_ingestion_scale for larger event streams (T360794 T351225) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 13:42 otto@deploy2002: Started scap sync-world: Backport for EventStreamConfig - Increase spark_job_ingestion_scale for larger event streams (T360794 T351225)
- 13:32 awight@deploy2002: Finished scap sync-world: Backport for [beta] Kill synthetic refs with feature flag (T421055), idwiki: Remove unused user groups on Indonesian Wikipedia (T419105), ptwiki: Enable block action for the abuse filter (T419312), ptwiki: Add suppressredirect to autoreviewer and rollbacker user groups (T420704) (duration: 11m 33s)
- 13:29 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
- 13:29 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
- 13:27 awight@deploy2002: codenamenoreste, awight, gerrit-patch-uploader: Continuing with sync
- 13:23 awight@deploy2002: codenamenoreste, awight, gerrit-patch-uploader: Backport for [beta] Kill synthetic refs with feature flag (T421055), idwiki: Remove unused user groups on Indonesian Wikipedia (T419105), ptwiki: Enable block action for the abuse filter (T419312), ptwiki: Add suppressredirect to autoreviewer and rollbacker user groups (T420704)
- 13:20 awight@deploy2002: Started scap sync-world: Backport for [beta] Kill synthetic refs with feature flag (T421055), idwiki: Remove unused user groups on Indonesian Wikipedia (T419105), ptwiki: Enable block action for the abuse filter (T419312), ptwiki: Add suppressredirect to autoreviewer and rollbacker user groups (T420704)
- 13:17 dcausse@deploy2002: Finished scap sync-world: Backport for Revert^2 "search: use the discovery ns record for the semanticsearch cluster" (duration: 10m 20s)
- 13:12 dcausse@deploy2002: dcausse: Continuing with sync
- 13:09 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
- 13:09 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
- 13:09 dcausse@deploy2002: dcausse: Backport for Revert^2 "search: use the discovery ns record for the semanticsearch cluster" synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 13:07 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
- 13:06 dcausse@deploy2002: Started scap sync-world: Backport for Revert^2 "search: use the discovery ns record for the semanticsearch cluster"
- 13:06 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
- 13:02 XioNoX: Inter.Link - DDoS - Activation of automatic reroute
- 12:56 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
- 12:55 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
- 12:51 marostegui@cumin1003: conftool action : set/pooled=yes; selector: name=clouddb1022.eqiad.wmnet,service=s3
- 12:43 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on clouddb1022.eqiad.wmnet with reason: Downgrade clouddb1022 to 10.11.15
- 12:41 marostegui@cumin1003: conftool action : set/pooled=no; selector: name=clouddb1022.eqiad.wmnet,service=s3
- 12:40 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-mariadb1002.eqiad.wmnet
- 12:40 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-test-coord1002.eqiad.wmnet
- 12:38 mszwarc@deploy2002: mwscript-k8s job started: foreachwikiindblist all demoteIneligibleUsers.php --relay-log checkuser=metawiki --relay-log suppress=metawiki # T418580
- 12:34 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host an-test-coord1002.eqiad.wmnet
- 12:34 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host an-mariadb1002.eqiad.wmnet
- 12:33 kevinbazira@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' .
- 12:32 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wdqs1028.eqiad.wmnet
- 12:25 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host wdqs1028.eqiad.wmnet
- 12:24 kevinbazira@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' .
- 12:19 mszwarc@deploy2002: Finished scap sync-world: Backport for Allow for demoting 2FA-less members of further 6 groups (T418580) (duration: 10m 23s)
- 12:13 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wdqs2009.codfw.wmnet
- 12:12 mszwarc@deploy2002: mszwarc: Continuing with sync
- 12:11 mszwarc@deploy2002: mszwarc: Backport for Allow for demoting 2FA-less members of further 6 groups (T418580) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 12:09 mszwarc@deploy2002: Started scap sync-world: Backport for Allow for demoting 2FA-less members of further 6 groups (T418580)
- 12:07 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host wdqs2009.codfw.wmnet
- 12:07 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dse-k8s-ctrl2002.codfw.wmnet
- 12:02 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host dse-k8s-ctrl2002.codfw.wmnet
- 11:56 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dse-k8s-ctrl2001.codfw.wmnet
- 11:53 marostegui: Restart clouddb1022:s3 to enable error_log T420177
- 11:51 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host dse-k8s-ctrl2001.codfw.wmnet
- 11:51 jayme: migrated wikikube apiservers (eqiad and codfw) to IPIP - T420436
- 11:49 jayme@cumin1003: END (PASS) - Cookbook sre.loadbalancer.migrate-service-ipip (exit_code=0) for alias: wikikube-master-codfw@codfw
- 11:49 jayme@cumin1003: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on (A:lvs-low-traffic-codfw or A:lvs-secondary-codfw) and A:bullseye and A:lvs
- 11:48 jayme@cumin1003: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on (A:lvs-low-traffic-codfw or A:lvs-secondary-codfw) and A:bullseye and A:lvs
- 11:46 jayme@cumin1003: END (PASS) - Cookbook sre.loadbalancer.migrate-service-ipip (exit_code=0) for alias: wikikube-master-eqiad@eqiad
- 11:46 jayme@cumin1003: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on (A:lvs-low-traffic-eqiad or A:lvs-secondary-eqiad) and A:bullseye and A:lvs
- 11:45 jayme@cumin1003: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on (A:lvs-low-traffic-eqiad or A:lvs-secondary-eqiad) and A:bullseye and A:lvs
- 11:45 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dse-k8s-ctrl1002.eqiad.wmnet
- 11:43 jayme@cumin1003: START - Cookbook sre.loadbalancer.migrate-service-ipip for alias: wikikube-master-codfw@codfw
- 11:41 jayme@cumin1003: START - Cookbook sre.loadbalancer.migrate-service-ipip for alias: wikikube-master-eqiad@eqiad
- 11:40 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host dse-k8s-ctrl1002.eqiad.wmnet
- 11:38 mvernon@deploy2002: helmfile [codfw] DONE helmfile.d/services/kartotherian: apply
- 11:36 mvernon@deploy2002: helmfile [codfw] START helmfile.d/services/kartotherian: apply
- 11:21 fnegri@cumin1003: conftool action : set/pooled=yes; selector: name=clouddb1022.eqiad.wmnet,service=s3
- 11:18 mvernon@deploy2002: helmfile [codfw] DONE helmfile.d/services/kartotherian: apply
- 11:16 mvernon@deploy2002: helmfile [codfw] DONE helmfile.d/services/tegola-vector-tiles: apply
- 11:15 mvernon@deploy2002: helmfile [codfw] START helmfile.d/services/tegola-vector-tiles: apply
- 11:15 mvernon@deploy2002: helmfile [eqiad] DONE helmfile.d/services/tegola-vector-tiles: apply
- 11:14 mvernon@deploy2002: helmfile [eqiad] START helmfile.d/services/tegola-vector-tiles: apply
- 11:07 mvernon@deploy2002: helmfile [codfw] START helmfile.d/services/kartotherian: apply
- 11:07 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.sanitize-wiki (exit_code=0) Managing sanitization for wikis abstractwiki in section s5
- 11:07 mvernon@deploy2002: helmfile [eqiad] DONE helmfile.d/services/kartotherian: apply
- 11:05 mvernon@deploy2002: helmfile [eqiad] START helmfile.d/services/kartotherian: apply
- 10:55 kevinbazira@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' .
- 10:53 fceratto@cumin1003: START - Cookbook sre.mysql.sanitize-wiki Managing sanitization for wikis abstractwiki in section s5
- 10:45 fnegri@cumin1003: conftool action : set/pooled=no; selector: name=clouddb1022.eqiad.wmnet,service=s3
- 10:33 kevinbazira@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' .
- 10:27 klausman@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
- 10:26 klausman@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
- 10:21 klausman@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
- 10:20 klausman@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
- 10:20 klausman@deploy2002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
- 10:19 klausman@deploy2002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
- 10:01 elukey@puppetserver1001: conftool action : set/pooled=true; selector: dnsdisc=kartotherian,name=codfw
- 09:58 brouberol@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/aux-k8s-services/kafka-mirrormaker: apply
- 09:57 brouberol@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/aux-k8s-services/kafka-mirrormaker: apply
- 09:52 brouberol@deploy2002: helmfile [aux-k8s-codfw] DONE helmfile.d/aux-k8s-services/kafka-mirrormaker: apply
- 09:52 brouberol@deploy2002: helmfile [aux-k8s-codfw] START helmfile.d/aux-k8s-services/kafka-mirrormaker: apply
- 09:51 brouberol@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/aux-k8s-services/kafka-mirrormaker: apply
- 09:51 brouberol@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/aux-k8s-services/kafka-mirrormaker: apply
- 09:47 javiermonton@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
- 09:47 javiermonton@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
- 09:46 brouberol@deploy2002: helmfile [aux-k8s-codfw] DONE helmfile.d/aux-k8s-services/kafka-mirrormaker: apply
- 09:45 brouberol@deploy2002: helmfile [aux-k8s-codfw] START helmfile.d/aux-k8s-services/kafka-mirrormaker: apply
- 09:45 brouberol@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/aux-k8s-services/kafka-mirrormaker: apply
- 09:44 brouberol@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/aux-k8s-services/kafka-mirrormaker: apply
- 09:05 elukey@puppetserver1001: conftool action : set/pooled=yes:weight=10; selector: name=aux-k8s-worker200[2-5].codfw.wmnet,cluster=aux-k8s,service=kubesvc
- 09:04 elukey@puppetserver1001: conftool action : set/pooled=yes:weight=10; selector: name=aux-k8s-worker200[6-9].codfw.wmnet,cluster=aux-k8s,service=kubesvc
- 09:04 elukey@puppetserver1001: conftool action : set/pooled=yes:weight=10; selector: name=aux-k8s-worker100[6-9].eqiad.wmnet,cluster=aux-k8s,service=kubesvc
- 08:55 elukey@puppetserver1001: conftool action : set/pooled=yes; selector: name=aux-k8s-worker200[6-9].eqiad.wmnet,cluster=kubernetes,service=kubesvc
- 08:55 elukey@puppetserver1001: conftool action : set/weight=10; selector: name=aux-k8s-worker200[6-9].eqiad.wmnet,cluster=kubernetes,service=kubesvc
- 08:35 elukey@puppetserver1001: conftool action : set/weight=10; selector: name=aux-k8s-worker1009.eqiad.wmnet,cluster=kubernetes,service=kubesvc
- 08:35 elukey@puppetserver1001: conftool action : set/weight=10; selector: name=aux-k8s-worker1008.eqiad.wmnet,cluster=kubernetes,service=kubesvc
- 08:34 elukey@puppetserver1001: conftool action : set/weight=10; selector: name=aux-k8s-worker1007.eqiad.wmnet,cluster=kubernetes,service=kubesvc
- 08:34 elukey@puppetserver1001: conftool action : set/weight=10; selector: name=aux-k8s-worker1006.eqiad.wmnet,cluster=kubernetes,service=kubesvc
- 08:34 elukey@puppetserver1001: conftool action : set/pooled=yes; selector: name=aux-k8s-worker1009.eqiad.wmnet,cluster=kubernetes,service=kubesvc
- 08:34 elukey@puppetserver1001: conftool action : set/pooled=yes; selector: name=aux-k8s-worker1008.eqiad.wmnet,cluster=kubernetes,service=kubesvc
- 08:34 elukey@puppetserver1001: conftool action : set/pooled=yes; selector: name=aux-k8s-worker1007.eqiad.wmnet,cluster=kubernetes,service=kubesvc
- 08:34 elukey@puppetserver1001: conftool action : set/pooled=yes; selector: name=aux-k8s-worker1006.eqiad.wmnet,cluster=kubernetes,service=kubesvc
- 08:29 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device fasw2-c8b-codfw
- 08:29 ayounsi@cumin1003: START - Cookbook sre.network.tls for network device fasw2-c8b-codfw
- 08:29 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device fasw2-c8a-codfw
- 08:29 ayounsi@cumin1003: START - Cookbook sre.network.tls for network device fasw2-c8a-codfw
- 08:10 hashar@deploy2002: rebuilt and synchronized wikiversions files: group1 to 1.46.0-wmf.21 refs T420479
- 00:33 rzl@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply
- 00:23 rzl@deploy2002: helmfile [staging] DONE helmfile.d/services/zotero: apply
- 00:23 rzl@deploy2002: helmfile [staging] START helmfile.d/services/zotero: apply
- 00:23 rzl@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
- 00:22 rzl@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
- 00:22 rzl@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifeeds: apply
- 00:22 rzl@deploy2002: helmfile [staging] START helmfile.d/services/wikifeeds: apply
- 00:21 rzl@deploy2002: helmfile [staging] DONE helmfile.d/services/wikidata-query-gui: apply
- 00:21 rzl@deploy2002: helmfile [staging] START helmfile.d/services/wikidata-query-gui: apply
- 00:21 rzl@deploy2002: helmfile [staging] DONE helmfile.d/services/toolhub: apply
- 00:20 rzl@deploy2002: helmfile [staging] START helmfile.d/services/toolhub: apply
- 00:20 rzl@deploy2002: helmfile [staging] DONE helmfile.d/services/termbox: apply
- 00:20 rzl@deploy2002: helmfile [staging] START helmfile.d/services/termbox: apply
- 00:20 rzl@deploy2002: helmfile [staging] DONE helmfile.d/services/tegola-vector-tiles: apply
- 00:20 rzl@deploy2002: helmfile [staging] START helmfile.d/services/tegola-vector-tiles: apply
- 00:19 rzl@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-video: apply
- 00:19 rzl@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-video: apply
- 00:18 rzl@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-timeline: apply
- 00:18 rzl@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-timeline: apply
- 00:18 rzl@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
- 00:18 rzl@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply
- 00:17 rzl@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply
- 00:17 rzl@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-media: apply
- 00:17 rzl@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-constraints: apply
- 00:17 rzl@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-constraints: apply
- 00:17 rzl@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox: apply
- 00:16 rzl@deploy2002: helmfile [staging] START helmfile.d/services/shellbox: apply
- 00:16 rzl@deploy2002: helmfile [staging] DONE helmfile.d/services/sessionstore: apply
- 00:16 rzl@deploy2002: helmfile [staging] START helmfile.d/services/sessionstore: apply
- 00:16 rzl@deploy2002: helmfile [staging] DONE helmfile.d/services/recommendation-api: apply
- 00:15 rzl@deploy2002: helmfile [staging] START helmfile.d/services/recommendation-api: apply
- 00:15 rzl@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
- 00:15 rzl@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
- 00:15 jclark@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dse-k8s-worker1023.eqiad.wmnet with OS bookworm
- 00:15 rzl@deploy2002: helmfile [staging] DONE helmfile.d/services/push-notifications: apply
- 00:15 jclark@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1003"
- 00:15 rzl@deploy2002: helmfile [staging] START helmfile.d/services/push-notifications: apply
- 00:15 rzl@deploy2002: helmfile [staging] DONE helmfile.d/services/proton: apply
- 00:14 jclark@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1003"
- 00:14 rzl@deploy2002: helmfile [staging] START helmfile.d/services/proton: apply
- 00:13 rzl@deploy2002: helmfile [staging] DONE helmfile.d/services/page-analytics: apply
- 00:13 rzl@deploy2002: helmfile [staging] START helmfile.d/services/page-analytics: apply
- 00:13 rzl@deploy2002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
- 00:13 rzl@deploy2002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
- 00:13 rzl@deploy2002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
- 00:12 rzl@deploy2002: helmfile [staging] START helmfile.d/services/mobileapps: apply
- 00:12 rzl@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
- 00:11 rzl@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply
- 00:10 rzl@deploy2002: helmfile [staging] DONE helmfile.d/services/media-analytics: apply
- 00:10 rzl@deploy2002: helmfile [staging] START helmfile.d/services/media-analytics: apply
- 00:09 rzl@deploy2002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
- 00:07 rzl@deploy2002: helmfile [staging] START helmfile.d/services/machinetranslation: apply
- 00:07 rzl@deploy2002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply
- 00:06 rzl@deploy2002: helmfile [staging] START helmfile.d/services/linkrecommendation: apply
- 00:06 rzl@deploy2002: helmfile [staging] DONE helmfile.d/services/kartotherian: apply
- 00:06 rzl@deploy2002: helmfile [staging] START helmfile.d/services/kartotherian: apply
- 00:06 rzl@deploy2002: helmfile [staging] DONE helmfile.d/services/ipoid: apply
- 00:06 jclark@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dse-k8s-worker1022.eqiad.wmnet with OS bookworm
- 00:06 jclark@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1003"
- 00:06 rzl@deploy2002: helmfile [staging] START helmfile.d/services/ipoid: apply
- 00:06 rzl@deploy2002: helmfile [staging] DONE helmfile.d/services/image-suggestion: apply
- 00:06 jclark@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1003"
- 00:05 rzl@deploy2002: helmfile [staging] START helmfile.d/services/image-suggestion: apply
- 00:05 rzl@deploy2002: helmfile [staging] DONE helmfile.d/services/geo-analytics: apply
- 00:05 rzl@deploy2002: helmfile [staging] START helmfile.d/services/geo-analytics: apply
- 00:05 rzl@deploy2002: helmfile [staging] DONE helmfile.d/services/eventstreams-internal: apply
- 00:05 rzl@deploy2002: helmfile [staging] START helmfile.d/services/eventstreams-internal: apply
- 00:04 rzl@deploy2002: helmfile [staging] DONE helmfile.d/services/eventstreams: apply
- 00:04 rzl@deploy2002: helmfile [staging] START helmfile.d/services/eventstreams: apply
- 00:04 rzl@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-main: apply
- 00:04 jclark@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dse-k8s-worker1021.eqiad.wmnet with OS bookworm
- 00:04 jclark@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1003"
- 00:04 rzl@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-main: apply
- 00:04 rzl@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply
- 00:04 jclark@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1003"
- 00:03 rzl@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply
- 00:03 rzl@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics-external: apply
- 00:03 rzl@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-analytics-external: apply
- 00:02 rzl@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics: apply
- 00:02 rzl@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-analytics: apply
- 00:02 rzl@deploy2002: helmfile [staging] DONE helmfile.d/services/editor-analytics: apply
- 00:01 rzl@deploy2002: helmfile [staging] START helmfile.d/services/editor-analytics: apply
- 00:01 rzl@deploy2002: helmfile [staging] DONE helmfile.d/services/edit-analytics: apply
- 00:01 rzl@deploy2002: helmfile [staging] START helmfile.d/services/edit-analytics: apply
- 00:00 rzl@deploy2002: helmfile [staging] DONE helmfile.d/services/echostore: apply
- 00:00 rzl@deploy2002: helmfile [staging] START helmfile.d/services/echostore: apply
- 00:00 rzl@deploy2002: helmfile [staging] DONE helmfile.d/services/device-analytics: apply
- 00:00 rzl@deploy2002: helmfile [staging] START helmfile.d/services/device-analytics: apply
2026-03-24
- 23:59 rzl@deploy2002: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
- 23:59 rzl@deploy2002: helmfile [staging] START helmfile.d/services/developer-portal: apply
- 23:59 rzl@deploy2002: helmfile [staging] DONE helmfile.d/services/data-gateway: apply
- 23:59 rzl@deploy2002: helmfile [staging] START helmfile.d/services/data-gateway: apply
- 23:58 rzl@deploy2002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
- 23:58 rzl@deploy2002: helmfile [staging] START helmfile.d/services/cxserver: apply
- 23:58 rzl@deploy2002: helmfile [staging] DONE helmfile.d/services/commons-impact-analytics: apply
- 23:58 rzl@deploy2002: helmfile [staging] START helmfile.d/services/commons-impact-analytics: apply
- 23:56 rzl@deploy2002: helmfile [staging] DONE helmfile.d/services/citoid: apply
- 23:56 rzl@deploy2002: helmfile [staging] START helmfile.d/services/citoid: apply
- 23:56 rzl@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 23:56 rzl@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
- 23:54 rzl@deploy2002: helmfile [staging] DONE helmfile.d/services/chart-renderer: apply
- 23:53 rzl@deploy2002: helmfile [staging] START helmfile.d/services/chart-renderer: apply
- 23:53 jclark@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dse-k8s-worker1023.eqiad.wmnet with reason: host reimage
- 23:53 rzl@deploy2002: helmfile [staging] DONE helmfile.d/services/apertium: apply
- 23:52 rzl@deploy2002: helmfile [staging] START helmfile.d/services/apertium: apply
- 23:46 jclark@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dse-k8s-worker1022.eqiad.wmnet with reason: host reimage
- 23:43 jclark@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dse-k8s-worker1021.eqiad.wmnet with reason: host reimage
- 23:19 jclark@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1023.eqiad.wmnet with OS bookworm
- 23:19 jclark@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1022.eqiad.wmnet with OS bookworm
- 23:18 jclark@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1021.eqiad.wmnet with OS bookworm
- 23:16 rzl@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-videoscaler: apply
- 23:16 rzl@deploy2002: helmfile [codfw] START helmfile.d/services/mw-videoscaler: apply
- 23:15 rzl@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-videoscaler: apply
- 23:15 rzl@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-videoscaler: apply
- 22:03 jdlrobson@deploy2002: Finished scap sync-world: Backport for Drop inactive simple summary surveys (T389393) (duration: 08m 15s)
- 21:57 jdlrobson@deploy2002: jdlrobson: Continuing with sync
- 21:57 jdlrobson@deploy2002: jdlrobson: Backport for Drop inactive simple summary surveys (T389393) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 21:54 jdlrobson@deploy2002: Started scap sync-world: Backport for Drop inactive simple summary surveys (T389393)
- 21:52 jdlrobson@deploy2002: Finished scap sync-world: Backport for Address FIXME and drop not selector for section headings (T420085) (duration: 13m 11s)
- 21:47 jdlrobson@deploy2002: jdlrobson: Continuing with sync
- 21:44 jdlrobson@deploy2002: jdlrobson: Backport for Address FIXME and drop not selector for section headings (T420085) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 21:38 jdlrobson@deploy2002: Started scap sync-world: Backport for Address FIXME and drop not selector for section headings (T420085)
- 21:00 jforrester@deploy2002: mwscript-k8s job started: maintenance/namespaceDupes.php --wiki=frwiki --source-pseudo-namespace=Abstract_ --fix # T420654 abstract: is now an interwiki
- 20:55 jforrester@deploy2002: mwscript-k8s job started: moveBatch --wiki=frwiki '--u=Jdforrester (WMF)' --r=T420654 --noredirects /home/jforrester/T420654-frwiki-move # T420654 abstract: is now an interwiki; manual fix
- 20:55 jforrester@deploy2002: mwscript-k8s job started: moveBatch '--u=Jdforrester (WMF)' --r=T420654 --noredirects /home/jforrester/T420654-frwiki-move # T420654 abstract: is now an interwiki; manual fix
- 20:47 jforrester@deploy2002: mwscript-k8s job started: maintenance/namespaceDupes.php --wiki=ptwiki --fix # T420654 abstract: is now an interwiki
- 20:46 jforrester@deploy2002: mwscript-k8s job started: maintenance/namespaceDupes.php --wiki=idwiki --fix # T420654 abstract: is now an interwiki
- 20:46 jforrester@deploy2002: mwscript-k8s job started: maintenance/namespaceDupes.php --wiki=frwiki --fix # T420654 abstract: is now an interwiki
- 20:45 jforrester@deploy2002: mwscript-k8s job started: maintenance/namespaceDupes.php --wiki=eswiki --fix # T420654 abstract: is now an interwiki
- 20:39 jforrester@deploy2002: mwscript-k8s job started: maintenance/namespaceDupes.php --wiki=enwiki --fix # T420654 abstract: is now an interwiki
- 20:39 jforrester@deploy2002: mwscript-k8s job started: sql extensions/WikimediaMaintenance/maintenance/namespaceDupes.php --wiki=enwiki --fix # T420654 abstract: is now an interwiki
- 20:38 jforrester@deploy2002: mwscript-k8s job started: sql maintenance/namespaceDupes.php --wiki=enwiki --fix # T420654 abstract: is now an interwiki
- 20:38 jforrester@deploy2002: Finished scap sync-world: Backport for [wikifunctions] Drop m.wikifunctions.org from lists, we've not used it for years, Move GrowthExperiments REST API definition to IS, dumpInterwiki: Re-generate to add Abstract Wikipedia (and others) (T420654) (duration: 07m 46s)
- 20:33 jforrester@deploy2002: jforrester: Continuing with sync
- 20:32 jforrester@deploy2002: jforrester: Backport for [wikifunctions] Drop m.wikifunctions.org from lists, we've not used it for years, Move GrowthExperiments REST API definition to IS, dumpInterwiki: Re-generate to add Abstract Wikipedia (and others) (T420654) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified the
- 20:30 jforrester@deploy2002: Started scap sync-world: Backport for [wikifunctions] Drop m.wikifunctions.org from lists, we've not used it for years, Move GrowthExperiments REST API definition to IS, dumpInterwiki: Re-generate to add Abstract Wikipedia (and others) (T420654)
- 20:27 jforrester@deploy2002: Finished scap sync-world: Backport for Set json object before setting Abstract Wiki Id (T420916), AbstractPreview: apply selected preview language lang/dir to abstract preview body (T420687), AbstractTitle: Handle pageinfo responses without normalized titles (T420725), [abstractwiki] Don't list abstract as a langlist entry (gerrit:1259992)
- 20:22 jforrester@deploy2002: jforrester: Continuing with sync
- 20:22 jforrester@deploy2002: jforrester: Backport for Set json object before setting Abstract Wiki Id (T420916), AbstractPreview: apply selected preview language lang/dir to abstract preview body (T420687), AbstractTitle: Handle pageinfo responses without normalized titles (T420725), [abstractwiki] Don't list abstract as a langlist entry (T420654) s
- 20:20 jforrester@deploy2002: Started scap sync-world: Backport for Set json object before setting Abstract Wiki Id (T420916), AbstractPreview: apply selected preview language lang/dir to abstract preview body (T420687), AbstractTitle: Handle pageinfo responses without normalized titles (T420725), [abstractwiki] Don't list abstract as a langlist entry (gerrit:1259992)
- 20:12 jforrester@deploy2002: Finished scap sync-world: Backport for Generate our own logo thumbnails rather than using MediaWiki's (T414048), Enwikinews: Only enable flaggedRevs in article namespace (T418066), Disable magic links on afwiki (T420142) (duration: 09m 22s)
- 20:08 jforrester@deploy2002: jforrester, pppery: Continuing with sync
- 20:05 jforrester@deploy2002: jforrester, pppery: Backport for Generate our own logo thumbnails rather than using MediaWiki's (T414048), Enwikinews: Only enable flaggedRevs in article namespace (T418066), Disable magic links on afwiki (T420142) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 20:03 jforrester@deploy2002: Started scap sync-world: Backport for Generate our own logo thumbnails rather than using MediaWiki's (T414048), Enwikinews: Only enable flaggedRevs in article namespace (T418066), Disable magic links on afwiki (T420142)
- 19:42 jasmine@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2006.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 19:42 jasmine@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-ctrl2006.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 19:41 jasmine@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 19:39 reedy@deploy2002: Finished scap sync-world: Backport for tests: Make many things static for PHPUnit 10 (T420844), phpunit.xml: Update configuration for PHPUnit 10 (T420844) (duration: 07m 21s)
- 19:35 jasmine@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 19:35 reedy@deploy2002: reedy: Continuing with sync
- 19:34 reedy@deploy2002: reedy: Backport for tests: Make many things static for PHPUnit 10 (T420844), phpunit.xml: Update configuration for PHPUnit 10 (T420844) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 19:32 reedy@deploy2002: Started scap sync-world: Backport for tests: Make many things static for PHPUnit 10 (T420844), phpunit.xml: Update configuration for PHPUnit 10 (T420844)
- 19:02 inflatador: bking@apt1002 `sudo -E reprepro -C component/opensearch2 include trixie-wikimedia ~/wmf-opensearch-search-plugins-2.19.5+3-trixie/wmf-opensearch-search-plugins_2.19.5+3_amd64.changes`
- 18:48 jasmine@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 18:48 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1170: Degraded drive replaced T420873
- 18:43 jasmine@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 18:36 daniel@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
- 18:35 daniel@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
- 18:25 daniel@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
- 18:24 daniel@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
- 18:20 akhatun@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-edit-type-enrich-next: apply
- 18:20 akhatun@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-edit-type-enrich-next: apply
- 18:13 daniel@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
- 18:11 daniel@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
- 18:07 dzahn@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on phab2002.codfw.wmnet with reason: T420228
- 18:01 aokoth@deploy2002: helmfile [aux-k8s-codfw] DONE helmfile.d/admin 'apply'.
- 18:01 aokoth@deploy2002: helmfile [aux-k8s-codfw] START helmfile.d/admin 'apply'.
- 18:01 aokoth@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 18:00 aokoth@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
- 18:00 mutante: codesearch9.codesearch - systemctl restart hound_proxy (T421147)
- 17:34 brett@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P{lvs7003.magru.wmnet} and A:liberica
- 17:30 brett@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P{lvs7003.magru.wmnet} and A:liberica
- 17:20 blake@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
- 17:20 blake@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
- 17:20 blake@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
- 17:20 blake@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
- 17:00 blake@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
- 17:00 blake@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
- 17:00 blake@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
- 17:00 blake@deploy2002: helmfile [codfw] START helmfile.d/services/mw-web: apply
- 16:47 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp1115.eqiad.wmnet with OS trixie
- 16:38 brett@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp1113.*
- 16:32 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool db1170: Degraded drive replaced T420873
- 16:24 sfaci@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/test-kitchen-next: apply
- 16:24 sfaci@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/test-kitchen-next: apply
- 16:22 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1113.eqiad.wmnet with OS trixie
- 16:05 brouberol@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 16:04 brouberol@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
- 16:03 blake@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
- 16:03 brouberol@deploy2002: helmfile [aux-k8s-codfw] DONE helmfile.d/admin 'apply'.
- 16:03 blake@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
- 16:03 blake@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
- 16:03 blake@deploy2002: helmfile [codfw] START helmfile.d/services/mw-web: apply
- 16:03 brouberol@deploy2002: helmfile [aux-k8s-codfw] START helmfile.d/admin 'apply'.
- 15:59 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1113.eqiad.wmnet with reason: host reimage
- 15:54 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1113.eqiad.wmnet with reason: host reimage
- 15:54 bjensen: Services portion of the datacenter switchover is complete
- 15:51 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aux-k8s-worker2009.codfw.wmnet with OS trixie
- 15:46 blake@cumin1003: END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0) depool all services in codfw: Datacenter Switchover - T413974
- 15:38 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'edit-check' for release 'main' .
- 15:38 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp1113.eqiad.wmnet with OS trixie
- 15:38 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1113.eqiad.wmnet with OS trixie
- 15:36 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'edit-check' for release 'main' .
- 15:34 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aux-k8s-worker2009.codfw.wmnet with reason: host reimage
- 15:30 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on aux-k8s-worker2009.codfw.wmnet with reason: host reimage
- 15:20 blake@cumin1003: START - Cookbook sre.discovery.datacenter depool all services in codfw: Datacenter Switchover - T413974
- 15:19 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp1113.eqiad.wmnet with OS trixie
- 15:18 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host aux-k8s-worker2009.codfw.wmnet with OS trixie
- 15:16 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aux-k8s-worker2008.codfw.wmnet with OS trixie
- 14:59 blake@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: depool codfw [reason: no reason specified, no task ID specified]
- 14:59 blake@cumin1003: START - Cookbook sre.dns.admin DNS admin: depool codfw [reason: no reason specified, no task ID specified]
- 14:59 bjensen: beginning the Traffic and Services portions of the DC switchover, operational followup will be in #wikimedia-sre
- 14:59 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aux-k8s-worker2008.codfw.wmnet with reason: host reimage
- 14:56 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on aux-k8s-worker2008.codfw.wmnet with reason: host reimage
- 14:51 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'edit-check' for release 'main' .
- 14:50 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aux-k8s-worker1009.eqiad.wmnet with OS trixie
- 14:44 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host aux-k8s-worker2008.codfw.wmnet with OS trixie
- 14:42 aokoth@dns1004: END - running authdns-update
- 14:41 aokoth@dns1004: START - running authdns-update
- 14:34 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aux-k8s-worker1009.eqiad.wmnet with reason: host reimage
- 14:31 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host aux-k8s-worker2009.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 14:27 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on aux-k8s-worker1009.eqiad.wmnet with reason: host reimage
- 14:26 elukey@cumin1003: START - Cookbook sre.hosts.provision for host aux-k8s-worker2009.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 14:25 trueg@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/wdqs-queryhammer: apply
- 14:23 trueg@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/services/wdqs-queryhammer: apply
- 14:20 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host aux-k8s-worker2008.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 14:19 otto@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-edit-type-enrich-next: apply
- 14:19 otto@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-edit-type-enrich-next: apply
- 14:16 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1009.eqiad.wmnet with OS trixie
- 14:15 elukey@cumin1003: START - Cookbook sre.hosts.provision for host aux-k8s-worker2008.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 14:14 cmooney@cumin1003: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox
- 14:13 cmooney@cumin1003: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
- 14:13 cmooney@cumin1003: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary
- 14:13 cmooney@cumin1003: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary
- 14:12 dcausse@deploy2002: Finished scap sync-world: Backport for Revert "search: use the discovery ns record for the semanticsearch cluster" (duration: 06m 54s)
- 14:08 dcausse@deploy2002: dcausse: Continuing with sync
- 14:07 dcausse@deploy2002: dcausse: Backport for Revert "search: use the discovery ns record for the semanticsearch cluster" synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 14:07 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host aux-k8s-worker2007.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 14:05 dcausse@deploy2002: Started scap sync-world: Backport for Revert "search: use the discovery ns record for the semanticsearch cluster"
- 14:04 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aux-k8s-worker1008.eqiad.wmnet with OS trixie
- 14:01 elukey@cumin1003: START - Cookbook sre.hosts.provision for host aux-k8s-worker2007.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 13:59 jforrester@deploy2002: mwscript-k8s job started: sql --wiki=abstractwiki /srv/mediawiki/php-1.46.0-wmf.20/extensions/Translate/sql/mysql/translate_message_group_subscriptions.sql # T420656 translate_message_group_subscriptions
- 13:59 dcausse@deploy2002: Sync cancelled.
- 13:57 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host aux-k8s-worker2006.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 13:52 elukey@cumin1003: START - Cookbook sre.hosts.provision for host aux-k8s-worker2006.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 13:48 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aux-k8s-worker1008.eqiad.wmnet with reason: host reimage
- 13:46 dcausse@deploy2002: dcausse: Backport for search: use the discovery ns record for the semanticsearch cluster (T414484) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 13:44 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on aux-k8s-worker1008.eqiad.wmnet with reason: host reimage
- 13:44 dcausse@deploy2002: Started scap sync-world: Backport for search: use the discovery ns record for the semanticsearch cluster (T414484)
- 13:33 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1008.eqiad.wmnet with OS trixie
- 13:32 sukhe: sudo cumin -b1 -s20 'C:bird' "run-puppet-agent --enable 'merging CR 1248385, T413740'"
- 13:30 cmelo@deploy2002: Finished scap sync-world: Backport for Enable the CampaignEvents extension on all wikibooks (T419597), Enable $wgCampaignEventsEnableEventGoals in prod wikis (T414149) (duration: 12m 43s)
- 13:26 cmelo@deploy2002: cmelo, daimona: Continuing with sync
- 13:24 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aux-k8s-worker1007.eqiad.wmnet with OS trixie
- 13:23 sukhe: sudo cumin 'C:bird' "disable-puppet 'merging CR 1248385, T413740'"
- 13:20 cmelo@deploy2002: cmelo, daimona: Backport for Enable the CampaignEvents extension on all wikibooks (T419597), Enable $wgCampaignEventsEnableEventGoals in prod wikis (T414149) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 13:18 cmelo@deploy2002: Started scap sync-world: Backport for Enable the CampaignEvents extension on all wikibooks (T419597), Enable $wgCampaignEventsEnableEventGoals in prod wikis (T414149)
- 13:08 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aux-k8s-worker1007.eqiad.wmnet with reason: host reimage
- 13:04 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) payments1012.frack.eqiad.wmnet on all recursors
- 13:04 cmooney@cumin1003: START - Cookbook sre.dns.wipe-cache payments1012.frack.eqiad.wmnet on all recursors
- 13:04 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) payments1011.frack.eqiad.wmnet on all recursors
- 13:03 cmooney@cumin1003: START - Cookbook sre.dns.wipe-cache payments1011.frack.eqiad.wmnet on all recursors
- 13:03 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) payments1010.frack.eqiad.wmnet on all recursors
- 13:03 cmooney@cumin1003: START - Cookbook sre.dns.wipe-cache payments1010.frack.eqiad.wmnet on all recursors
- 13:02 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on aux-k8s-worker1007.eqiad.wmnet with reason: host reimage
- 13:00 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 13:00 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: modify records for payments servers frack - cmooney@cumin1003"
- 13:00 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: modify records for payments servers frack - cmooney@cumin1003"
- 12:56 cmooney@cumin1003: START - Cookbook sre.dns.netbox
- 12:50 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1007.eqiad.wmnet with OS trixie
- 12:02 fnegri@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for clouddb1017.eqiad.wmnet
- 12:02 fnegri@cumin1003: START - Cookbook sre.hosts.remove-downtime for clouddb1017.eqiad.wmnet
- 12:01 fnegri@cumin1003: conftool action : set/pooled=yes; selector: name=clouddb1017.eqiad.wmnet,service=s1
- 11:53 javiermonton@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
- 11:53 javiermonton@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
- 11:51 fnegri@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on clouddb1017.eqiad.wmnet with reason: Rebooting clouddb1017 T419960
- 11:51 fnegri@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for clouddb1017.eqiad.wmnet
- 11:51 fnegri@cumin1003: START - Cookbook sre.hosts.remove-downtime for clouddb1017.eqiad.wmnet
- 11:51 fnegri@cumin1003: conftool action : set/pooled=no; selector: name=clouddb1017.eqiad.wmnet,service=s1
- 11:49 javiermonton@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
- 11:49 javiermonton@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
- 11:48 javiermonton@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
- 11:48 javiermonton@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
- 11:47 javiermonton@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
- 11:47 javiermonton@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
- 11:47 javiermonton@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
- 11:47 javiermonton@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
- 11:46 javiermonton@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
- 11:46 javiermonton@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
- 11:40 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aux-k8s-worker1006.eqiad.wmnet with OS trixie
- 11:36 volans@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcumin1001.eqiad.wmnet
- 11:32 fnegri@cumin1003: conftool action : set/pooled=no; selector: name=clouddb1020.eqiad.wmnet,service=x3
- 11:32 fnegri@cumin1003: conftool action : set/pooled=no; selector: name=clouddb1016.eqiad.wmnet,service=x3
- 11:32 volans@cumin1003: START - Cookbook sre.hosts.reboot-single for host cloudcumin1001.eqiad.wmnet
- 11:31 fnegri@cumin1003: conftool action : set/pooled=yes; selector: name=clouddb1023.eqiad.wmnet,service=x3
- 11:31 fnegri@cumin1003: conftool action : set/pooled=yes; selector: name=clouddb1022.eqiad.wmnet,service=x3
- 11:31 fnegri@cumin1003: conftool action : set/weight=100; selector: name=clouddb1023.eqiad.wmnet,service=x3
- 11:31 fnegri@cumin1003: conftool action : set/weight=100; selector: name=clouddb1022.eqiad.wmnet,service=x3
- 11:27 fnegri@cumin1003: conftool action : set/pooled=no; selector: name=clouddb1017.eqiad.wmnet,service=s3
- 11:27 fnegri@cumin1003: conftool action : set/pooled=yes; selector: name=clouddb1023.eqiad.wmnet,service=s3
- 11:27 fnegri@cumin1003: conftool action : set/weight=100; selector: name=clouddb1023.eqiad.wmnet,service=s3
- 11:26 fnegri@cumin1003: conftool action : set/weight=100; selector: name=clouddb1022.eqiad.wmnet,service=s3
- 11:24 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aux-k8s-worker1006.eqiad.wmnet with reason: host reimage
- 11:22 fnegri@cumin1003: conftool action : set/pooled=yes; selector: name=clouddb1022.eqiad.wmnet,service=s3
- 11:19 fnegri@cumin1003: conftool action : set/pooled=no; selector: name=clouddb1022.eqiad.wmnet,service=s3
- 11:19 fnegri@cumin1003: conftool action : set/pooled=no; selector: name=clouddb1023.eqiad.wmnet,service=s3
- 11:18 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on aux-k8s-worker1006.eqiad.wmnet with reason: host reimage
- 11:18 fnegri@cumin1003: conftool action : set/pooled=yes; selector: name=clouddb1017.eqiad.wmnet,service=s3
- 11:17 fnegri@cumin1003: conftool action : set/pooled=no; selector: name=clouddb1017.eqiad.wmnet,service=s3
- 11:14 fnegri@cumin1003: conftool action : set/pooled=yes; selector: name=clouddb1023.eqiad.wmnet,service=s3
- 11:14 fnegri@cumin1003: conftool action : set/pooled=yes; selector: name=clouddb1022.eqiad.wmnet,service=s3
- 11:07 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1006.eqiad.wmnet with OS trixie
- 10:55 javiermonton@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
- 10:55 javiermonton@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
- 10:53 brouberol@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aux-k8s-worker2007.codfw.wmnet with OS trixie
- 10:49 brouberol@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aux-k8s-worker2006.codfw.wmnet with OS trixie
- 10:42 javiermonton@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
- 10:42 javiermonton@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
- 10:36 brouberol@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aux-k8s-worker2007.codfw.wmnet with reason: host reimage
- 10:33 brouberol@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aux-k8s-worker2006.codfw.wmnet with reason: host reimage
- 10:30 javiermonton@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
- 10:29 javiermonton@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
- 10:29 brouberol@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on aux-k8s-worker2007.codfw.wmnet with reason: host reimage
- 10:28 brouberol@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on aux-k8s-worker2006.codfw.wmnet with reason: host reimage
- 10:22 javiermonton@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
- 10:21 javiermonton@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
- 10:20 javiermonton@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
- 10:20 javiermonton@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
- 10:18 javiermonton@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
- 10:18 javiermonton@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
- 10:18 javiermonton@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
- 10:18 javiermonton@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
- 10:17 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aux-k8s-worker1006.eqiad.wmnet with OS trixie
- 10:17 brouberol@cumin1003: START - Cookbook sre.hosts.reimage for host aux-k8s-worker2007.codfw.wmnet with OS trixie
- 10:16 brouberol@cumin1003: START - Cookbook sre.hosts.reimage for host aux-k8s-worker2006.codfw.wmnet with OS trixie
- 10:07 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1006.eqiad.wmnet with OS trixie
- 09:47 javiermonton@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
- 09:46 javiermonton@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
- 09:43 ayounsi@cumin1003: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti4008.ulsfo.wmnet to cluster ulsfo02 and group 01
- 09:34 ayounsi@cumin1003: START - Cookbook sre.ganeti.addnode for new host ganeti4008.ulsfo.wmnet to cluster ulsfo02 and group 01
- 09:31 ayounsi@cumin1003: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti4008.ulsfo.wmnet to cluster ulsfo02 and group 01
- 09:31 ayounsi@cumin1003: START - Cookbook sre.ganeti.addnode for new host ganeti4008.ulsfo.wmnet to cluster ulsfo02 and group 01
- 09:29 ayounsi@cumin1003: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti4008.ulsfo.wmnet to cluster ulsfo02 and group 01
- 09:29 ayounsi@cumin1003: START - Cookbook sre.ganeti.addnode for new host ganeti4008.ulsfo.wmnet to cluster ulsfo02 and group 01
- 09:23 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti4008.ulsfo.wmnet with OS bookworm
- 09:05 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti4008.ulsfo.wmnet with reason: host reimage
- 09:01 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti4008.ulsfo.wmnet with reason: host reimage
- 08:52 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'edit-check' for release 'main' .
- 08:52 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
- 08:51 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 08:51 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove old ulsfo ganeti VIP - ayounsi@cumin1003"
- 08:50 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove old ulsfo ganeti VIP - ayounsi@cumin1003"
- 08:49 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
- 08:46 ayounsi@cumin1003: START - Cookbook sre.dns.netbox
- 08:45 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1170: Degraded drive T420873
- 08:45 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
- 08:45 fceratto@cumin1003: START - Cookbook sre.mysql.depool depool db1170: Degraded drive T420873
- 08:43 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
- 08:39 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host ganeti4008.ulsfo.wmnet with OS bookworm
- 08:31 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'edit-check' for release 'main' .
- 08:31 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
- 08:29 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
- 08:27 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
- 08:27 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
- 08:25 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
- 08:24 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
- 08:13 hashar@deploy2002: rebuilt and synchronized wikiversions files: group0 to 1.46.0-wmf.21 refs T420479
- 07:59 hashar: Changed https://logstash.wikimedia.org/ default page back to /app/dashboards
- 04:01 mwpresync@deploy2002: Pruned MediaWiki: 1.46.0-wmf.18 (duration: 01m 13s)
- 03:42 mwpresync@deploy2002: Finished scap sync-world: testwikis to 1.46.0-wmf.21 refs T420479 (duration: 39m 27s)
- 03:03 mwpresync@deploy2002: Started scap sync-world: testwikis to 1.46.0-wmf.21 refs T420479
- 02:46 kevinbazira@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' .
- 02:08 mwpresync@deploy2002: Finished scap build-images: Publishing wmf/next image (duration: 08m 04s)
- 02:00 mwpresync@deploy2002: Started scap build-images: Publishing wmf/next image
- 01:56 kevinbazira@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' .
- 01:40 brett@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp1104.*
- 01:37 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1104.eqiad.wmnet with OS trixie
- 01:14 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1104.eqiad.wmnet with reason: host reimage
- 01:08 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1104.eqiad.wmnet with reason: host reimage
- 00:52 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp1104.eqiad.wmnet with OS trixie
- 00:19 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp1115.eqiad.wmnet with OS trixie
- 00:18 brett@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp1115.eqiad.wmnet with OS trixie
2026-03-23
- 22:51 rzl: root@apt1002:~# reprepro --noskipold --restrict vopsbot update bookworm-wikimedia
- 22:44 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp1115.eqiad.wmnet with OS trixie
- 22:28 btullis@cumin1003: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host an-worker1172.eqiad.wmnet
- 22:25 cdobbins@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp1104.eqiad.wmnet with OS trixie
- 22:07 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1115.eqiad.wmnet with OS trixie
- 22:05 maryum: Deployed security fix for T415584
- 21:53 maryum: Deployed security fix for T419192
- 21:41 maryum: Deployed security fix for T419168
- 21:35 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp1115.eqiad.wmnet with OS trixie
- 21:25 catrope@deploy2002: Finished scap sync-world: Backport for testwiki: Add temporary groups for security testing (duration: 12m 33s)
- 21:22 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1115.eqiad.wmnet with OS trixie
- 21:21 catrope@deploy2002: catrope: Continuing with sync
- 21:18 catrope@deploy2002: catrope: Backport for testwiki: Add temporary groups for security testing synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 21:12 catrope@deploy2002: Started scap sync-world: Backport for testwiki: Add temporary groups for security testing
- 21:05 cdobbins@cumin2002: conftool action : set/pooled=yes; selector: name=cp1106.eqiad.wmnet [reason: trixie reimaging]
- 21:05 cdobbins@cumin2002: conftool action : set/pooled=no; selector: name=cp1106.eqiad.wmnet [reason: trixie reimaging]
- 21:05 cdobbins@cumin2002: START - Cookbook sre.hosts.reimage for host cp1104.eqiad.wmnet with OS trixie
- 21:04 cdobbins@cumin2002: conftool action : set/pooled=no; selector: name=cp1104.eqiad.wmnet [reason: trixie reimaging]
- 21:03 cdobbins@cumin2002: conftool action : set/pooled=yes; selector: name=cp1103.eqiad.wmnet [reason: trixie reimaging]
- 20:58 jforrester@deploy2002: Finished scap sync-world: Backport for Abstract Wikipedia: Fix API call to get page info (T420725), [abstractwiki] Enable the Translate extension (T420656), Move testwiki-only Attribution REST API definition to IS (duration: 11m 12s)
- 20:54 jforrester@deploy2002: jforrester: Continuing with sync
- 20:53 jforrester@deploy2002: jforrester: Backport for Abstract Wikipedia: Fix API call to get page info (T420725), [abstractwiki] Enable the Translate extension (T420656), Move testwiki-only Attribution REST API definition to IS synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 20:51 cdobbins@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1103.eqiad.wmnet with OS trixie
- 20:50 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts hcaptcha-proxy4002.wikimedia.org
- 20:50 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 20:50 sukhe@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: hcaptcha-proxy4002.wikimedia.org decommissioned, removing all IPs except the asset tag one - sukhe@cumin1003"
- 20:50 sukhe@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: hcaptcha-proxy4002.wikimedia.org decommissioned, removing all IPs except the asset tag one - sukhe@cumin1003"
- 20:47 jforrester@deploy2002: Started scap sync-world: Backport for Abstract Wikipedia: Fix API call to get page info (T420725), [abstractwiki] Enable the Translate extension (T420656), Move testwiki-only Attribution REST API definition to IS
- 20:46 sukhe@cumin1003: START - Cookbook sre.dns.netbox
- 20:45 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp1115.eqiad.wmnet with OS trixie
- 20:43 cdobbins@cumin2002: conftool action : set/pooled=yes; selector: name=cp1102.eqiad.wmnet [reason: trixie reimaging]
- 20:42 dani@deploy2002: Finished scap sync-world: Backport for Undeploy participant recruitment survey on ptwiki (T419275), Undeploy participant recruitment survey on trwiki (T419275), Undeploy participant recruitment survey on frwiki (T419778), testKitchen: Add custom stream name (T417050), [[gerrit:1259120|Enable wgCampaignEventsEnableEventGoals in
- 20:42 cdobbins@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1102.eqiad.wmnet with OS trixie
- 20:41 sukhe@cumin1003: START - Cookbook sre.hosts.decommission for hosts hcaptcha-proxy4002.wikimedia.org
- 20:40 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts hcaptcha-proxy4001.wikimedia.org
- 20:40 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 20:40 sukhe@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: hcaptcha-proxy4001.wikimedia.org decommissioned, removing all IPs except the asset tag one - sukhe@cumin1003"
- 20:39 sukhe@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: hcaptcha-proxy4001.wikimedia.org decommissioned, removing all IPs except the asset tag one - sukhe@cumin1003"
- 20:37 dani@deploy2002: milimetric, daimona, dani: Continuing with sync
- 20:36 dani@deploy2002: milimetric, daimona, dani: Backport for Undeploy participant recruitment survey on ptwiki (T419275), Undeploy participant recruitment survey on trwiki (T419275), Undeploy participant recruitment survey on frwiki (T419778), testKitchen: Add custom stream name (T417050), [[gerrit:1259120|Enable wgCampaignEventsEnableEventGoals i
- 20:35 sukhe@cumin1003: START - Cookbook sre.dns.netbox
- 20:34 dani@deploy2002: Started scap sync-world: Backport for Undeploy participant recruitment survey on ptwiki (T419275), Undeploy participant recruitment survey on trwiki (T419275), Undeploy participant recruitment survey on frwiki (T419778), testKitchen: Add custom stream name (T417050), [[gerrit:1259120|Enable wgCampaignEventsEnableEventGoals in
- 20:31 sukhe@cumin1003: START - Cookbook sre.hosts.decommission for hosts hcaptcha-proxy4001.wikimedia.org
- 20:28 cdobbins@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1103.eqiad.wmnet with reason: host reimage
- 20:24 cdobbins@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1103.eqiad.wmnet with reason: host reimage
- 20:23 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1115.eqiad.wmnet with OS trixie
- 20:19 cdobbins@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1102.eqiad.wmnet with reason: host reimage
- 20:17 alexsanford@deploy2002: Finished scap sync-world: Backport for Reduce reauth timeout for editing site JS to 10 minutes (T419605) (duration: 07m 32s)
- 20:14 cdobbins@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1102.eqiad.wmnet with reason: host reimage
- 20:13 alexsanford@deploy2002: alexsanford: Continuing with sync
- 20:11 alexsanford@deploy2002: alexsanford: Backport for Reduce reauth timeout for editing site JS to 10 minutes (T419605) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 20:09 alexsanford@deploy2002: Started scap sync-world: Backport for Reduce reauth timeout for editing site JS to 10 minutes (T419605)
- 20:08 cdobbins@cumin2002: START - Cookbook sre.hosts.reimage for host cp1103.eqiad.wmnet with OS trixie
- 20:07 alexsanford: Deployed mitigation for T419605
- 19:58 jforrester@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
- 19:58 jforrester@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
- 19:58 jforrester@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
- 19:58 cdobbins@cumin2002: START - Cookbook sre.hosts.reimage for host cp1102.eqiad.wmnet with OS trixie
- 19:57 jforrester@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
- 19:54 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
- 19:54 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp1115.eqiad.wmnet with OS trixie
- 19:54 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
- 19:51 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host hcaptcha-proxy4004.wikimedia.org
- 19:51 cdobbins@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1102.eqiad.wmnet with OS trixie
- 19:50 cdobbins@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1103.eqiad.wmnet with OS trixie
- 19:50 ayounsi@cumin1003: START - Cookbook sre.hosts.reboot-single for host hcaptcha-proxy4004.wikimedia.org
- 19:47 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1115.eqiad.wmnet with OS trixie
- 19:47 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host hcaptcha-proxy4003.wikimedia.org
- 19:46 ayounsi@cumin1003: START - Cookbook sre.hosts.reboot-single for host hcaptcha-proxy4003.wikimedia.org
- 19:44 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp1115.eqiad.wmnet with OS trixie
- 19:44 eevans@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-reboot (exit_code=0) rolling reboot on P{aqs[1011,1014,1016-1022]*} and P{P:Cassandra}
- 19:42 cdobbins@cumin2002: START - Cookbook sre.hosts.reimage for host cp1103.eqiad.wmnet with OS trixie
- 19:42 cdobbins@cumin2002: conftool action : set/pooled=no; selector: name=cp1103.eqiad.wmnet [reason: trixie reimaging]
- 19:41 cdobbins@cumin2002: conftool action : set/pooled=yes; selector: name=cp1101.eqiad.wmnet [reason: trixie reimaging]
- 19:41 cdobbins@cumin2002: START - Cookbook sre.hosts.reimage for host cp1102.eqiad.wmnet with OS trixie
- 19:41 cdobbins@cumin2002: conftool action : set/pooled=no; selector: name=cp1102.eqiad.wmnet [reason: trixie reimaging]
- 19:40 cdobbins@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1101.eqiad.wmnet with OS trixie
- 19:39 cdobbins@cumin2002: conftool action : set/pooled=yes; selector: name=cp1100.eqiad.wmnet [reason: trixie reimaging]
- 19:37 cdobbins@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp1100.eqiad.wmnet with OS trixie
- 19:30 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1115.eqiad.wmnet with OS trixie
- 19:18 cdobbins@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1101.eqiad.wmnet with reason: host reimage
- 19:14 cdobbins@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp1100.eqiad.wmnet with reason: host reimage
- 19:13 cdobbins@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1101.eqiad.wmnet with reason: host reimage
- 19:13 akhatun@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-edit-type-enrich-next: apply
- 19:13 akhatun@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-edit-type-enrich-next: apply
- 19:10 cdobbins@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp1100.eqiad.wmnet with reason: host reimage
- 18:59 inflatador: bking@deploy2002 restarting opensearch-semantic-search eqiad to renew certs
- 18:57 cdobbins@cumin2002: START - Cookbook sre.hosts.reimage for host cp1101.eqiad.wmnet with OS trixie
- 18:55 cdobbins@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp1101.eqiad.wmnet with OS trixie
- 18:54 cdobbins@cumin2002: START - Cookbook sre.hosts.reimage for host cp1100.eqiad.wmnet with OS trixie
- 18:53 cdobbins@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp1100.eqiad.wmnet with OS trixie
- 18:50 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp1115.eqiad.wmnet with OS trixie
- 18:49 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1115.eqiad.wmnet with OS trixie
- 18:36 sukhe@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on hcaptcha-proxy4002.wikimedia.org with reason: depooled host (soon to be decomed)
- 18:35 sukhe@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on hcaptcha-proxy4001.wikimedia.org with reason: depooled host (soon to be decomed)
- 18:10 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp1115.eqiad.wmnet with OS trixie
- 18:10 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp1115.eqiad.wmnet with OS trixie
- 18:05 eevans@cumin1003: START - Cookbook sre.cassandra.roll-reboot rolling reboot on P{aqs[1011,1014,1016-1022]*} and P{P:Cassandra}
- 17:54 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp1115.eqiad.wmnet with OS trixie
- 17:54 brett@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp1115.eqiad.wmnet with OS trixie
- 17:53 eevans@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-reboot (exit_code=0) rolling reboot on A:restbase-eqiad
- 17:49 dreamyjazz@deploy2002: Finished scap sync-world: Backport for EventStreamConfig: Document not adding performer attributes to SI interaction v2 stream (T418740) (duration: 06m 28s)
- 17:45 dreamyjazz@deploy2002: kharlan, dreamyjazz: Continuing with sync
- 17:45 dreamyjazz@deploy2002: kharlan, dreamyjazz: Backport for EventStreamConfig: Document not adding performer attributes to SI interaction v2 stream (T418740) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 17:43 dreamyjazz@deploy2002: Started scap sync-world: Backport for EventStreamConfig: Document not adding performer attributes to SI interaction v2 stream (T418740)
- 17:35 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
- 17:34 cdobbins@cumin2002: START - Cookbook sre.hosts.reimage for host cp1101.eqiad.wmnet with OS trixie
- 17:34 cdobbins@cumin2002: conftool action : set/pooled=no; selector: name=cp1101.eqiad.wmnet [reason: trixie reimaging]
- 17:33 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-web: apply
- 17:31 cdobbins@cumin2002: START - Cookbook sre.hosts.reimage for host cp1100.eqiad.wmnet with OS trixie
- 17:30 cdobbins@cumin2002: conftool action : set/pooled=no; selector: name=cp1100.eqiad.wmnet [reason: trixie reimaging]
- 17:27 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
- 17:26 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-web: apply
- 17:24 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
- 17:23 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-web: apply
- 17:22 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
- 17:21 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host aux-k8s-worker1009.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 17:21 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
- 17:20 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp1115.eqiad.wmnet with OS trixie
- 17:20 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
- 17:18 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
- 17:17 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
- 17:17 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
- 17:16 elukey@cumin1003: START - Cookbook sre.hosts.provision for host aux-k8s-worker1009.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 17:13 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
- 17:13 bd808@deploy2002: Finished deploy [releng/jenkins-deploy@f47af21] (releasing): jobs: Use TZ=UTC in branchMWSingleVersion.groovy trigger (T404399) (duration: 01m 36s)
- 17:12 bd808@deploy2002: Started deploy [releng/jenkins-deploy@f47af21] (releasing): jobs: Use TZ=UTC in branchMWSingleVersion.groovy trigger (T404399)
- 17:12 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
- 17:09 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
- 17:08 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
- 17:08 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
- 17:08 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
- 17:07 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
- 17:06 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
- 17:04 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
- 17:04 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
- 17:03 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
- 17:02 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host aux-k8s-worker1008.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 17:00 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dse-k8s-ctrl1001.eqiad.wmnet
- 16:56 elukey@cumin1003: START - Cookbook sre.hosts.provision for host aux-k8s-worker1008.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 16:56 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 14 hosts
- 16:55 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host aux-k8s-worker1007.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 16:55 sukhe@cumin1003: START - Cookbook sre.hosts.remove-downtime for 14 hosts
- 16:55 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host dse-k8s-ctrl1001.eqiad.wmnet
- 16:53 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
- 16:52 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
- 16:52 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
- 16:50 elukey@cumin1003: START - Cookbook sre.hosts.provision for host aux-k8s-worker1007.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 16:46 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host aux-k8s-worker1006.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 16:41 elukey@cumin1003: START - Cookbook sre.hosts.provision for host aux-k8s-worker1006.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 16:38 volans@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcumin2001.codfw.wmnet
- 16:35 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
- 16:34 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host aux-k8s-worker1006.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 16:34 elukey@cumin1003: START - Cookbook sre.hosts.provision for host aux-k8s-worker1006.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 16:34 volans@cumin2002: START - Cookbook sre.hosts.reboot-single for host cloudcumin2001.codfw.wmnet
- 16:32 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
- 16:31 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
- 16:30 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
- 16:29 fnegri@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for clouddb1023.eqiad.wmnet
- 16:29 fnegri@cumin1003: START - Cookbook sre.hosts.remove-downtime for clouddb1023.eqiad.wmnet
- 16:28 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
- 16:27 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
- 16:24 eevans@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for aqs1010.eqiad.wmnet
- 16:24 eevans@cumin1003: START - Cookbook sre.hosts.remove-downtime for aqs1010.eqiad.wmnet
- 16:21 jgreen@dns1004: END - running authdns-update
- 16:19 jgreen@dns1004: START - running authdns-update
- 16:18 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
- 16:17 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
- 16:11 fnegri@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on clouddb1023.eqiad.wmnet with reason: Rebooting clouddb1023 T419960
- 16:09 fnegri@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for clouddb1025.eqiad.wmnet
- 16:09 fnegri@cumin1003: START - Cookbook sre.hosts.remove-downtime for clouddb1025.eqiad.wmnet
- 16:09 fnegri@cumin1003: conftool action : set/pooled=yes; selector: name=clouddb1025.eqiad.wmnet
- 16:04 urandom: stopping aqs1010 for SSD replacement — T420867
- 16:03 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dse-k8s-worker1023.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 16:03 eevans@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on aqs1010.eqiad.wmnet with reason: Shutting down for SSD replacement — T420867
- 15:58 fnegri@cumin1003: conftool action : set/pooled=no; selector: name=clouddb1025.eqiad.wmnet
- 15:57 fnegri@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on clouddb1025.eqiad.wmnet with reason: Rebooting clouddb1025 T419960
- 15:57 elukey@cumin1003: START - Cookbook sre.hosts.provision for host dse-k8s-worker1023.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 15:56 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dse-k8s-worker1022.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 15:53 topranks: disabling puppet for nftables-enabled machines to validate new ruleset on selected hosts before wider rollout T420715
- 15:50 elukey@cumin1003: START - Cookbook sre.hosts.provision for host dse-k8s-worker1022.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 15:50 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dse-k8s-worker1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 15:49 kevinbazira@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' .
- 15:44 elukey@cumin1003: START - Cookbook sre.hosts.provision for host dse-k8s-worker1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 15:31 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host an-worker1172.eqiad.wmnet
- 15:21 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
- 15:20 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
- 15:15 fnegri@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for clouddb1020.eqiad.wmnet
- 15:14 fnegri@cumin1003: START - Cookbook sre.hosts.remove-downtime for clouddb1020.eqiad.wmnet
- 15:14 fnegri@cumin1003: conftool action : set/pooled=yes; selector: name=clouddb1020.eqiad.wmnet
- 15:05 btullis@cumin1003: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker1172.eqiad.wmnet
- 15:03 btullis@cumin1003: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker1172.eqiad.wmnet
- 15:03 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) opensearch-ipoid.discovery.wmnet on all recursors
- 15:03 bking@cumin2002: START - Cookbook sre.dns.wipe-cache opensearch-ipoid.discovery.wmnet on all recursors
- 15:03 bking@cumin2002: conftool action : set/pooled=true; selector: dnsdisc=k8s-ingress-dse-aa,name=eqiad
- 15:01 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
- 15:01 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
- 14:59 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) opensearch-ipoid.discovery.wmnet on all recursors
- 14:58 bking@cumin2002: START - Cookbook sre.dns.wipe-cache opensearch-ipoid.discovery.wmnet on all recursors
- 14:58 sukhe@dns1004: END - running authdns-update
- 14:58 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) opensearch-test.discovery.wmnet on all recursors
- 14:58 bking@cumin2002: START - Cookbook sre.dns.wipe-cache opensearch-test.discovery.wmnet on all recursors
- 14:57 fnegri@cumin1003: conftool action : set/pooled=no; selector: name=clouddb1020.eqiad.wmnet
- 14:57 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
- 14:56 sukhe@dns1004: START - running authdns-update
- 14:56 sukhe@dns1004: END - running authdns-update
- 14:56 fnegri@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on clouddb1020.eqiad.wmnet with reason: Rebooting clouddb1020 T419960
- 14:56 fnegri@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for clouddb1019.eqiad.wmnet
- 14:56 fnegri@cumin1003: START - Cookbook sre.hosts.remove-downtime for clouddb1019.eqiad.wmnet
- 14:55 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
- 14:55 fnegri@cumin1003: conftool action : set/pooled=yes; selector: name=clouddb1019.eqiad.wmnet
- 14:55 sukhe@dns1004: START - running authdns-update
- 14:55 eevans@cumin1003: START - Cookbook sre.cassandra.roll-reboot rolling reboot on A:restbase-eqiad
- 14:52 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
- 14:52 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
- 14:51 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
- 14:51 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
- 14:50 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
- 14:50 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
- 14:50 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
- 14:50 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
- 14:49 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
- 14:49 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
- 14:49 sukhe@dns1004: END - running authdns-update
- 14:48 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
- 14:48 sukhe@dns1004: START - running authdns-update
- 14:47 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
- 14:46 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudrabbit1003.eqiad.wmnet
- 14:45 bking@cumin2002: conftool action : set/pooled=false; selector: dnsdisc=k8s-ingress-dse-aa,name=eqiad
- 14:44 sukhe@dns1004: END - running authdns-update
- 14:43 sukhe@dns1004: START - running authdns-update
- 14:40 sukhe@dns1004: FAIL - running authdns-update
- 14:39 andrew@cumin2002: START - Cookbook sre.hosts.reboot-single for host cloudrabbit1003.eqiad.wmnet
- 14:38 sukhe@dns1004: START - running authdns-update
- 14:37 bking@cumin2002: conftool action : set/pooled=true; selector: dnsdisc=k8s-ingress-dse-aa,name=eqiad
- 14:36 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) k8s-ingress-dse-aa.discovery.wmnet on all recursors
- 14:36 bking@cumin2002: START - Cookbook sre.dns.wipe-cache k8s-ingress-dse-aa.discovery.wmnet on all recursors
- 14:34 fnegri@cumin1003: conftool action : set/pooled=no; selector: name=clouddb1019.eqiad.wmnet
- 14:34 fnegri@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on clouddb1019.eqiad.wmnet with reason: Rebooting clouddb1019 T419960
- 14:33 sukhe@dns1004: FAIL - running authdns-update
- 14:33 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudrabbit1002.eqiad.wmnet
- 14:33 fnegri@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for clouddb1018.eqiad.wmnet
- 14:33 fnegri@cumin1003: START - Cookbook sre.hosts.remove-downtime for clouddb1018.eqiad.wmnet
- 14:32 fnegri@cumin1003: conftool action : set/pooled=yes; selector: name=clouddb1018.eqiad.wmnet
- 14:32 sukhe@dns1004: START - running authdns-update
- 14:31 bking@cumin2002: conftool action : set/pooled=true; selector: dnsdisc=k8s-ingress-dse-aa,name=codfw
- 14:30 jclark@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1172.eqiad.wmnet with OS bullseye
- 14:30 jclark@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1003"
- 14:27 andrew@cumin2002: START - Cookbook sre.hosts.reboot-single for host cloudrabbit1002.eqiad.wmnet
- 14:22 fnegri@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on clouddb1018.eqiad.wmnet with reason: Rebooting clouddb1018 T419960
- 14:22 fnegri@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for clouddb1018.eqiad.wmnet
- 14:22 fnegri@cumin1003: START - Cookbook sre.hosts.remove-downtime for clouddb1018.eqiad.wmnet
- 14:21 fnegri@cumin1003: conftool action : set/pooled=no; selector: name=clouddb1018.eqiad.wmnet
- 14:20 jclark@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1003"
- 14:17 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudrabbit1001.eqiad.wmnet
- 14:14 jayme@cumin1003: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on P{wikikube-worker[2332-2356].codfw.wmnet} and (A:wikikube-master-codfw or A:wikikube-worker-codfw)
- 14:14 jayme@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker[2352-2356].codfw.wmnet
- 14:14 jayme@cumin1003: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker[2352-2356].codfw.wmnet
- 14:13 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on db1253.eqiad.wmnet with reason: Under repair
- 14:11 andrew@cumin2002: START - Cookbook sre.hosts.reboot-single for host cloudrabbit1001.eqiad.wmnet
- 14:07 jayme@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker[2352-2356].codfw.wmnet
- 14:04 kamila@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host hcaptcha2002.wikimedia.org
- 14:04 jayme@cumin1003: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker[2352-2356].codfw.wmnet
- 14:03 jayme@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker[2347-2351].codfw.wmnet
- 14:03 jayme@cumin1003: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker[2347-2351].codfw.wmnet
- 14:00 kamila@cumin1003: START - Cookbook sre.hosts.reboot-single for host hcaptcha2002.wikimedia.org
- 14:00 kamila@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host hcaptcha2001.wikimedia.org
- 13:59 jclark@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1172.eqiad.wmnet with reason: host reimage
- 13:57 jayme@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker[2347-2351].codfw.wmnet
- 13:56 kamila@cumin1003: START - Cookbook sre.hosts.reboot-single for host hcaptcha2001.wikimedia.org
- 13:56 kevinbazira@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' .
- 13:55 kamila@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host hcaptcha1002.wikimedia.org
- 13:55 jclark@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1172.eqiad.wmnet with reason: host reimage
- 13:52 jayme@cumin1003: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker[2347-2351].codfw.wmnet
- 13:52 jayme@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker[2342-2346].codfw.wmnet
- 13:52 jayme@cumin1003: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker[2342-2346].codfw.wmnet
- 13:51 kamila@cumin1003: START - Cookbook sre.hosts.reboot-single for host hcaptcha1002.wikimedia.org
- 13:51 kamila@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host hcaptcha1001.wikimedia.org
- 13:50 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'edit-check' for release 'main' .
- 13:48 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'edit-check' for release 'main' .
- 13:47 kamila@cumin1003: START - Cookbook sre.hosts.reboot-single for host hcaptcha1001.wikimedia.org
- 13:47 jayme@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker[2342-2346].codfw.wmnet
- 13:43 jayme@cumin1003: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker[2342-2346].codfw.wmnet
- 13:43 jayme@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker[2337-2341].codfw.wmnet
- 13:43 jayme@cumin1003: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker[2337-2341].codfw.wmnet
- 13:42 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1172.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 13:42 jclark@cumin1003: START - Cookbook sre.hosts.reimage for host an-worker1172.eqiad.wmnet with OS bullseye
- 13:41 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb2012.codfw.wmnet
- 13:41 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb2011.codfw.wmnet
- 13:38 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudlb1002.eqiad.wmnet
- 13:36 jayme@cumin1003: START - Cookbook sre.hosts.reboot-single for host rdb2012.codfw.wmnet
- 13:36 jayme@cumin1003: START - Cookbook sre.hosts.reboot-single for host rdb2011.codfw.wmnet
- 13:36 jayme@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker[2337-2341].codfw.wmnet
- 13:31 jayme@cumin1003: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker[2337-2341].codfw.wmnet
- 13:30 jayme@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker[2332-2336].codfw.wmnet
- 13:30 jayme@cumin1003: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker[2332-2336].codfw.wmnet
- 13:30 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host deploy2003.codfw.wmnet
- 13:30 andrew@cumin2002: START - Cookbook sre.hosts.reboot-single for host cloudlb1002.eqiad.wmnet
- 13:29 andrew@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host cloudlb1002.eqiad.wmnet
- 13:29 andrew@cumin2002: START - Cookbook sre.hosts.reboot-single for host cloudlb1002.eqiad.wmnet
- 13:28 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudlb1001.eqiad.wmnet
- 13:25 jayme@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker[2332-2336].codfw.wmnet
- 13:24 jayme@cumin1003: START - Cookbook sre.hosts.reboot-single for host deploy2003.codfw.wmnet
- 13:21 jforrester@deploy2002: mwscript-k8s job started: extensions/WikimediaMaintenance/maintenance/createExtensionTables.php --wiki=abstractwiki translate # T420656
- 13:21 jayme@cumin1003: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker[2332-2336].codfw.wmnet
- 13:21 jayme@cumin1003: START - Cookbook sre.k8s.reboot-nodes rolling reboot on P{wikikube-worker[2332-2356].codfw.wmnet} and (A:wikikube-master-codfw or A:wikikube-worker-codfw)
- 13:20 andrew@cumin2002: START - Cookbook sre.hosts.reboot-single for host cloudlb1001.eqiad.wmnet
- 13:20 andrew@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host cloudlb1001.eqiad.wmnet
- 13:20 andrew@cumin2002: START - Cookbook sre.hosts.reboot-single for host cloudlb1001.eqiad.wmnet
- 13:19 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'edit-check' for release 'main' .
- 13:19 andrew@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host cloudlb1001.eqiad.wmnet
- 13:18 andrew@cumin2002: START - Cookbook sre.hosts.reboot-single for host cloudlb1001.eqiad.wmnet
- 13:17 sgimeno@deploy2002: Finished scap sync-world: Backport for fix(WelcomeSurveyHooks): ensure accountJustCreated is always added (T420722), tests: add coverage for WelcomeSurveyHooks::onCentralAuthPostLoginRedirect (T420722), fix(WelcomeSurveyHooks): ensure accountJustCreated is always added 2 (T420722) (duration: 11m 43s)
- 13:16 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'edit-check' for release 'main' .
- 13:14 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'edit-check' for release 'main' .
- 13:13 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'edit-check' for release 'main' .
- 13:11 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'edit-check' for release 'main' .
- 13:11 sgimeno@deploy2002: sgimeno: Continuing with sync
- 13:08 jayme@cumin1003: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on P{wikikube-worker[2005-2006,2011-2018,2033-2039,2041-2042,2044,2046,2049-2051,2055-2062,2064-2065,2067-2078,2087-2095,2102-2115,2124-2179,2184-2199].codfw.wmnet} and (A:wikikube-master-codfw or A:wikikube-worker-codfw)
- 13:08 jayme@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker[2186-2199].codfw.wmnet
- 13:08 jayme@cumin1003: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker[2186-2199].codfw.wmnet
- 13:07 sgimeno@deploy2002: sgimeno: Backport for fix(WelcomeSurveyHooks): ensure accountJustCreated is always added (T420722), tests: add coverage for WelcomeSurveyHooks::onCentralAuthPostLoginRedirect (T420722), fix(WelcomeSurveyHooks): ensure accountJustCreated is always added 2 (T420722) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 13:05 sgimeno@deploy2002: Started scap sync-world: Backport for fix(WelcomeSurveyHooks): ensure accountJustCreated is always added (T420722), tests: add coverage for WelcomeSurveyHooks::onCentralAuthPostLoginRedirect (T420722), fix(WelcomeSurveyHooks): ensure accountJustCreated is always added 2 (T420722)
- 12:43 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "bast4006 - ayounsi@cumin1003"
- 12:42 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "bast4006 - ayounsi@cumin1003"
- 12:42 cmooney@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host testvm2006.codfw.wmnet with OS bookworm
- 12:38 ayounsi@cumin1003: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host bast4006.wikimedia.org
- 12:38 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host bast4006.wikimedia.org with OS bookworm
- 12:34 kevinbazira@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' .
- 12:28 cmooney@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on testvm2006.codfw.wmnet with reason: host reimage
- 12:22 cmooney@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on testvm2006.codfw.wmnet with reason: host reimage
- 12:18 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on bast4006.wikimedia.org with reason: host reimage
- 12:16 jayme@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker[2186-2199].codfw.wmnet
- 12:14 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on bast4006.wikimedia.org with reason: host reimage
- 12:08 cmooney@cumin1003: START - Cookbook sre.hosts.reimage for host testvm2006.codfw.wmnet with OS bookworm
- 12:07 jayme@cumin1003: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker[2186-2199].codfw.wmnet
- 12:04 jayme@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker[2168-2179,2184-2185].codfw.wmnet
- 12:04 jayme@cumin1003: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker[2168-2179,2184-2185].codfw.wmnet
- 11:52 jayme@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker[2168-2179,2184-2185].codfw.wmnet
- 11:44 jayme@cumin1003: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker[2168-2179,2184-2185].codfw.wmnet
- 11:40 jayme@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker[2154-2167].codfw.wmnet
- 11:40 jayme@cumin1003: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker[2154-2167].codfw.wmnet
- 11:31 jayme@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker[2154-2167].codfw.wmnet
- 11:23 jayme@cumin1003: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker[2154-2167].codfw.wmnet
- 11:20 jayme@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker[2140-2153].codfw.wmnet
- 11:20 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host bast4006.wikimedia.org with OS bookworm
- 11:20 jayme@cumin1003: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker[2140-2153].codfw.wmnet
- 11:19 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM bast4006.wikimedia.org - ayounsi@cumin1003"
- 11:19 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM bast4006.wikimedia.org - ayounsi@cumin1003"
- 11:19 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) bast4006.wikimedia.org on all recursors
- 11:19 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache bast4006.wikimedia.org on all recursors
- 11:19 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 11:19 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM bast4006.wikimedia.org - ayounsi@cumin1003"
- 11:19 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM bast4006.wikimedia.org - ayounsi@cumin1003"
- 11:15 ayounsi@cumin1003: START - Cookbook sre.dns.netbox
- 11:15 ayounsi@cumin1003: START - Cookbook sre.ganeti.makevm for new host bast4006.wikimedia.org
- 11:09 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts install4003.wikimedia.org
- 11:09 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 11:09 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: install4003.wikimedia.org decommissioned, removing all IPs except the asset tag one - ayounsi@cumin1003"
- 11:08 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: install4003.wikimedia.org decommissioned, removing all IPs except the asset tag one - ayounsi@cumin1003"
- 11:08 jayme@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker[2140-2153].codfw.wmnet
- 11:05 ayounsi@cumin1003: START - Cookbook sre.dns.netbox
- 11:00 ayounsi@cumin1003: START - Cookbook sre.hosts.decommission for hosts install4003.wikimedia.org
- 10:57 kevinbazira@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' .
- 10:55 jayme@cumin1003: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker[2140-2153].codfw.wmnet
- 10:55 jayme@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker[2126-2139].codfw.wmnet
- 10:55 jayme@cumin1003: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker[2126-2139].codfw.wmnet
- 10:44 jayme@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker[2126-2139].codfw.wmnet
- 10:43 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host aux-k8s-worker1006.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 10:43 elukey@cumin1003: START - Cookbook sre.hosts.provision for host aux-k8s-worker1006.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 10:38 trueg@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/wdqs-queryhammer: apply
- 10:38 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host aux-k8s-worker1006.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 10:38 elukey@cumin1003: START - Cookbook sre.hosts.provision for host aux-k8s-worker1006.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 10:38 trueg@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/services/wdqs-queryhammer: apply
- 10:31 jayme@cumin1003: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker[2126-2139].codfw.wmnet
- 10:30 jayme@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker[2104-2115,2124-2125].codfw.wmnet
- 10:30 jayme@cumin1003: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker[2104-2115,2124-2125].codfw.wmnet
- 10:28 topranks: disable puppet on routed-ganeti hosts to test nftables update on specific nodes T420715
- 10:27 fceratto@cumin1003: END (PASS) - Cookbook sre.switchdc.databases.prepare (exit_code=0) for the switch from codfw to eqiad for section s1
- 10:25 fceratto@cumin1003: START - Cookbook sre.switchdc.databases.prepare for the switch from codfw to eqiad for section s1
- 10:25 ayounsi@dns1004: END - running authdns-update
- 10:24 ayounsi@dns1004: START - running authdns-update
- 10:23 btullis@cumin1003: START - Cookbook sre.hosts.provision for host an-worker1172.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 10:22 jayme@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker[2104-2115,2124-2125].codfw.wmnet
- 10:20 fceratto@cumin1003: END (PASS) - Cookbook sre.switchdc.databases.prepare (exit_code=0) for the switch from codfw to eqiad for section s4
- 10:18 fceratto@cumin1003: START - Cookbook sre.switchdc.databases.prepare for the switch from codfw to eqiad for section s4
- 10:13 fceratto@cumin1003: END (PASS) - Cookbook sre.switchdc.databases.prepare (exit_code=0) for the switch from codfw to eqiad for section s8
- 10:11 fceratto@cumin1003: START - Cookbook sre.switchdc.databases.prepare for the switch from codfw to eqiad for section s8
- 10:09 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1172.eqiad.wmnet with OS bullseye
- 10:08 jayme@cumin1003: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker[2104-2115,2124-2125].codfw.wmnet
- 10:05 fceratto@cumin1003: END (PASS) - Cookbook sre.switchdc.databases.prepare (exit_code=0) for the switch from codfw to eqiad for section s7
- 10:05 jayme@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker[2076-2078,2087-2095,2102-2103].codfw.wmnet
- 10:04 jayme@cumin1003: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker[2076-2078,2087-2095,2102-2103].codfw.wmnet
- 10:04 fceratto@cumin1003: START - Cookbook sre.switchdc.databases.prepare for the switch from codfw to eqiad for section s7
- 09:58 fceratto@cumin1003: END (PASS) - Cookbook sre.switchdc.databases.prepare (exit_code=0) for the switch from codfw to eqiad for section s3
- 09:58 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host aux-k8s-worker1006.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 09:58 elukey@cumin1003: START - Cookbook sre.hosts.provision for host aux-k8s-worker1006.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 09:57 fceratto@cumin1003: START - Cookbook sre.switchdc.databases.prepare for the switch from codfw to eqiad for section s3
- 09:53 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host aux-k8s-worker1006.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 09:53 elukey@cumin1003: START - Cookbook sre.hosts.provision for host aux-k8s-worker1006.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 09:53 jayme@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker[2076-2078,2087-2095,2102-2103].codfw.wmnet
- 09:52 fceratto@cumin1003: END (PASS) - Cookbook sre.switchdc.databases.prepare (exit_code=0) for the switch from codfw to eqiad for section s2
- 09:49 fceratto@cumin1003: START - Cookbook sre.switchdc.databases.prepare for the switch from codfw to eqiad for section s2
- 09:49 blake@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
- 09:49 blake@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
- 09:49 blake@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
- 09:49 blake@deploy2002: helmfile [codfw] START helmfile.d/services/mw-web: apply
- 09:44 jayme@cumin1003: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker[2076-2078,2087-2095,2102-2103].codfw.wmnet
- 09:44 fceratto@cumin1003: END (PASS) - Cookbook sre.switchdc.databases.prepare (exit_code=0) for the switch from codfw to eqiad for section s5
- 09:44 jayme@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker[2060-2062,2064-2065,2067-2075].codfw.wmnet
- 09:44 jayme@cumin1003: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker[2060-2062,2064-2065,2067-2075].codfw.wmnet
- 09:42 fceratto@cumin1003: START - Cookbook sre.switchdc.databases.prepare for the switch from codfw to eqiad for section s5
- 09:40 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host an-worker1172.eqiad.wmnet with OS bullseye
- 09:39 btullis@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-worker1172.eqiad.wmnet with OS bullseye
- 09:33 fceratto@cumin1003: END (PASS) - Cookbook sre.switchdc.databases.prepare (exit_code=0) for the switch from codfw to eqiad for section s6
- 09:32 fceratto@cumin1003: START - Cookbook sre.switchdc.databases.prepare for the switch from codfw to eqiad for section s6
- 09:29 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host an-worker1172.eqiad.wmnet with OS bullseye
- 09:29 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1172.eqiad.wmnet with OS bullseye
- 09:25 trueg@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/wdqs-queryhammer: apply
- 09:25 trueg@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/services/wdqs-queryhammer: apply
- 09:24 fceratto@cumin1003: END (PASS) - Cookbook sre.switchdc.databases.prepare (exit_code=0) for the switch from codfw to eqiad for section es7
- 09:23 fceratto@cumin1003: START - Cookbook sre.switchdc.databases.prepare for the switch from codfw to eqiad for section es7
- 09:22 trueg@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/wdqs-queryhammer: apply
- 09:22 trueg@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/services/wdqs-queryhammer: apply
- 09:17 fceratto@cumin1003: END (PASS) - Cookbook sre.switchdc.databases.prepare (exit_code=0) for the switch from codfw to eqiad for section es6
- 09:16 fceratto@cumin1003: START - Cookbook sre.switchdc.databases.prepare for the switch from codfw to eqiad for section es6
- 09:11 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host aux-k8s-worker1006.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 09:11 elukey@cumin1003: START - Cookbook sre.hosts.provision for host aux-k8s-worker1006.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 09:10 fceratto@cumin1003: END (PASS) - Cookbook sre.switchdc.databases.prepare (exit_code=0) for the switch from codfw to eqiad for section x3
- 09:09 fceratto@cumin1003: START - Cookbook sre.switchdc.databases.prepare for the switch from codfw to eqiad for section x3
- 09:08 jayme@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker[2060-2062,2064-2065,2067-2075].codfw.wmnet
- 09:05 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host aux-k8s-worker1006.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 09:05 elukey@cumin1003: START - Cookbook sre.hosts.provision for host aux-k8s-worker1006.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 09:02 fceratto@cumin1003: END (PASS) - Cookbook sre.switchdc.databases.prepare (exit_code=0) for the switch from codfw to eqiad for section x1
- 09:01 fceratto@cumin1003: START - Cookbook sre.switchdc.databases.prepare for the switch from codfw to eqiad for section x1
- 09:00 federico3: starting T416706
- 09:00 jayme@cumin1003: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker[2060-2062,2064-2065,2067-2075].codfw.wmnet
- 08:59 fceratto@cumin1003: END (FAIL) - Cookbook sre.switchdc.databases.prepare (exit_code=99) for the switch from eqiad to codfw for section test-s4
- 08:59 fceratto@cumin1003: START - Cookbook sre.switchdc.databases.prepare for the switch from eqiad to codfw for section test-s4
- 08:59 jayme@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker[2038-2039,2041-2042,2044,2046,2049-2051,2055-2059].codfw.wmnet
- 08:59 jayme@cumin1003: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker[2038-2039,2041-2042,2044,2046,2049-2051,2055-2059].codfw.wmnet
- 08:50 jayme@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker[2038-2039,2041-2042,2044,2046,2049-2051,2055-2059].codfw.wmnet
- 08:46 kharlan@deploy2002: Finished scap sync-world: Backport for hcaptcha: Use the global edit key for MobileFrontend edits if present (T420574) (duration: 14m 42s)
- 08:44 dpogorzelski@deploy2002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
- 08:43 dpogorzelski@deploy2002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
- 08:41 jayme@cumin1003: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker[2038-2039,2041-2042,2044,2046,2049-2051,2055-2059].codfw.wmnet
- 08:40 jayme@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker[2005-2006,2011-2018,2033-2037].codfw.wmnet
- 08:40 jayme@cumin1003: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker[2005-2006,2011-2018,2033-2037].codfw.wmnet
- 08:40 dpogorzelski@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
- 08:39 kharlan@deploy2002: kharlan: Continuing with sync
- 08:38 dpogorzelski@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
- 08:37 kharlan@deploy2002: kharlan: Backport for hcaptcha: Use the global edit key for MobileFrontend edits if present (T420574) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 08:36 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-search: apply
- 08:35 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-search: apply
- 08:35 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-search: apply
- 08:31 kharlan@deploy2002: Started scap sync-world: Backport for hcaptcha: Use the global edit key for MobileFrontend edits if present (T420574)
- 08:29 jayme@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker[2005-2006,2011-2018,2033-2037].codfw.wmnet
- 08:19 jayme@cumin1003: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker[2005-2006,2011-2018,2033-2037].codfw.wmnet
- 08:18 jayme@cumin1003: START - Cookbook sre.k8s.reboot-nodes rolling reboot on P{wikikube-worker[2005-2006,2011-2018,2033-2039,2041-2042,2044,2046,2049-2051,2055-2062,2064-2065,2067-2078,2087-2095,2102-2115,2124-2179,2184-2199].codfw.wmnet} and (A:wikikube-master-codfw or A:wikikube-worker-codfw)
- 07:45 kartik@deploy2002: Finished scap sync-world: Backport for Enable ULS rewrite beta feature (T418187 T253303) (duration: 41m 30s)
- 07:42 brouberol@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/aux-k8s-services/kafka-mirrormaker: apply
- 07:33 brouberol@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/aux-k8s-services/kafka-mirrormaker: apply
- 07:30 kartik@deploy2002: kartik, abi: Continuing with sync
- 07:30 brouberol@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/aux-k8s-services/kafka-mirrormaker: apply
- 07:29 brouberol@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/aux-k8s-services/kafka-mirrormaker: apply
- 07:22 kartik@deploy2002: kartik, abi: Backport for Enable ULS rewrite beta feature (T418187 T253303) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 07:18 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 07:18 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
- 07:17 brouberol@deploy2002: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'.
- 07:16 brouberol@deploy2002: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'.
- 07:03 kartik@deploy2002: Started scap sync-world: Backport for Enable ULS rewrite beta feature (T418187 T253303)
- 02:08 mwpresync@deploy2002: Finished scap build-images: Publishing wmf/next image (duration: 07m 55s)
- 02:00 mwpresync@deploy2002: Started scap build-images: Publishing wmf/next image
2026-03-22
- 02:50 sukhe@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on doh7004.wikimedia.org with reason: depooled host
- 02:50 sukhe@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on doh7003.wikimedia.org with reason: depooled host
- 02:09 mwpresync@deploy2002: Finished scap build-images: Publishing wmf/next image (duration: 08m 21s)
- 02:01 mwpresync@deploy2002: Started scap build-images: Publishing wmf/next image
2026-03-20
- 23:30 brett@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs2013.codfw.wmnet
- 23:30 brett@cumin2002: START - Cookbook sre.hosts.remove-downtime for lvs2013.codfw.wmnet
- 22:34 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host lvs2013.codfw.wmnet
- 22:34 brett: Started pybal on lvs2013
- 22:27 brett@cumin2002: START - Cookbook sre.hosts.reboot-single for host lvs2013.codfw.wmnet
- 21:57 cdobbins@cumin2002: conftool action : set/pooled=yes; selector: name=cp5023.eqsin.wmnet [reason: trixie reimaging]
- 21:57 cdobbins@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5023.eqsin.wmnet with OS trixie
- 21:55 hashar: Upgrading CI Jenkins T420477
- 21:25 cdobbins@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5023.eqsin.wmnet with reason: host reimage
- 21:21 cdobbins@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5023.eqsin.wmnet with reason: host reimage
- 21:04 brett@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on lvs2013.codfw.wmnet with reason: debugging ipip
- 20:46 cdobbins@cumin2002: START - Cookbook sre.hosts.reimage for host cp5023.eqsin.wmnet with OS trixie
- 20:45 mutante: contint1003/2003 apt remove --purge apache2* ; apt remove --purge php* | T418521
- 20:43 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs2013.codfw.wmnet
- 20:40 brett@cumin2002: START - Cookbook sre.hosts.reboot-single for host lvs2013.codfw.wmnet
- 20:38 cdobbins@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp5023.eqsin.wmnet with OS trixie
- 20:24 sukhe@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on doh3006.wikimedia.org with reason: depooled host
- 20:24 sukhe@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on doh3005.wikimedia.org with reason: depooled host
- 20:23 sukhe@cumin1003: DONE (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 1 day, 0:00:00 on doh3005.wikimedia.org with reason: depooled host
- 19:50 brett@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on lvs2013.codfw.wmnet with reason: debugging ipip
- 19:33 cdobbins@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs2013.codfw.wmnet
- 19:30 cdobbins@cumin2002: START - Cookbook sre.hosts.reboot-single for host lvs2013.codfw.wmnet
- 19:21 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-restart-reboot-tcp-proxy (exit_code=0) rolling reboot on A:tcpproxy and A:tcpproxy
- 19:16 cdobbins@cumin2002: START - Cookbook sre.hosts.reimage for host cp5023.eqsin.wmnet with OS trixie
- 19:16 cdobbins@cumin2002: conftool action : set/pooled=no; selector: name=cp5023.eqsin.wmnet [reason: trixie reimaging]
- 19:16 cdobbins@cumin2002: conftool action : set/pooled=yes; selector: name=cp5021.eqsin.wmnet [reason: trixie reimaging]
- 19:14 cdobbins@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5021.eqsin.wmnet with OS trixie
- 18:52 cdobbins@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on lvs2013.codfw.wmnet with reason: reboot
- 18:43 cdobbins@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5021.eqsin.wmnet with reason: host reimage
- 18:39 cdobbins@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5021.eqsin.wmnet with reason: host reimage
- 18:28 cdobbins@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on lvs2013.codfw.wmnet with reason: reboot
- 18:16 brett@cumin2002: START - Cookbook sre.cdn.roll-restart-reboot-tcp-proxy rolling reboot on A:tcpproxy and A:tcpproxy
- 18:14 jhathaway@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on db1253.eqiad.wmnet with reason: T420041
- 17:59 cdobbins@cumin2002: START - Cookbook sre.hosts.reimage for host cp5021.eqsin.wmnet with OS trixie
- 17:54 cdobbins@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp5021.eqsin.wmnet with OS trixie
- 17:51 cdobbins@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host lvs2014.codfw.wmnet
- 17:40 dzahn@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on contint1003.wikimedia.org with reason: jenkins on java21
- 17:39 cdobbins@cumin2002: START - Cookbook sre.hosts.reboot-single for host lvs2014.codfw.wmnet
- 16:54 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host an-worker1172.eqiad.wmnet with OS bullseye
- 16:54 btullis@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-worker1172.eqiad.wmnet with OS bullseye
- 16:46 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host an-worker1172.eqiad.wmnet with OS bullseye
- 16:33 cdobbins@cumin2002: START - Cookbook sre.hosts.reimage for host cp5021.eqsin.wmnet with OS trixie
- 16:32 cdobbins@cumin2002: conftool action : set/pooled=no; selector: name=cp5021.eqsin.wmnet [reason: trixie reimaging]
- 16:09 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1172.eqiad.wmnet with OS bullseye
- 16:08 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-wf2002.codfw.wmnet
- 16:02 jiji@cumin1003: START - Cookbook sre.hosts.reboot-single for host mc-wf2002.codfw.wmnet
- 15:51 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-wf2001.codfw.wmnet
- 15:45 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'edit-check' for release 'main' .
- 15:45 jiji@cumin1003: START - Cookbook sre.hosts.reboot-single for host mc-wf2001.codfw.wmnet
- 15:43 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2041.codfw.wmnet
- 15:37 jiji@cumin1003: START - Cookbook sre.hosts.reboot-single for host mc2041.codfw.wmnet
- 15:32 cparle@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply
- 15:32 cparle@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply
- 15:29 jiji@cumin1003: END (PASS) - Cookbook sre.memcached.roll-reboot-restart (exit_code=0) rolling reboot on A:memcached-eqiad
- 15:16 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2040.codfw.wmnet
- 15:10 jiji@cumin1003: START - Cookbook sre.hosts.reboot-single for host mc2040.codfw.wmnet
- 15:02 trueg@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/wdqs-queryhammer: apply
- 15:01 trueg@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/services/wdqs-queryhammer: apply
- 15:00 trueg@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/wdqs-queryhammer: apply
- 14:59 trueg@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/services/wdqs-queryhammer: apply
- 14:58 trueg@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/wdqs-queryhammer: apply
- 14:58 trueg@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/services/wdqs-queryhammer: apply
- 14:57 trueg@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/wdqs-queryhammer: apply
- 14:56 trueg@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/services/wdqs-queryhammer: apply
- 14:55 trueg@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/wdqs-queryhammer: apply
- 14:50 trueg@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/services/wdqs-queryhammer: apply
- 14:45 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'edit-check' for release 'main' .
- 14:45 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
- 14:45 blake@cumin1003: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on P{wikikube-worker[2001-2002].codfw.wmnet} and (A:wikikube-master-codfw or A:wikikube-worker-codfw)
- 14:45 blake@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker2002.codfw.wmnet
- 14:45 blake@cumin1003: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker2002.codfw.wmnet
- 14:44 trueg@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/wdqs-queryhammer: apply
- 14:44 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
- 14:37 blake@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker2002.codfw.wmnet
- 14:37 blake@cumin1003: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker2002.codfw.wmnet
- 14:37 blake@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker2001.codfw.wmnet
- 14:37 blake@cumin1003: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker2001.codfw.wmnet
- 14:36 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'edit-check' for release 'main' .
- 14:35 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
- 14:35 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
- 14:34 trueg@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/services/wdqs-queryhammer: apply
- 14:30 blake@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker2001.codfw.wmnet
- 14:29 blake@cumin1003: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker2001.codfw.wmnet
- 14:29 blake@cumin1003: START - Cookbook sre.k8s.reboot-nodes rolling reboot on P{wikikube-worker[2001-2002].codfw.wmnet} and (A:wikikube-master-codfw or A:wikikube-worker-codfw)
- 14:27 jayme@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker[1335-1349].eqiad.wmnet
- 14:27 jayme@cumin1003: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker[1335-1349].eqiad.wmnet
- 14:26 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2039.codfw.wmnet
- 14:21 jiji@cumin1003: START - Cookbook sre.hosts.reboot-single for host mc2039.codfw.wmnet
- 14:16 trueg@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/wdqs-queryhammer: apply
- 14:16 trueg@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/services/wdqs-queryhammer: apply
- 14:14 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc2038.codfw.wmnet
- 14:08 jiji@cumin1003: START - Cookbook sre.hosts.reboot-single for host mc2038.codfw.wmnet
- 13:54 jgreen@dns1004: END - running authdns-update
- 13:52 jgreen@dns1004: START - running authdns-update
- 13:49 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'edit-check' for release 'main' .
- 13:48 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
- 13:47 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
- 13:39 inflatador: bking@deploy2002 restarting opensearch-ipoid cluster to apply new certificates
- 13:33 jiji@cumin1003: START - Cookbook sre.memcached.roll-reboot-restart rolling reboot on A:memcached-eqiad
- 13:20 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'edit-check' for release 'main' .
- 13:14 jiji@cumin1003: END (PASS) - Cookbook sre.memcached.roll-reboot-restart (exit_code=0) rolling reboot on A:memcached-canary
- 13:14 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for doh[3005-3006].wikimedia.org
- 13:14 sukhe@cumin1003: START - Cookbook sre.hosts.remove-downtime for doh[3005-3006].wikimedia.org
- 13:08 jiji@cumin1003: START - Cookbook sre.memcached.roll-reboot-restart rolling reboot on A:memcached-canary
- 13:05 kevinbazira@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' .
- 13:03 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'edit-check' for release 'main' .
- 12:59 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'edit-check' for release 'main' .
- 12:58 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'edit-check' for release 'main' .
- 12:57 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest2006.codfw.wmnet
- 12:56 cparle@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply
- 12:55 cparle@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply
- 12:51 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest2006.codfw.wmnet
- 12:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest2001.codfw.wmnet
- 12:50 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'edit-check' for release 'main' .
- 12:45 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest2001.codfw.wmnet
- 12:40 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1005.eqiad.wmnet
- 12:35 jiji@cumin1003: END (ERROR) - Cookbook sre.memcached.roll-reboot-restart (exit_code=97) rolling reboot on A:memcached-codfw
- 12:34 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1005.eqiad.wmnet
- 12:31 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ldap-rw2001.wikimedia.org
- 12:28 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ldap-rw2001.wikimedia.org
- 12:24 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ldap-rw1001.wikimedia.org
- 12:20 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ldap-rw1001.wikimedia.org
- 11:43 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host an-worker1172.eqiad.wmnet with OS bullseye
- 11:42 btullis@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-worker1172.eqiad.wmnet with OS bullseye
- 11:27 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host an-worker1172.eqiad.wmnet with OS bullseye
- 11:24 jiji@cumin1003: START - Cookbook sre.memcached.roll-reboot-restart rolling reboot on A:memcached-codfw
- 10:26 jiji@cumin1003: END (PASS) - Cookbook sre.memcached.roll-reboot-restart (exit_code=0) rolling reboot on A:memcached-codfw
- 10:13 fceratto@deploy2002: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 10:12 fceratto@deploy2002: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 10:10 fceratto@deploy2002: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 10:05 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
- 10:05 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
- 10:04 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
- 10:02 fceratto@deploy2002: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 09:58 fceratto@deploy2002: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 09:57 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
- 09:57 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
- 09:56 fceratto@deploy2002: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 09:56 fceratto@deploy2002: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 09:55 fceratto@deploy2002: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 09:53 fceratto@deploy2002: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 09:50 fceratto@deploy2002: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 09:50 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
- 09:47 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
- 09:46 jayme@cumin1003: END (PASS) - Cookbook sre.k8s.print-network-topology (exit_code=0)
- 09:46 jayme@cumin1003: START - Cookbook sre.k8s.print-network-topology
- 09:45 jayme@cumin1003: END (PASS) - Cookbook sre.k8s.print-network-topology (exit_code=0)
- 09:45 kevinbazira@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' .
- 09:45 jayme@cumin1003: START - Cookbook sre.k8s.print-network-topology
- 09:38 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
- 09:37 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
- 09:37 jayme@cumin1003: END (PASS) - Cookbook sre.k8s.print-network-topology (exit_code=0)
- 09:37 jayme@cumin1003: START - Cookbook sre.k8s.print-network-topology
- 09:37 brouberol@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/aux-k8s-services/kafka-mirrormaker: apply
- 09:37 jayme@cumin1003: END (PASS) - Cookbook sre.k8s.print-network-topology (exit_code=0)
- 09:36 brouberol@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/aux-k8s-services/kafka-mirrormaker: apply
- 09:36 jayme@cumin1003: START - Cookbook sre.k8s.print-network-topology
- 09:35 brouberol@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/aux-k8s-services/kafka-mirrormaker: apply
- 09:35 brouberol@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/aux-k8s-services/kafka-mirrormaker: apply
- 09:34 jayme@cumin1003: END (PASS) - Cookbook sre.k8s.print-network-topology (exit_code=0)
- 09:34 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
- 09:34 jayme@cumin1003: START - Cookbook sre.k8s.print-network-topology
- 09:33 fceratto@deploy2002: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 09:26 btullis@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-worker1172.eqiad.wmnet with OS bullseye
- 09:25 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
- 09:24 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
- 09:23 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
- 09:20 jayme@cumin1003: END (PASS) - Cookbook sre.k8s.print-network-topology (exit_code=0)
- 09:19 fceratto@deploy2002: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 09:19 jayme@cumin1003: START - Cookbook sre.k8s.print-network-topology
- 09:18 jayme@cumin1003: END (PASS) - Cookbook sre.k8s.print-network-topology (exit_code=0)
- 09:18 fceratto@deploy2002: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 09:18 jayme@cumin1003: START - Cookbook sre.k8s.print-network-topology
- 09:17 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host an-worker1172.eqiad.wmnet with OS bullseye
- 09:15 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
- 08:57 jiji@cumin1003: START - Cookbook sre.memcached.roll-reboot-restart rolling reboot on A:memcached-codfw
- 05:30 cdobbins@cumin2002: conftool action : set/pooled=yes; selector: name=cp5024.eqsin.wmnet [reason: trixie reimaging]
- 05:30 cdobbins@cumin2002: conftool action : set/pooled=yes; selector: name=cp5019.eqsin.wmnet [reason: trixie reimaging]
- 02:43 sukhe@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on doh3005.wikimedia.org with reason: alerting is flapping
- 02:42 sukhe@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on doh3006.wikimedia.org with reason: alerting is flapping
- 01:21 cdobbins@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5019.eqsin.wmnet with OS trixie
- 01:17 cdobbins@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5024.eqsin.wmnet with OS trixie
- 00:48 cdobbins@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5019.eqsin.wmnet with reason: host reimage
- 00:44 cdobbins@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5024.eqsin.wmnet with reason: host reimage
- 00:38 cdobbins@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5019.eqsin.wmnet with reason: host reimage
- 00:37 cdobbins@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5024.eqsin.wmnet with reason: host reimage
- 00:05 cdobbins@cumin2002: START - Cookbook sre.hosts.reimage for host cp5024.eqsin.wmnet with OS trixie
- 00:01 cdobbins@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp5024.eqsin.wmnet with OS trixie
- 00:01 cdobbins@cumin2002: START - Cookbook sre.hosts.reimage for host cp5024.eqsin.wmnet with OS trixie
2026-03-19
- 23:59 cdobbins@cumin2002: START - Cookbook sre.hosts.reimage for host cp5019.eqsin.wmnet with OS trixie
- 23:40 ladsgroup@deploy2002: Finished scap sync-world: Backport for Make the handler follow the thumb steps (T414805) (duration: 06m 14s)
- 23:36 ladsgroup@deploy2002: ladsgroup: Continuing with sync
- 23:35 ladsgroup@deploy2002: ladsgroup: Backport for Make the handler follow the thumb steps (T414805) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 23:33 ladsgroup@deploy2002: Started scap sync-world: Backport for Make the handler follow the thumb steps (T414805)
- 22:48 zabe@deploy2002: mwscript-k8s job started: foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https # T420643
- 22:19 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
- 22:18 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-web: apply
- 22:17 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
- 22:16 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
- 22:08 jforrester@deploy2002: Finished scap sync-world: Backport for Set WikiLambdaAbstractNamespaces's merge_strategy to provide_default (T420649) (duration: 06m 46s)
- 22:04 jforrester@deploy2002: jforrester: Continuing with sync
- 22:03 jforrester@deploy2002: jforrester: Backport for Set WikiLambdaAbstractNamespaces's merge_strategy to provide_default (T420649) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 22:01 jforrester@deploy2002: Started scap sync-world: Backport for Set WikiLambdaAbstractNamespaces's merge_strategy to provide_default (T420649)
- 21:58 cdobbins@cumin2002: START - Cookbook sre.hosts.reimage for host cp5019.eqsin.wmnet with OS trixie
- 21:57 eevans@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-reboot (exit_code=0) rolling reboot on A:restbase-codfw
- 21:57 cdobbins@cumin2002: conftool action : set/pooled=no; selector: name=cp5019.eqsin.wmnet [reason: trixie reimaging]
- 21:56 cdobbins@cumin2002: START - Cookbook sre.hosts.reimage for host cp5024.eqsin.wmnet with OS trixie
- 21:56 cdobbins@cumin2002: conftool action : set/pooled=no; selector: name=cp5024.eqsin.wmnet [reason: trixie reimaging]
- 21:55 jdlrobson@deploy2002: Finished scap sync-world: Backport for Implement addListener fallback for older browsers in matchMedia (T419717) (duration: 07m 17s)
- 21:51 jdlrobson@deploy2002: jdlrobson: Continuing with sync
- 21:49 jdlrobson@deploy2002: jdlrobson: Backport for Implement addListener fallback for older browsers in matchMedia (T419717) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 21:48 jdlrobson@deploy2002: Started scap sync-world: Backport for Implement addListener fallback for older browsers in matchMedia (T419717)
- 21:29 jdlrobson@deploy2002: Finished scap sync-world: Backport for Skins: Address issue with blurry images for large thumbnails (T375981) (duration: 07m 03s)
- 21:25 jdlrobson@deploy2002: jdlrobson: Continuing with sync
- 21:24 jdlrobson@deploy2002: jdlrobson: Backport for Skins: Address issue with blurry images for large thumbnails (T375981) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 21:22 jdlrobson@deploy2002: Started scap sync-world: Backport for Skins: Address issue with blurry images for large thumbnails (T375981)
- 21:11 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on backup2020.codfw.wmnet with reason: kernel module reload
- 21:10 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 11 hosts with reason: kernel module reload
- 20:36 kgraessle@deploy2002: Finished scap sync-world: Backport for Deploy Extension:PersonalDashboard to English Wikipedia (T418367), Deploy PRV to 13 wikis (T420273) (duration: 11m 00s)
- 20:32 kgraessle@deploy2002: kgraessle, arlolra: Continuing with sync
- 20:27 cdobbins@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs1016.eqiad.wmnet
- 20:27 kgraessle@deploy2002: kgraessle, arlolra: Backport for Deploy Extension:PersonalDashboard to English Wikipedia (T418367), Deploy PRV to 13 wikis (T420273) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 20:25 kgraessle@deploy2002: Started scap sync-world: Backport for Deploy Extension:PersonalDashboard to English Wikipedia (T418367), Deploy PRV to 13 wikis (T420273)
- 20:25 cdobbins@cumin2002: START - Cookbook sre.hosts.reboot-single for host lvs1016.eqiad.wmnet
- 20:11 cdobbins@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on lvs1016.eqiad.wmnet with reason: reboot
- 20:01 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 20:01 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add analytic vlan hostnames - cmooney@cumin1003"
- 20:01 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add analytic vlan hostnames - cmooney@cumin1003"
- 19:59 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs1018.eqiad.wmnet
- 19:56 cmooney@cumin1003: START - Cookbook sre.dns.netbox
- 19:56 brett@cumin2002: START - Cookbook sre.hosts.reboot-single for host lvs1018.eqiad.wmnet
- 19:55 cmooney@cumin1003: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
- 19:53 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) 4.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.8.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa. on all recursors
- 19:53 cmooney@cumin1003: START - Cookbook sre.dns.wipe-cache 4.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.8.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa. on all recursors
- 19:52 cmooney@cumin1003: START - Cookbook sre.dns.netbox
- 19:51 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 7 hosts with reason: kernel module reload
- 19:44 topranks: disable IPv6 VRRP for et-1/0/5.1023 sub-interfaces on eqiad core routers T405562
- 19:36 brett: stopping pybal/puppet on lvs1018 for reboots
- 19:35 brett@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1018.eqiad.wmnet with reason: reboots
- 19:00 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 6 hosts with reason: kernel module reload
- 19:00 cdobbins@cumin2002: START - Cookbook sre.hosts.reboot-single for host lvs1019.eqiad.wmnet
- 19:00 eevans@cumin1003: START - Cookbook sre.cassandra.roll-reboot rolling reboot on A:restbase-codfw
- 19:00 topranks: add vlan sub-interface for analytics1-d-eqiad vlan to leaf switches in eqiad row d T405562
- 18:44 cdobbins@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on lvs1019.eqiad.wmnet with reason: planned reboot
- 18:42 eevans@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-reboot (exit_code=0) rolling reboot on A:aqs-codfw
- 18:31 jforrester@deploy2002: Finished scap sync-world: Backport for RepoBooks::onMediaWikiServices: Skip all low NSes, not just NS0 (T420617), SpecialAbstractContent: Fix hard-coded policy list page namespace, [abstractwiki] Allow "Abstract:" as well as "Abstract Wikipedia:" as NS_PROJECT (duration: 06m 20s)
- 18:27 jforrester@deploy2002: jforrester: Continuing with sync
- 18:26 jforrester@deploy2002: jforrester: Backport for RepoBooks::onMediaWikiServices: Skip all low NSes, not just NS0 (T420617), SpecialAbstractContent: Fix hard-coded policy list page namespace, [abstractwiki] Allow "Abstract:" as well as "Abstract Wikipedia:" as NS_PROJECT synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 18:24 jforrester@deploy2002: Started scap sync-world: Backport for RepoBooks::onMediaWikiServices: Skip all low NSes, not just NS0 (T420617), SpecialAbstractContent: Fix hard-coded policy list page namespace, [abstractwiki] Allow "Abstract:" as well as "Abstract Wikipedia:" as NS_PROJECT
- 18:02 cdobbins@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs1020.eqiad.wmnet
- 17:55 cdobbins@cumin2002: START - Cookbook sre.hosts.reboot-single for host lvs1020.eqiad.wmnet
- 17:46 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
- 17:46 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
- 17:45 cdobbins@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host lvs1020.eqiad.wmnet
- 17:44 cdobbins@cumin2002: START - Cookbook sre.hosts.reboot-single for host lvs1020.eqiad.wmnet
- 17:30 brett@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp5026.*
- 17:29 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host hcaptcha-proxy4004.wikimedia.org
- 17:29 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host hcaptcha-proxy4004.wikimedia.org with OS bookworm
- 17:26 dzahn@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on contint1003.wikimedia.org with reason: jenkins on java21
- 17:24 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp5026.eqsin.wmnet
- 17:22 bd808@deploy2002: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
- 17:22 bd808@deploy2002: helmfile [codfw] START helmfile.d/services/developer-portal: apply
- 17:22 bd808@deploy2002: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
- 17:22 bd808@deploy2002: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
- 17:21 bd808@deploy2002: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
- 17:18 bd808@deploy2002: helmfile [staging] START helmfile.d/services/developer-portal: apply
- 17:15 brett@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp5026.eqsin.wmnet
- 17:14 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on hcaptcha-proxy4004.wikimedia.org with reason: host reimage
- 17:08 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts doh4002.wikimedia.org
- 17:08 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 17:08 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: doh4002.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
- 17:08 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on hcaptcha-proxy4004.wikimedia.org with reason: host reimage
- 17:08 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: doh4002.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
- 17:07 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
- 17:07 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
- 17:05 brett@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on cp5026.eqsin.wmnet with reason: firmware updates
- 17:04 jmm@cumin2002: START - Cookbook sre.dns.netbox
- 17:03 brett@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp5025.*
- 17:01 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp5025.eqsin.wmnet
- 16:59 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts doh4002.wikimedia.org
- 16:59 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts doh4001.wikimedia.org
- 16:59 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 16:59 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: doh4001.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
- 16:59 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: doh4001.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
- 16:58 brouberol@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/aux-k8s-services/kafka-mirrormaker: apply
- 16:57 brouberol@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/aux-k8s-services/kafka-mirrormaker: apply
- 16:57 brouberol@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/aux-k8s-services/kafka-mirrormaker: apply
- 16:56 brouberol@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/aux-k8s-services/kafka-mirrormaker: apply
- 16:54 jmm@cumin2002: START - Cookbook sre.dns.netbox
- 16:52 brett@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp5025.eqsin.wmnet
- 16:50 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts doh4001.wikimedia.org
- 16:48 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-master1004.eqiad.wmnet
- 16:46 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1151.eqiad.wmnet
- 16:45 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5026.eqsin.wmnet with OS trixie
- 16:44 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host hcaptcha-proxy4004.wikimedia.org with OS bookworm
- 16:44 fnegri@cumin1003: conftool action : set/pooled=no; selector: name=clouddb1013.eqiad.wmnet,service=s3
- 16:43 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM hcaptcha-proxy4004.wikimedia.org - jmm@cumin2002"
- 16:43 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM hcaptcha-proxy4004.wikimedia.org - jmm@cumin2002"
- 16:43 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) hcaptcha-proxy4004.wikimedia.org on all recursors
- 16:43 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache hcaptcha-proxy4004.wikimedia.org on all recursors
- 16:43 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 16:43 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM hcaptcha-proxy4004.wikimedia.org - jmm@cumin2002"
- 16:42 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM hcaptcha-proxy4004.wikimedia.org - jmm@cumin2002"
- 16:42 jforrester@deploy2002: Finished scap sync-world: Backport for Activate Abstract Wikipedia (T411723) (duration: 06m 09s)
- 16:41 brett@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on cp5025.eqsin.wmnet with reason: firmware updates
- 16:41 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host an-master1004.eqiad.wmnet
- 16:40 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5025.eqsin.wmnet with OS trixie
- 16:39 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host an-worker1151.eqiad.wmnet
- 16:39 jmm@cumin2002: START - Cookbook sre.dns.netbox
- 16:39 jmm@cumin2002: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
- 16:38 jforrester@deploy2002: jforrester: Continuing with sync
- 16:38 jforrester@deploy2002: jforrester: Backport for Activate Abstract Wikipedia (T411723) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 16:36 jforrester@deploy2002: Started scap sync-world: Backport for Activate Abstract Wikipedia (T411723)
- 16:35 jmm@cumin2002: END (PASS) - Cookbook sre.netbox.restart-reboot (exit_code=0) rolling reboot on A:netbox
- 16:33 jforrester@deploy2002: Finished scap sync-world: Backport for Revert "[abstractwiki] Temporarily disable wgWikiLambdaEnableAbstractMode to see if this means we can create the wiki" (duration: 07m 19s)
- 16:29 jforrester@deploy2002: jforrester: Continuing with sync
- 16:28 jforrester@deploy2002: jforrester: Backport for Revert "[abstractwiki] Temporarily disable wgWikiLambdaEnableAbstractMode to see if this means we can create the wiki" synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 16:26 jforrester@deploy2002: Started scap sync-world: Backport for Revert "[abstractwiki] Temporarily disable wgWikiLambdaEnableAbstractMode to see if this means we can create the wiki"
- 16:25 jmm@cumin2002: START - Cookbook sre.dns.netbox
- 16:24 jforrester@deploy2002: Finished scap sync-world: Backport for [abstractwiki] Temporarily disable wgWikiLambdaEnableAbstractMode to see if this means we can create the wiki (T420531) (duration: 06m 06s)
- 16:23 eevans@cumin1003: START - Cookbook sre.cassandra.roll-reboot rolling reboot on A:aqs-codfw
- 16:21 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
- 16:21 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) netbox.discovery.wmnet. on all recursors
- 16:21 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache netbox.discovery.wmnet. on all recursors
- 16:20 jmm@cumin2002: START - Cookbook sre.dns.netbox
- 16:20 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host hcaptcha-proxy4004.wikimedia.org
- 16:20 fabfur@cumin1003: END (FAIL) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=1) rolling upgrade of HAProxy on P{cp2041*} and A:cp - 3.2 test upgrade ()
- 16:20 fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on P{cp2041*} and A:cp - 3.2 test upgrade ()
- 16:20 jforrester@deploy2002: jforrester: Continuing with sync
- 16:19 jforrester@deploy2002: jforrester: Backport for [abstractwiki] Temporarily disable wgWikiLambdaEnableAbstractMode to see if this means we can create the wiki (T420531) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 16:18 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host flink-zk2003.codfw.wmnet
- 16:17 jforrester@deploy2002: Started scap sync-world: Backport for [abstractwiki] Temporarily disable wgWikiLambdaEnableAbstractMode to see if this means we can create the wiki (T420531)
- 16:17 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) netbox.discovery.wmnet. on all recursors
- 16:17 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache netbox.discovery.wmnet. on all recursors
- 16:17 jmm@cumin2002: START - Cookbook sre.netbox.restart-reboot rolling reboot on A:netbox
- 16:16 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host hcaptcha-proxy4003.wikimedia.org
- 16:16 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host hcaptcha-proxy4003.wikimedia.org with OS bookworm
- 16:15 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1142.eqiad.wmnet
- 16:14 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host flink-zk2003.codfw.wmnet
- 16:14 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host flink-zk2002.codfw.wmnet
- 16:13 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5026.eqsin.wmnet with reason: host reimage
- 16:10 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host flink-zk2002.codfw.wmnet
- 16:09 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5025.eqsin.wmnet with reason: host reimage
- 16:08 brouberol@deploy2002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
- 16:07 brouberol@deploy2002: helmfile [codfw] START helmfile.d/admin 'apply'.
- 16:06 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host an-worker1142.eqiad.wmnet
- 16:06 brouberol@deploy2002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
- 16:06 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host flink-zk2001.codfw.wmnet
- 16:05 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5026.eqsin.wmnet with reason: host reimage
- 16:05 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5025.eqsin.wmnet with reason: host reimage
- 16:05 brouberol@deploy2002: helmfile [eqiad] START helmfile.d/admin 'apply'.
- 16:02 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host flink-zk2001.codfw.wmnet
- 15:59 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on hcaptcha-proxy4003.wikimedia.org with reason: host reimage
- 15:53 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on hcaptcha-proxy4003.wikimedia.org with reason: host reimage
- 15:36 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host acmechief2002.codfw.wmnet
- 15:35 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp5025.eqsin.wmnet with OS trixie
- 15:34 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp5026.eqsin.wmnet with OS trixie
- 15:34 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp5025.eqsin.wmnet with OS trixie
- 15:34 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp5026.eqsin.wmnet with OS trixie
- 15:32 brett@cumin2002: START - Cookbook sre.hosts.reboot-single for host acmechief2002.codfw.wmnet
- 15:32 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host acmechief1002.eqiad.wmnet
- 15:32 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host hcaptcha-proxy4003.wikimedia.org with OS bookworm
- 15:31 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM hcaptcha-proxy4003.wikimedia.org - jmm@cumin2002"
- 15:31 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM hcaptcha-proxy4003.wikimedia.org - jmm@cumin2002"
- 15:31 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) hcaptcha-proxy4003.wikimedia.org on all recursors
- 15:31 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache hcaptcha-proxy4003.wikimedia.org on all recursors
- 15:31 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 15:31 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM hcaptcha-proxy4003.wikimedia.org - jmm@cumin2002"
- 15:30 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host acmechief-test2001.codfw.wmnet
- 15:29 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host acmechief-test1001.eqiad.wmnet
- 15:28 brett@cumin2002: START - Cookbook sre.hosts.reboot-single for host acmechief1002.eqiad.wmnet
- 15:26 brett@cumin2002: START - Cookbook sre.hosts.reboot-single for host acmechief-test2001.codfw.wmnet
- 15:25 brett@cumin2002: START - Cookbook sre.hosts.reboot-single for host acmechief-test1001.eqiad.wmnet
- 15:22 phuedx@deploy2002: Finished scap sync-world: Backport for Hooks: Re-apply I52fc151ab88d79754baeff35d2c0f200ebe9fc9a (duration: 09m 55s)
- 15:22 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM hcaptcha-proxy4003.wikimedia.org - jmm@cumin2002"
- 15:22 jayme@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/admin 'apply'.
- 15:22 jayme@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/admin 'apply'.
- 15:21 jayme@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 15:21 jayme@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
- 15:21 jayme@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'.
- 15:18 phuedx@deploy2002: phuedx: Continuing with sync
- 15:18 jayme@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'.
- 15:18 jayme@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 15:17 jayme@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
- 15:17 jayme@deploy1003: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
- 15:16 jayme@deploy1003: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
- 15:16 jayme@deploy1003: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
- 15:15 jayme@deploy1003: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
- 15:15 jayme@deploy1003: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
- 15:15 jayme@deploy1003: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
- 15:15 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'.
- 15:14 phuedx@deploy2002: phuedx: Backport for Hooks: Re-apply I52fc151ab88d79754baeff35d2c0f200ebe9fc9a synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 15:14 jmm@cumin2002: START - Cookbook sre.dns.netbox
- 15:14 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host hcaptcha-proxy4003.wikimedia.org
- 15:12 phuedx@deploy2002: Started scap sync-world: Backport for Hooks: Re-apply I52fc151ab88d79754baeff35d2c0f200ebe9fc9a
- 15:11 jayme@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'.
- 15:10 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
- 15:10 jayme@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'.
- 15:09 jayme@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'.
- 15:09 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'.
- 15:09 jayme@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'.
- 15:07 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp5026.eqsin.wmnet with OS trixie
- 15:06 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp5025.eqsin.wmnet with OS trixie
- 15:06 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host doh4004.wikimedia.org
- 15:06 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host doh4004.wikimedia.org with OS bookworm
- 15:05 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host flink-zk1003.eqiad.wmnet
- 15:01 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host flink-zk1003.eqiad.wmnet
- 14:59 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host flink-zk1002.eqiad.wmnet
- 14:58 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host matomo1003.eqiad.wmnet
- 14:56 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host zookeeper-test1002.eqiad.wmnet
- 14:55 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host flink-zk1002.eqiad.wmnet
- 14:55 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host flink-zk1001.eqiad.wmnet
- 14:54 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host matomo1003.eqiad.wmnet
- 14:52 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host zookeeper-test1002.eqiad.wmnet
- 14:51 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host flink-zk1001.eqiad.wmnet
- 14:51 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-conf1006.eqiad.wmnet
- 14:49 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on doh4004.wikimedia.org with reason: host reimage
- 14:46 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host an-conf1006.eqiad.wmnet
- 14:43 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on doh4004.wikimedia.org with reason: host reimage
- 14:40 jayme@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
- 14:38 jayme@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
- 14:35 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-conf1005.eqiad.wmnet
- 14:32 bking@cumin2002: conftool action : set/pooled=yes:weight=1; selector: name=dse-k8s-worker1010.eqiad.wmnet|dse-k8s-worker1011.eqiad.wmnet|dse-k8s-worker1012.eqiad.wmnet|dse-k8s-worker1013.eqiad.wmnet|dse-k8s-worker1015.eqiad.wmnet|dse-k8s-worker1016.eqiad.wmnet|dse-k8s-worker1017.eqiad.wmnet|dse-k8s-worker1018.eqiad.wmnet|dse-k8s-worker1019.eqiad.wmnet
- 14:29 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host an-conf1005.eqiad.wmnet
- 14:27 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-conf1004.eqiad.wmnet
- 14:25 bking@cumin2002: conftool action : set/pooled=yes:weight=10; selector: name=dse-k8s-worker1012.eqiad.wmnet|dse-k8s-worker1015.eqiad.wmnet|dse-k8s-worker1016.eqiad.wmnet|dse-k8s-worker1017.eqiad.wmnet
- 14:22 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host an-conf1004.eqiad.wmnet
- 14:21 jmm@deploy2002: helmfile [eqiad] DONE helmfile.d/services/proton: apply
- 14:20 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host doh4004.wikimedia.org with OS bookworm
- 14:20 jmm@deploy2002: helmfile [eqiad] START helmfile.d/services/proton: apply
- 14:19 jmm@deploy2002: helmfile [codfw] DONE helmfile.d/services/proton: apply
- 14:18 jmm@deploy2002: helmfile [codfw] START helmfile.d/services/proton: apply
- 14:18 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM doh4004.wikimedia.org - jmm@cumin2002"
- 14:17 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM doh4004.wikimedia.org - jmm@cumin2002"
- 14:17 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) doh4004.wikimedia.org on all recursors
- 14:17 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache doh4004.wikimedia.org on all recursors
- 14:17 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 14:17 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM doh4004.wikimedia.org - jmm@cumin2002"
- 14:13 jmm@deploy2002: helmfile [staging] DONE helmfile.d/services/proton: apply
- 14:12 jmm@deploy2002: helmfile [staging] START helmfile.d/services/proton: apply
- 14:11 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM doh4004.wikimedia.org - jmm@cumin2002"
- 14:04 jmm@cumin2002: START - Cookbook sre.dns.netbox
- 14:04 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host doh4004.wikimedia.org
- 14:03 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
- 13:54 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host doh4003.wikimedia.org
- 13:54 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host doh4003.wikimedia.org with OS bookworm
- 13:53 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
- 13:52 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
- 13:52 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
- 13:52 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
- 13:52 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
- 13:49 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
- 13:48 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
- 13:46 jforrester@deploy2002: Finished scap sync-world: Backport for Expose new wikifunctions.v0 REST API module on Wikifunctions.org only (T419053) (duration: 06m 03s)
- 13:42 jforrester@deploy2002: jforrester: Continuing with sync
- 13:42 jforrester@deploy2002: jforrester: Backport for Expose new wikifunctions.v0 REST API module on Wikifunctions.org only (T419053) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 13:40 jforrester@deploy2002: Started scap sync-world: Backport for Expose new wikifunctions.v0 REST API module on Wikifunctions.org only (T419053)
- 13:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on doh4003.wikimedia.org with reason: host reimage
- 13:33 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on doh4003.wikimedia.org with reason: host reimage
- 13:22 urbanecm@deploy2002: Finished scap sync-world: Backport for CreateAccount: Add class to aide in instrumentation, createAccount: Log exposure and CTRs for account creation experiment (T419916) (duration: 12m 58s)
- 13:22 moritzm: upgrade rpki1001 to Routinator 0.15.1 T420572
- 13:15 urbanecm@deploy2002: migr, urbanecm: Continuing with sync
- 13:13 urbanecm@deploy2002: migr, urbanecm: Backport for CreateAccount: Add class to aide in instrumentation, createAccount: Log exposure and CTRs for account creation experiment (T419916) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 13:12 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host doh4003.wikimedia.org with OS bookworm
- 13:09 urbanecm@deploy2002: Started scap sync-world: Backport for CreateAccount: Add class to aide in instrumentation, createAccount: Log exposure and CTRs for account creation experiment (T419916)
- 13:09 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM doh4003.wikimedia.org - jmm@cumin2002"
- 13:09 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM doh4003.wikimedia.org - jmm@cumin2002"
- 13:07 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 17 hosts with reason: upgrade
- 13:01 moritzm: installing rsync security updates
- 12:59 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dse-k8s-worker1017.eqiad.wmnet
- 12:59 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts testvm7001.magru.wmnet
- 12:59 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 12:59 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: testvm7001.magru.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
- 12:59 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: testvm7001.magru.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
- 12:57 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) doh4003.wikimedia.org on all recursors
- 12:57 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache doh4003.wikimedia.org on all recursors
- 12:57 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 12:57 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM doh4003.wikimedia.org - jmm@cumin2002"
- 12:54 kamila@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-video: apply
- 12:53 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host dse-k8s-worker1017.eqiad.wmnet
- 12:53 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dse-k8s-worker1016.eqiad.wmnet
- 12:52 kamila@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-video: apply
- 12:52 kamila@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply
- 12:51 kamila@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-video: apply
- 12:51 kamila@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-video: apply
- 12:50 kamila@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-video: apply
- 12:50 kamila@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-timeline: apply
- 12:50 kamila@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-timeline: apply
- 12:50 kamila@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-timeline: apply
- 12:49 kamila@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-timeline: apply
- 12:48 kamila@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-timeline: apply
- 12:48 kamila@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-timeline: apply
- 12:47 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host dse-k8s-worker1016.eqiad.wmnet
- 12:47 kamila@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
- 12:46 kamila@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply
- 12:46 jiji@cumin1003: END (PASS) - Cookbook sre.memcached.roll-reboot-restart (exit_code=0) rolling reboot on A:memcached-codfw
- 12:46 jmm@cumin2002: START - Cookbook sre.dns.netbox
- 12:46 kamila@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
- 12:45 kamila@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-syntaxhighlight: apply
- 12:45 kamila@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
- 12:45 kamila@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply
- 12:45 kamila@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply
- 12:44 kamila@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-media: apply
- 12:44 kamila@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply
- 12:44 kamila@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply
- 12:44 kamila@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply
- 12:44 kamila@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-media: apply
- 12:43 kamila@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: apply
- 12:43 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM doh4003.wikimedia.org - jmm@cumin2002"
- 12:43 kamila@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-constraints: apply
- 12:42 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host krb2002.codfw.wmnet
- 12:41 kamila@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: apply
- 12:41 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts testvm7001.magru.wmnet
- 12:41 kamila@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply
- 12:41 kamila@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-constraints: apply
- 12:40 kamila@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-constraints: apply
- 12:40 kamila@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox: apply
- 12:39 kamila@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox: apply
- 12:39 kamila@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox: apply
- 12:38 kamila@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox: apply
- 12:37 kamila@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox: apply
- 12:37 jmm@cumin2002: START - Cookbook sre.dns.netbox
- 12:37 kamila@deploy2002: helmfile [staging] START helmfile.d/services/shellbox: apply
- 12:37 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host doh4003.wikimedia.org
- 12:36 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host krb2002.codfw.wmnet
- 12:29 jayme@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
- 12:27 jayme@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
- 12:25 jayme@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
- 12:25 jayme@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
- 12:24 jayme@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
- 12:23 jayme@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
- 12:22 jayme@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
- 12:22 jayme@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
- 12:10 urbanecm@deploy2002: mwscript-k8s job started: GrowthExperiments:reassignMentees --wiki=enwiki --mentor=Bilorv --performer=Bilorv --as-job # T418194
- 11:59 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dse-k8s-worker1017.eqiad.wmnet with OS bookworm
- 11:59 btullis@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - btullis@cumin1003"
- 11:58 btullis@cumin1003: END (PASS) - Cookbook sre.ceph.roll-restart-reboot-server (exit_code=0) rolling reboot on A:cephosd-codfw
- 11:57 jiji@cumin1003: START - Cookbook sre.memcached.roll-reboot-restart rolling reboot on A:memcached-codfw
- 11:53 moritzm: upgrade rpki2003 to Routinator 0.15.1 T420572
- 11:46 btullis@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - btullis@cumin1003"
- 11:40 javiermonton@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
- 11:40 javiermonton@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
- 11:30 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dse-k8s-worker1017.eqiad.wmnet with reason: host reimage
- 11:26 btullis@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on dse-k8s-worker1017.eqiad.wmnet with reason: host reimage
- 11:18 jiji@cumin1003: END (PASS) - Cookbook sre.memcached.roll-reboot-restart (exit_code=0) rolling reboot on A:memcached-codfw
- 11:18 btullis@cumin1003: START - Cookbook sre.ceph.roll-restart-reboot-server rolling reboot on A:cephosd-codfw
- 11:11 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1017.eqiad.wmnet with OS bookworm
- 11:11 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rpki1001.eqiad.wmnet
- 11:07 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host rpki1001.eqiad.wmnet
- 11:05 klausman@cumin2002: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on A:ml-serve-worker-codfw
- 11:03 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rpki2003.codfw.wmnet
- 10:59 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host rpki2003.codfw.wmnet
- 10:55 btullis@cumin1003: END (PASS) - Cookbook sre.ceph.roll-restart-reboot-server (exit_code=0) rolling reboot on P{cephosd100[4-5]*} and (A:cephosd-codfw or A:cephosd-eqiad)
- 10:54 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host doh4003.wikimedia.org
- 10:54 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) doh4003.wikimedia.org on all recursors
- 10:54 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache doh4003.wikimedia.org on all recursors
- 10:54 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 10:54 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM doh4003.wikimedia.org - jmm@cumin2002"
- 10:54 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM doh4003.wikimedia.org - jmm@cumin2002"
- 10:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host testvm2004.codfw.wmnet
- 10:51 jiji@cumin1003: START - Cookbook sre.memcached.roll-reboot-restart rolling reboot on A:memcached-codfw
- 10:50 jmm@cumin2002: START - Cookbook sre.dns.netbox
- 10:50 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) doh4003.wikimedia.org on all recursors
- 10:50 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache doh4003.wikimedia.org on all recursors
- 10:50 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 10:50 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM doh4003.wikimedia.org - jmm@cumin2002"
- 10:50 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM doh4003.wikimedia.org - jmm@cumin2002"
- 10:49 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host testvm2004.codfw.wmnet
- 10:48 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host testvm2005.codfw.wmnet
- 10:45 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host schema2004.codfw.wmnet
- 10:45 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host testvm2005.codfw.wmnet
- 10:43 jmm@cumin2002: START - Cookbook sre.dns.netbox
- 10:43 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host doh4003.wikimedia.org
- 10:43 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti4007.ulsfo.wmnet to cluster ulsfo02 and group 01
- 10:42 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti4007.ulsfo.wmnet to cluster ulsfo02 and group 01
- 10:41 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host schema2004.codfw.wmnet
- 10:39 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host schema2003.codfw.wmnet
- 10:37 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host schema2003.codfw.wmnet
- 10:37 fnegri@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host clouddumps1002.wikimedia.org
- 10:36 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host schema1004.eqiad.wmnet
- 10:36 Raine: created temporary categorylinks_icu72 tables -- T419980, T419049
- 10:36 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-worker1017.eqiad.wmnet with OS bookworm
- 10:34 aokoth@deploy2002: helmfile [aux-k8s-codfw] DONE helmfile.d/services/miscweb: apply
- 10:33 aokoth@deploy2002: helmfile [aux-k8s-codfw] START helmfile.d/services/miscweb: apply
- 10:32 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host schema1004.eqiad.wmnet
- 10:32 aokoth@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply
- 10:31 aokoth@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply
- 10:30 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4007.ulsfo.wmnet
- 10:29 btullis@cumin1003: START - Cookbook sre.ceph.roll-restart-reboot-server rolling reboot on P{cephosd100[4-5]*} and (A:cephosd-codfw or A:cephosd-eqiad)
- 10:28 daniel@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
- 10:28 fnegri@cumin1003: START - Cookbook sre.hosts.reboot-single for host clouddumps1002.wikimedia.org
- 10:26 daniel@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
- 10:25 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host testvm2006.codfw.wmnet
- 10:25 btullis@cumin1003: END (PASS) - Cookbook sre.opensearch.roll-restart-reboot (exit_code=0) rolling reboot on A:datahubsearch
- 10:24 btullis@cumin1003: END (FAIL) - Cookbook sre.ceph.roll-restart-reboot-server (exit_code=99) rolling reboot on A:cephosd-eqiad
- 10:22 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti4007.ulsfo.wmnet
- 10:21 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host testvm2006.codfw.wmnet
- 10:21 fnegri@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host clouddumps1001.wikimedia.org
- 10:20 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host testvm2008.wikimedia.org
- 10:19 daniel@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
- 10:18 daniel@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
- 10:16 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host testvm2008.wikimedia.org
- 10:14 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host testvm2007.codfw.wmnet
- 10:13 fnegri@cumin1003: START - Cookbook sre.hosts.reboot-single for host clouddumps1001.wikimedia.org
- 10:10 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host testvm2007.codfw.wmnet
- 10:09 btullis@cumin1003: START - Cookbook sre.opensearch.roll-restart-reboot rolling reboot on A:datahubsearch
- 10:04 daniel@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
- 10:03 daniel@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
- 10:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti4007.ulsfo.wmnet with OS bookworm
- 09:58 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 17 hosts with reason: upgrade
- 09:57 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host schema1003.eqiad.wmnet
- 09:53 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host schema1003.eqiad.wmnet
- 09:46 jnuche@deploy2002: Finished deploy [releng/jenkins-deploy@863e5c2] (releasing): T420477 (duration: 01m 07s)
- 09:45 jnuche@deploy2002: Started deploy [releng/jenkins-deploy@863e5c2] (releasing): T420477
- 09:43 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1017.eqiad.wmnet with OS bookworm
- 09:43 jnuche@deploy2002: Finished deploy [releng/jenkins-deploy@863e5c2] (releasing): T420477 (duration: 00m 59s)
- 09:42 jnuche@deploy2002: Started deploy [releng/jenkins-deploy@863e5c2] (releasing): T420477
- 09:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti4007.ulsfo.wmnet with reason: host reimage
- 09:35 moritzm: installing libnginx-mod-http-lua security updates
- 09:35 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti4007.ulsfo.wmnet with reason: host reimage
- 09:29 btullis@cumin1003: START - Cookbook sre.ceph.roll-restart-reboot-server rolling reboot on A:cephosd-eqiad
- 09:26 klausman@cumin2002: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:ml-serve-worker-codfw
- 09:26 klausman@cumin2002: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:ml-serve-worker-codfw
- 09:24 klausman@cumin2002: END (ERROR) - Cookbook sre.k8s.reboot-nodes (exit_code=97) rolling reboot on A:ml-serve-worker-codfw
- 09:21 brouberol@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/aux-k8s-services/kafka-mirrormaker: apply
- 09:21 brouberol@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/aux-k8s-services/kafka-mirrormaker: apply
- 09:19 brouberol@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/aux-k8s-services/kafka-mirrormaker: apply
- 09:19 brouberol@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/aux-k8s-services/kafka-mirrormaker: apply
- 09:13 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti4007.ulsfo.wmnet with OS bookworm
- 09:11 klausman@cumin2002: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:ml-serve-worker-codfw
- 09:01 moritzm: remove ganeti4007 from classic Ganeti cluster in ulsfo T418993
- 08:56 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of doh4001.wikimedia.org to plain
- 08:54 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of doh4001.wikimedia.org to plain
- 08:48 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of doh4002.wikimedia.org to plain
- 08:46 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of doh4002.wikimedia.org to plain
- 08:46 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of hcaptcha-proxy4001.wikimedia.org to plain
- 08:45 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of hcaptcha-proxy4001.wikimedia.org to plain
- 08:45 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of hcaptcha-proxy4002.wikimedia.org to plain
- 08:44 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of hcaptcha-proxy4002.wikimedia.org to plain
- 08:43 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of install4003.wikimedia.org to plain
- 08:42 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of install4003.wikimedia.org to plain
- 08:40 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti4007.ulsfo.wmnet
- 08:39 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti4007.ulsfo.wmnet
- 08:38 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host doh4003.wikimedia.org
- 08:38 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) doh4003.wikimedia.org on all recursors
- 08:38 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache doh4003.wikimedia.org on all recursors
- 08:38 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 08:38 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM doh4003.wikimedia.org - jmm@cumin2002"
- 08:38 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM doh4003.wikimedia.org - jmm@cumin2002"
- 08:35 jmm@cumin2002: START - Cookbook sre.dns.netbox
- 08:34 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) doh4003.wikimedia.org on all recursors
- 08:34 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache doh4003.wikimedia.org on all recursors
- 08:34 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 08:34 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM doh4003.wikimedia.org - jmm@cumin2002"
- 08:34 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM doh4003.wikimedia.org - jmm@cumin2002"
- 08:31 moritzm: installing python-apt security updates
- 08:29 jmm@cumin2002: START - Cookbook sre.dns.netbox
- 08:29 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host doh4003.wikimedia.org
- 08:26 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-sre: apply
- 08:25 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-sre: apply
- 08:21 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/postgresql-airflow-sre: apply
- 08:21 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/postgresql-airflow-sre: apply
- 08:14 moritzm: installing imagemagick security updates on Bullseye
- 08:13 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1048.eqiad.wmnet
- 08:13 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1048.eqiad.wmnet
- 08:12 aklapper@deploy2002: rebuilt and synchronized wikiversions files: group2 to 1.46.0-wmf.20 refs T413811
- 08:08 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1048.eqiad.wmnet
- 07:36 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1048.eqiad.wmnet
- 07:17 aokoth@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply
- 07:17 aokoth@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply
- 07:16 aokoth@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply
- 07:16 aokoth@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply
- 07:14 aokoth@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply
- 07:14 aokoth@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply
- 04:53 kevinbazira@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main'.
- 00:06 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafkamon2003.codfw.wmnet
- 00:02 herron@cumin1003: START - Cookbook sre.hosts.reboot-single for host kafkamon2003.codfw.wmnet
- 00:01 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafkamon1003.eqiad.wmnet
2026-03-18
- 23:58 mutante: releases2003 - kill 782 (stunnel4) - systemctl start stunnel4 - fix T420246 T420388 T420411
- 23:57 herron@cumin1003: START - Cookbook sre.hosts.reboot-single for host kafkamon1003.eqiad.wmnet
- 23:49 eevans@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-reboot (exit_code=0) rolling reboot on A:cassandra-dev
- 23:23 eevans@cumin1003: START - Cookbook sre.cassandra.roll-reboot rolling reboot on A:cassandra-dev
- 23:08 brett@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp5017.*
- 23:02 brett@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp5020.*
- 23:01 brett@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp5028.*
- 22:40 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5028.eqsin.wmnet with OS trixie
- 22:16 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5020.eqsin.wmnet with OS trixie
- 22:08 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5028.eqsin.wmnet with reason: host reimage
- 22:04 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5028.eqsin.wmnet with reason: host reimage
- 21:51 herron@cumin1003: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling reboot on A:kafka-logging-eqiad
- 21:49 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.roll-reboot (exit_code=0) rolling reboot on A:dnsbox
- 21:49 sukhe@cumin1003: cookbooks.sre.dns.roll-reboot finished rebooting dns7002.wikimedia.org
- 21:44 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5020.eqsin.wmnet with reason: host reimage
- 21:41 brett@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp5027.*
- 21:40 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5020.eqsin.wmnet with reason: host reimage
- 21:31 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp5028.eqsin.wmnet with OS trixie
- 21:30 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp5028.eqsin.wmnet with OS trixie
- 21:30 sukhe@cumin1003: cookbooks.sre.dns.roll-reboot begin reboot of dns7002.wikimedia.org
- 21:29 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5027.eqsin.wmnet with OS trixie
- 21:27 jforrester@deploy2002: mwscript-k8s job started: extensions/WikimediaMaintenance/maintenance/addWiki.php --wiki=abstractwiki # T411723 addWiki.php run
- 21:26 jforrester@deploy2002: mwscript-k8s job started: extensions/WikimediaMaintenance/maintenance/addWiki.php --wiki=abstractwiki # T411723 addWiki.php run
- 21:24 jforrester@deploy2002: Finished scap sync-world: Backport for Revert "OrchestratorRequest: Switch evaluations to v2 endpoint" (T418887), Create Abstract Wikipedia (T411725 T411726) (duration: 06m 44s)
- 21:20 jforrester@deploy2002: jforrester: Continuing with sync
- 21:20 jforrester@deploy2002: jforrester: Backport for Revert "OrchestratorRequest: Switch evaluations to v2 endpoint" (T418887), Create Abstract Wikipedia (T411725 T411726) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 21:17 jforrester@deploy2002: Started scap sync-world: Backport for Revert "OrchestratorRequest: Switch evaluations to v2 endpoint" (T418887), Create Abstract Wikipedia (T411725 T411726)
- 21:16 cdobbins@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5017.eqsin.wmnet with OS trixie
- 21:15 sukhe@cumin1003: cookbooks.sre.dns.roll-reboot finished rebooting dns7001.wikimedia.org
- 21:12 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp5020.eqsin.wmnet with OS trixie
- 21:08 jdlrobson@deploy2002: Finished scap sync-world: Backport for Guard for JS null deref on empty Parsoid sections (T419721), Reapply "hcaptcha: Enforce hCaptcha on API edits coming from the MobileFrontend" (T419125) (duration: 11m 20s)
- 21:07 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp5028.eqsin.wmnet with OS trixie
- 21:07 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp5028.eqsin.wmnet with OS trixie
- 21:04 jdlrobson@deploy2002: jdlrobson, harroyo-wmf: Continuing with sync
- 20:59 jdlrobson@deploy2002: jdlrobson, harroyo-wmf: Backport for Guard for JS null deref on empty Parsoid sections (T419721), Reapply "hcaptcha: Enforce hCaptcha on API edits coming from the MobileFrontend" (T419125) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 20:59 herron@cumin1003: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling reboot on A:kafka-logging-eqiad
- 20:58 sukhe@cumin1003: cookbooks.sre.dns.roll-reboot begin reboot of dns7001.wikimedia.org
- 20:58 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5027.eqsin.wmnet with reason: host reimage
- 20:57 jdlrobson@deploy2002: Started scap sync-world: Backport for Guard for JS null deref on empty Parsoid sections (T419721), Reapply "hcaptcha: Enforce hCaptcha on API edits coming from the MobileFrontend" (T419125)
- 20:52 herron@cumin1003: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling reboot on A:kafka-logging-codfw
- 20:51 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5027.eqsin.wmnet with reason: host reimage
- 20:51 jhathaway@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mx-in1001.wikimedia.org with reason: T419960
- 20:50 cdobbins@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp5020.eqsin.wmnet with OS trixie
- 20:50 jhathaway@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mx-in2001.wikimedia.org with reason: T419960
- 20:49 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp5028.eqsin.wmnet with OS trixie
- 20:48 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp5028.eqsin.wmnet with OS trixie
- 20:44 cdobbins@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5017.eqsin.wmnet with reason: host reimage
- 20:43 sukhe@cumin1003: cookbooks.sre.dns.roll-reboot finished rebooting dns6002.wikimedia.org
- 20:42 jhathaway@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mx-out1001.wikimedia.org with reason: T419960
- 20:42 jclark@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs1033.eqiad.wmnet with OS trixie
- 20:42 jclark@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1003"
- 20:42 jclark@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1003"
- 20:38 cscott@deploy2002: Finished scap sync-world: Backport for Limit legacy postprocessing cache to pages where DT does apply (T376183) (duration: 13m 54s)
- 20:37 cdobbins@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5017.eqsin.wmnet with reason: host reimage
- 20:35 jhathaway@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mx-out2001.wikimedia.org with reason: T419960
- 20:34 cscott@deploy2002: cscott: Continuing with sync
- 20:33 sukhe@cumin1003: cookbooks.sre.dns.roll-reboot begin reboot of dns6002.wikimedia.org
- 20:28 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-reboot (exit_code=0) rolling reboot on A:cp-upload_codfw and not P{cp2042.codfw.wmnet} and A:cp
- 20:28 brett@cumin2002: cookbooks.sre.cdn.roll-reboot finished rebooting cp2058.codfw.wmnet
- 20:28 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-reboot (exit_code=0) rolling reboot on A:cp-text_codfw and not P{cp2041.codfw.wmnet} and A:cp
- 20:28 brett@cumin2002: cookbooks.sre.cdn.roll-reboot finished rebooting cp2057.codfw.wmnet
- 20:26 cscott@deploy2002: cscott: Backport for Limit legacy postprocessing cache to pages where DT does apply (T376183) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 20:25 jclark@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1033.eqiad.wmnet with reason: host reimage
- 20:24 cscott@deploy2002: Started scap sync-world: Backport for Limit legacy postprocessing cache to pages where DT does apply (T376183)
- 20:24 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp5027.eqsin.wmnet with OS trixie
- 20:22 brett@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp5029.*
- 20:21 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5029.eqsin.wmnet with OS trixie
- 20:20 jclark@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1033.eqiad.wmnet with reason: host reimage
- 20:18 sukhe@cumin1003: cookbooks.sre.dns.roll-reboot finished rebooting dns6001.wikimedia.org
- 20:14 kemayo@deploy2002: Finished scap sync-world: Backport for Editcheck: fix tagging not happening for non-default checks (duration: 06m 28s)
- 20:10 kemayo@deploy2002: kemayo: Continuing with sync
- 20:10 kemayo@deploy2002: kemayo: Backport for Editcheck: fix tagging not happening for non-default checks synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 20:09 jclark@cumin1003: START - Cookbook sre.hosts.reimage for host wdqs1033.eqiad.wmnet with OS trixie
- 20:08 kemayo@deploy2002: Started scap sync-world: Backport for Editcheck: fix tagging not happening for non-default checks
- 20:06 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp5028.eqsin.wmnet with OS trixie
- 20:05 sukhe@cumin1003: cookbooks.sre.dns.roll-reboot begin reboot of dns6001.wikimedia.org
- 20:05 cdobbins@cumin2002: START - Cookbook sre.hosts.reimage for host cp5017.eqsin.wmnet with OS trixie
- 20:05 brett@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp5030.*
- 20:05 herron@cumin1003: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling reboot on A:kafka-logging-codfw
- 19:51 Reedy: running `foreachwikiindblist fishbowl.dblist extensions/OATHAuth/maintenance/UpdateSecretsToEncryptedFormat.php` T404363
- 19:51 Reedy: running `foreachwikiindblist private.dblist extensions/OATHAuth/maintenance/UpdateSecretsToEncryptedFormat.php` T404363
- 19:50 sukhe@cumin1003: cookbooks.sre.dns.roll-reboot finished rebooting dns5004.wikimedia.org
- 19:50 brett@cumin2002: cookbooks.sre.cdn.roll-reboot finished rebooting cp2056.codfw.wmnet
- 19:50 Reedy: running `mwscript extensions/OATHAuth/maintenance/UpdateSecretsToEncryptedFormat.php --wiki=metawiki` T404363
- 19:49 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5029.eqsin.wmnet with reason: host reimage
- 19:49 reedy@deploy2002: Synchronized private/PrivateSettings.php: Set $wgOATHSecretKey T404363 (duration: 05m 51s)
- 19:49 brett@cumin2002: cookbooks.sre.cdn.roll-reboot finished rebooting cp2055.codfw.wmnet
- 19:48 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5030.eqsin.wmnet with OS trixie
- 19:42 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5029.eqsin.wmnet with reason: host reimage
- 19:39 cdobbins@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp5017.eqsin.wmnet with OS trixie
- 19:35 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
- 19:35 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
- 19:33 sukhe@cumin1003: cookbooks.sre.dns.roll-reboot begin reboot of dns5004.wikimedia.org
- 19:30 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host install4004.wikimedia.org with OS bookworm
- 19:29 cdobbins@cumin2002: START - Cookbook sre.hosts.reimage for host cp5020.eqsin.wmnet with OS trixie
- 19:28 cdobbins@cumin2002: conftool action : set/pooled=no; selector: name=cp5020.eqsin.wmnet [reason: trixie reimaging]
- 19:28 cdobbins@cumin2002: conftool action : set/pooled=yes; selector: name=cp5018.eqsin.wmnet [reason: trixie reimaging]
- 19:27 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
- 19:27 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
- 19:26 cdobbins@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5018.eqsin.wmnet with OS trixie
- 19:23 otto@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-edit-type-enrich-next: apply
- 19:23 otto@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-edit-type-enrich-next: apply
- 19:18 sukhe@cumin1003: cookbooks.sre.dns.roll-reboot finished rebooting dns5003.wikimedia.org
- 19:17 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5030.eqsin.wmnet with reason: host reimage
- 19:13 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5030.eqsin.wmnet with reason: host reimage
- 19:13 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on install4004.wikimedia.org with reason: host reimage
- 19:11 brett@cumin2002: cookbooks.sre.cdn.roll-reboot finished rebooting cp2054.codfw.wmnet
- 19:08 brett@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts cp5029.eqsin.wmnet
- 19:08 brett@cumin2002: cookbooks.sre.cdn.roll-reboot finished rebooting cp2053.codfw.wmnet
- 19:08 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp5029.eqsin.wmnet with OS trixie
- 19:08 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on install4004.wikimedia.org with reason: host reimage
- 19:02 brett@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cp5029.eqsin.wmnet
- 19:01 sukhe@cumin1003: cookbooks.sre.dns.roll-reboot begin reboot of dns5003.wikimedia.org
- 18:56 brett@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp5031.*
- 18:56 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5031.eqsin.wmnet with OS trixie
- 18:54 cdobbins@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5018.eqsin.wmnet with reason: host reimage
- 18:47 cdobbins@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5018.eqsin.wmnet with reason: host reimage
- 18:46 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp5030.eqsin.wmnet with OS trixie
- 18:46 sukhe@cumin1003: cookbooks.sre.dns.roll-reboot finished rebooting dns4004.wikimedia.org
- 18:45 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host install4004.wikimedia.org with OS bookworm
- 18:45 brett@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp5032.*
- 18:44 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5032.eqsin.wmnet with OS trixie
- 18:32 brett@cumin2002: cookbooks.sre.cdn.roll-reboot finished rebooting cp2052.codfw.wmnet
- 18:29 brett@cumin2002: cookbooks.sre.cdn.roll-reboot finished rebooting cp2051.codfw.wmnet
- 18:27 sukhe@cumin1003: cookbooks.sre.dns.roll-reboot begin reboot of dns4004.wikimedia.org
- 18:24 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5031.eqsin.wmnet with reason: host reimage
- 18:18 cdobbins@cumin2002: START - Cookbook sre.hosts.reimage for host cp5017.eqsin.wmnet with OS trixie
- 18:17 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5031.eqsin.wmnet with reason: host reimage
- 18:17 cdobbins@cumin2002: conftool action : set/pooled=no; selector: name=cp5017.eqsin.wmnet [reason: trixie reimaging]
- 18:17 cdobbins@cumin2002: START - Cookbook sre.hosts.reimage for host cp5018.eqsin.wmnet with OS trixie
- 18:16 cdobbins@cumin2002: conftool action : set/pooled=no; selector: name=cp5018.eqsin.wmnet [reason: trixie reimaging]
- 18:16 cdobbins@cumin2002: conftool action : set/pooled=yes; selector: name=cp5017.eqsin.wmnet [reason: trixie reimaging]
- 18:16 cdobbins@cumin2002: conftool action : set/pooled=no; selector: name=cp5017.eqsin.wmnet [reason: trixie reimaging]
- 18:13 cdobbins@cumin2002: conftool action : set/pooled=yes; selector: name=cp3077.esams.wmnet [reason: trixie reimaging]
- 18:13 cdobbins@cumin2002: conftool action : set/pooled=yes; selector: name=cp3078.esams.wmnet [reason: trixie reimaging]
- 18:12 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5032.eqsin.wmnet with reason: host reimage
- 18:12 sukhe@cumin1003: cookbooks.sre.dns.roll-reboot finished rebooting dns4003.wikimedia.org
- 18:09 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1258: Ready
- 18:07 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5032.eqsin.wmnet with reason: host reimage
- 18:01 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-worker1017.eqiad.wmnet with OS bookworm
- 17:59 cdobbins@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3078.esams.wmnet with OS trixie
- 17:56 cdobbins@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3077.esams.wmnet with OS trixie
- 17:55 sukhe@cumin1003: cookbooks.sre.dns.roll-reboot begin reboot of dns4003.wikimedia.org
- 17:54 brett@cumin2002: cookbooks.sre.cdn.roll-reboot finished rebooting cp2050.codfw.wmnet
- 17:51 brett@cumin2002: cookbooks.sre.cdn.roll-reboot finished rebooting cp2049.codfw.wmnet
- 17:43 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-jobrunner: apply
- 17:42 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-jobrunner: apply
- 17:40 sukhe@cumin1003: cookbooks.sre.dns.roll-reboot finished rebooting dns3004.wikimedia.org
- 17:39 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-jobrunner: apply
- 17:38 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-jobrunner: apply
- 17:38 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on backupmon1001.eqiad.wmnet with reason: upgrade
- 17:35 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1347.eqiad.wmnet with OS trixie
- 17:33 cdobbins@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3078.esams.wmnet with reason: host reimage
- 17:32 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp5032.eqsin.wmnet with OS trixie
- 17:32 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp5031.eqsin.wmnet with OS trixie
- 17:32 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp5032.eqsin.wmnet with OS trixie
- 17:32 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp5031.eqsin.wmnet with OS trixie
- 17:32 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
- 17:30 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
- 17:30 cdobbins@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3077.esams.wmnet with reason: host reimage
- 17:29 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-jobrunner: apply
- 17:29 claime: rearmed keyholder on deploy1003
- 17:28 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-jobrunner: apply
- 17:27 cdobbins@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3078.esams.wmnet with reason: host reimage
- 17:26 cdobbins@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3077.esams.wmnet with reason: host reimage
- 17:25 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-jobrunner: apply
- 17:25 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-jobrunner: apply
- 17:23 fceratto@cumin1003: START - Cookbook sre.mysql.pool pool db1258: Ready
- 17:23 sukhe@cumin1003: cookbooks.sre.dns.roll-reboot begin reboot of dns3004.wikimedia.org
- 17:21 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
- 17:21 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
- 17:21 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-jobrunner: apply
- 17:20 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-jobrunner: apply
- 17:20 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-restart-reboot-ncredir (exit_code=0) rolling reboot on A:ncredir-esams and A:ncredir
- 17:19 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1347.eqiad.wmnet with reason: host reimage
- 17:18 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-restart-reboot-ncredir (exit_code=0) rolling reboot on A:ncredir-drmrs and A:ncredir
- 17:16 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-restart-reboot-ncredir (exit_code=0) rolling reboot on A:ncredir-eqiad and A:ncredir
- 17:15 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-restart-reboot-ncredir (exit_code=0) rolling reboot on A:ncredir-ulsfo and A:ncredir
- 17:15 brett@cumin2002: cookbooks.sre.cdn.roll-reboot finished rebooting cp2048.codfw.wmnet
- 17:15 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp5032.eqsin.wmnet with OS trixie
- 17:14 jayme@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1347.eqiad.wmnet with reason: host reimage
- 17:14 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp5031.eqsin.wmnet with OS trixie
- 17:12 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
- 17:12 brett@cumin2002: cookbooks.sre.cdn.roll-reboot finished rebooting cp2047.codfw.wmnet
- 17:11 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
- 17:09 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1017.eqiad.wmnet with OS bookworm
- 17:09 brett@puppetserver1001: conftool action : set/pooled=no; selector: name=cp3078.*
- 17:08 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
- 17:08 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
- 17:08 brett@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp3079.*
- 17:08 sukhe@cumin1003: cookbooks.sre.dns.roll-reboot finished rebooting dns3003.wikimedia.org
- 17:08 brett@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp3078.*
- 17:07 brett@cumin2002: START - Cookbook sre.cdn.roll-restart-reboot-ncredir rolling reboot on A:ncredir-eqiad and A:ncredir
- 17:07 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-jobrunner: apply
- 17:07 brett@cumin2002: START - Cookbook sre.cdn.roll-restart-reboot-ncredir rolling reboot on A:ncredir-esams and A:ncredir
- 17:06 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-jobrunner: apply
- 17:06 brett@puppetserver1001: conftool action : set/pooled=yes; selector: name=ncredir2002.*
- 17:05 brett@cumin2002: START - Cookbook sre.cdn.roll-restart-reboot-ncredir rolling reboot on A:ncredir-drmrs and A:ncredir
- 17:05 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ncredir2002.codfw.wmnet
- 17:05 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-restart-reboot-ncredir (exit_code=0) rolling reboot on A:ncredir-eqsin and A:ncredir
- 17:05 brett@cumin2002: START - Cookbook sre.cdn.roll-restart-reboot-ncredir rolling reboot on A:ncredir-ulsfo and A:ncredir
- 17:04 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-restart-reboot-ncredir (exit_code=0) rolling reboot on A:ncredir-magru and A:ncredir
- 17:03 cdobbins@cumin2002: START - Cookbook sre.hosts.reimage for host cp3078.esams.wmnet with OS trixie
- 17:02 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-worker1347
- 17:02 jayme@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1347
- 17:02 cdobbins@cumin2002: conftool action : set/pooled=no; selector: name=cp3078.esams.wmnet [reason: trixie reimaging]
- 17:01 cdobbins@cumin2002: conftool action : set/pooled=yes; selector: name=cp3076.esams.wmnet [reason: trixie reimaging]
- 17:01 cdobbins@cumin2002: START - Cookbook sre.hosts.reimage for host cp3077.esams.wmnet with OS trixie
- 17:01 cdobbins@cumin2002: conftool action : set/pooled=no; selector: name=cp3077.esams.wmnet [reason: trixie reimaging]
- 16:59 brett@cumin2002: START - Cookbook sre.hosts.reboot-single for host ncredir2002.codfw.wmnet
- 16:58 brett@puppetserver1001: conftool action : set/pooled=no; selector: name=ncredir2002.*
- 16:56 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 8 hosts with reason: upgrade
- 16:55 brett@puppetserver1001: conftool action : set/pooled=yes; selector: name=ncredir2001.*
- 16:55 brett@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for ncredir2001.codfw.wmnet
- 16:55 brett@cumin2002: START - Cookbook sre.hosts.remove-downtime for ncredir2001.codfw.wmnet
- 16:55 cdobbins@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3076.esams.wmnet with OS trixie
- 16:53 jynus@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host backup2014.codfw.wmnet
- 16:52 brett@cumin2002: START - Cookbook sre.cdn.roll-restart-reboot-ncredir rolling reboot on A:ncredir-eqsin and A:ncredir
- 16:52 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on dbproxy2008.codfw.wmnet with reason: kernel update
- 16:51 brett@cumin2002: START - Cookbook sre.cdn.roll-restart-reboot-ncredir rolling reboot on A:ncredir-magru and A:ncredir
- 16:51 klausman@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on ml-serve1013.eqiad.wmnet with reason: Reboot for security update
- 16:50 jynus@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host backup2013.codfw.wmnet
- 16:49 brett@puppetserver1001: conftool action : set/pooled=no; selector: name=ncredir2001.*
- 16:49 brett@cumin2002: END (ERROR) - Cookbook sre.cdn.roll-restart-reboot-ncredir (exit_code=97) rolling reboot on A:ncredir and A:ncredir
- 16:48 jayme@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1347
- 16:48 jayme@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1347.eqiad.wmnet 199.48.64.10.in-addr.arpa 9.9.1.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
- 16:48 jayme@cumin1003: START - Cookbook sre.dns.wipe-cache wikikube-worker1347.eqiad.wmnet 199.48.64.10.in-addr.arpa 9.9.1.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
- 16:48 jayme@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 16:47 jayme@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1347 - jayme@cumin1003"
- 16:47 jayme@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-worker1347 - jayme@cumin1003"
- 16:47 sukhe@cumin1003: cookbooks.sre.dns.roll-reboot begin reboot of dns3003.wikimedia.org
- 16:47 klausman@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1012.eqiad.wmnet
- 16:47 brett@cumin2002: START - Cookbook sre.cdn.roll-restart-reboot-ncredir rolling reboot on A:ncredir and A:ncredir
- 16:47 jynus@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host backup2012.codfw.wmnet
- 16:47 jynus@cumin1003: START - Cookbook sre.hosts.reboot-single for host backup2014.codfw.wmnet
- 16:46 cdobbins@cumin2002: conftool action : set/pooled=yes; selector: name=cp3075.esams.wmnet [reason: trixie reimaging]
- 16:46 jynus@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host backup2003.codfw.wmnet
- 16:45 fnegri@cumin1003: conftool action : set/pooled=yes; selector: name=clouddb1013.eqiad.wmnet
- 16:44 jynus@cumin1003: START - Cookbook sre.hosts.reboot-single for host backup2013.codfw.wmnet
- 16:44 jayme@cumin1003: START - Cookbook sre.dns.netbox
- 16:43 jynus@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host backup2009.codfw.wmnet
- 16:43 jayme@cumin1003: START - Cookbook sre.hosts.move-vlan for host wikikube-worker1347
- 16:43 brett@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-cluster (exit_code=99)
- 16:43 jayme@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1347.eqiad.wmnet with OS trixie
- 16:43 brett@cumin2002: START - Cookbook sre.hosts.reboot-cluster
- 16:41 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on dbproxy2007.codfw.wmnet with reason: kernel update
- 16:40 jynus@cumin1003: START - Cookbook sre.hosts.reboot-single for host backup2012.codfw.wmnet
- 16:39 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3079.esams.wmnet with OS trixie
- 16:39 jynus@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host backup2008.codfw.wmnet
- 16:38 moritzm: installing PHP 8.2 security updates
- 16:37 jynus@cumin1003: START - Cookbook sre.hosts.reboot-single for host backup2009.codfw.wmnet
- 16:36 brett@cumin2002: cookbooks.sre.cdn.roll-reboot finished rebooting cp2046.codfw.wmnet
- 16:35 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3078.esams.wmnet with OS trixie
- 16:34 moritzm: installing alsa-lib security updates
- 16:33 brett@cumin2002: cookbooks.sre.cdn.roll-reboot finished rebooting cp2045.codfw.wmnet
- 16:32 jynus@cumin1003: START - Cookbook sre.hosts.reboot-single for host backup2008.codfw.wmnet
- 16:32 sukhe@cumin1003: cookbooks.sre.dns.roll-reboot finished rebooting dns2006.wikimedia.org
- 16:29 moritzm: failover Ganeti master in eqiad to ganeti1046
- 16:29 cdobbins@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3076.esams.wmnet with reason: host reimage
- 16:29 jynus@cumin1003: START - Cookbook sre.hosts.reboot-single for host backup2003.codfw.wmnet
- 16:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1054.eqiad.wmnet
- 16:28 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1054.eqiad.wmnet
- 16:24 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on dbproxy2005.codfw.wmnet with reason: kernel update
- 16:24 cdobbins@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3076.esams.wmnet with reason: host reimage
- 16:24 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1054.eqiad.wmnet
- 16:22 klausman@cumin1003: START - Cookbook sre.hosts.reboot-single for host ml-serve1012.eqiad.wmnet
- 16:20 jynus@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host backup1013.eqiad.wmnet
- 16:19 klausman@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1011.eqiad.wmnet
- 16:18 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1054.eqiad.wmnet
- 16:18 btullis@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dse-k8s-worker1017.eqiad.wmnet with OS bookworm
- 16:16 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host install4004.wikimedia.org with OS bookworm
- 16:16 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1053.eqiad.wmnet
- 16:16 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1053.eqiad.wmnet
- 16:14 sukhe@cumin1003: cookbooks.sre.dns.roll-reboot begin reboot of dns2006.wikimedia.org
- 16:14 jynus@cumin1003: START - Cookbook sre.hosts.reboot-single for host backup1013.eqiad.wmnet
- 16:14 jynus@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host backup1009.eqiad.wmnet
- 16:13 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3079.esams.wmnet with reason: host reimage
- 16:13 klausman@cumin1003: START - Cookbook sre.hosts.reboot-single for host ml-serve1011.eqiad.wmnet
- 16:12 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on dbproxy1029.eqiad.wmnet with reason: kernel update
- 16:12 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dse-k8s-worker1016.eqiad.wmnet with OS bookworm
- 16:12 btullis@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - btullis@cumin1003"
- 16:11 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1017.eqiad.wmnet with OS bookworm
- 16:11 moritzm: powercycling ganeti1053 (stuck on reboot)
- 16:09 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3078.esams.wmnet with reason: host reimage
- 16:09 klausman@cumin1003: END (ERROR) - Cookbook sre.k8s.reboot-nodes (exit_code=97) rolling reboot on A:ml-serve-worker-eqiad
- 16:09 klausman@cumin1003: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:ml-serve-worker-eqiad
- 16:08 klausman@cumin1003: END (ERROR) - Cookbook sre.k8s.reboot-nodes (exit_code=97) rolling reboot on A:ml-serve-worker-eqiad
- 16:07 jynus@cumin1003: START - Cookbook sre.hosts.reboot-single for host backup1009.eqiad.wmnet
- 16:07 jynus@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host backup1003.eqiad.wmnet
- 16:06 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3079.esams.wmnet with reason: host reimage
- 16:06 klausman@cumin1003: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:ml-serve-worker-eqiad
- 16:06 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3078.esams.wmnet with reason: host reimage
- 16:04 btullis@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - btullis@cumin1003"
- 16:04 klausman@cumin1003: END (ERROR) - Cookbook sre.k8s.reboot-nodes (exit_code=97) rolling reboot on A:ml-serve-worker-eqiad
- 16:04 klausman@cumin1003: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:ml-serve-worker-eqiad
- 16:02 btullis@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dse-k8s-worker1017.eqiad.wmnet with OS bookworm
- 16:01 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on dbproxy1028.eqiad.wmnet with reason: kernel update
- 16:00 jynus@cumin1003: START - Cookbook sre.hosts.reboot-single for host backup1003.eqiad.wmnet
- 16:00 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dse-k8s-worker1015.eqiad.wmnet
- 16:00 cdobbins@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3075.esams.wmnet with OS trixie
- 16:00 cdobbins@cumin2002: START - Cookbook sre.hosts.reimage for host cp3076.esams.wmnet with OS trixie
- 15:59 sukhe@cumin1003: cookbooks.sre.dns.roll-reboot finished rebooting dns2005.wikimedia.org
- 15:58 cdobbins@cumin2002: conftool action : set/pooled=no; selector: name=cp3076.esams.wmnet [reason: trixie reimaging]
- 15:58 jynus@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host backup1012.eqiad.wmnet
- 15:58 brett@cumin2002: cookbooks.sre.cdn.roll-reboot finished rebooting cp2044.codfw.wmnet
- 15:58 cdobbins@cumin2002: conftool action : set/pooled=yes; selector: name=cp3078.esams.wmnet [reason: trixie reimaging]
- 15:57 cdobbins@cumin2002: conftool action : set/pooled=no; selector: name=cp3078.esams.wmnet [reason: trixie reimaging]
- 15:57 klausman@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1010.eqiad.wmnet
- 15:57 jynus@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host backup1008.eqiad.wmnet
- 15:57 cdobbins@cumin2002: conftool action : set/pooled=yes; selector: name=cp3074.esams.wmnet [reason: trixie reimaging]
- 15:56 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1017.eqiad.wmnet with OS bookworm
- 15:56 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1053.eqiad.wmnet
- 15:55 brett@cumin2002: cookbooks.sre.cdn.roll-reboot finished rebooting cp2043.codfw.wmnet
- 15:55 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dse-k8s-worker1017.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 15:54 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on dbproxy1023.eqiad.wmnet with reason: kernel update
- 15:54 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host dse-k8s-worker1015.eqiad.wmnet
- 15:54 fceratto@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for dbproxy1022.eqiad.wmnet
- 15:54 jynus@cumin1003: START - Cookbook sre.hosts.reboot-single for host backup1008.eqiad.wmnet
- 15:54 fceratto@cumin1003: START - Cookbook sre.hosts.remove-downtime for dbproxy1022.eqiad.wmnet
- 15:53 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1053.eqiad.wmnet
- 15:52 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1052.eqiad.wmnet
- 15:52 klausman@cumin1003: START - Cookbook sre.hosts.reboot-single for host ml-serve1010.eqiad.wmnet
- 15:52 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1052.eqiad.wmnet
- 15:51 jynus@cumin1003: START - Cookbook sre.hosts.reboot-single for host backup1012.eqiad.wmnet
- 15:51 cdobbins@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3074.esams.wmnet with OS trixie
- 15:49 brett@cumin2002: START - Cookbook sre.cdn.roll-reboot rolling reboot on A:cp-upload_codfw and not P{cp2042.codfw.wmnet} and A:cp
- 15:48 jynus@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host backup1014.eqiad.wmnet
- 15:48 klausman@cumin1003: END (FAIL) - Cookbook sre.k8s.reboot-nodes (exit_code=1) rolling reboot on A:ml-serve-worker-eqiad
- 15:47 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host deploy1003.eqiad.wmnet
- 15:46 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1052.eqiad.wmnet
- 15:46 brett@cumin2002: START - Cookbook sre.cdn.roll-reboot rolling reboot on A:cp-text_codfw and not P{cp2041.codfw.wmnet} and A:cp
- 15:45 btullis@cumin1003: START - Cookbook sre.hosts.provision for host dse-k8s-worker1017.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 15:45 sukhe@cumin1003: cookbooks.sre.dns.roll-reboot begin reboot of dns2005.wikimedia.org
- 15:42 jynus@cumin1003: START - Cookbook sre.hosts.reboot-single for host backup1014.eqiad.wmnet
- 15:42 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1052.eqiad.wmnet
- 15:41 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp3079.esams.wmnet with OS trixie
- 15:41 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp3078.esams.wmnet with OS trixie
- 15:40 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1051.eqiad.wmnet
- 15:40 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1051.eqiad.wmnet
- 15:39 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on dbproxy1022.eqiad.wmnet with reason: kernel update
- 15:38 btullis@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Update for dse-k8s-worker1016 - btullis@cumin1003"
- 15:37 btullis@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Update for dse-k8s-worker1016 - btullis@cumin1003"
- 15:37 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host deploy1003.eqiad.wmnet
- 15:37 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dse-k8s-worker1027.eqiad.wmnet
- 15:36 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dse-k8s-worker1016.eqiad.wmnet with reason: host reimage
- 15:35 cgoubert@cumin1003: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling reboot on A:kafka-main-eqiad
- 15:34 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1051.eqiad.wmnet
- 15:34 cdobbins@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3075.esams.wmnet with reason: host reimage
- 15:33 blake@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker1372.eqiad.wmnet
- 15:33 blake@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker1371.eqiad.wmnet
- 15:33 blake@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker1370.eqiad.wmnet
- 15:31 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host dse-k8s-worker1027.eqiad.wmnet
- 15:30 sukhe@cumin1003: cookbooks.sre.dns.roll-reboot finished rebooting dns2004.wikimedia.org
- 15:29 btullis@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on dse-k8s-worker1016.eqiad.wmnet with reason: host reimage
- 15:28 blake@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker1369.eqiad.wmnet
- 15:28 blake@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker1368.eqiad.wmnet
- 15:28 blake@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker1372.eqiad.wmnet
- 15:28 blake@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker1367.eqiad.wmnet
- 15:28 blake@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker1366.eqiad.wmnet
- 15:28 blake@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker1371.eqiad.wmnet
- 15:28 blake@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker1370.eqiad.wmnet
- 15:28 blake@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker1365.eqiad.wmnet
- 15:28 blake@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker1364.eqiad.wmnet
- 15:28 blake@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker1363.eqiad.wmnet
- 15:27 blake@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker1362.eqiad.wmnet
- 15:27 blake@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker1361.eqiad.wmnet
- 15:27 btullis@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dse-k8s-worker1017
- 15:27 blake@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker1360.eqiad.wmnet
- 15:26 btullis@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host dse-k8s-worker1017
- 15:25 cdobbins@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3074.esams.wmnet with reason: host reimage
- 15:25 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host install4004.wikimedia.org with OS bookworm
- 15:25 sukhe@dns1004: END - running authdns-update
- 15:24 sukhe@dns1004: START - running authdns-update
- 15:24 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host install4004.wikimedia.org
- 15:24 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host install4004.wikimedia.org with OS bookworm
- 15:23 blake@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker1369.eqiad.wmnet
- 15:23 blake@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker1368.eqiad.wmnet
- 15:23 blake@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker1367.eqiad.wmnet
- 15:23 blake@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker1366.eqiad.wmnet
- 15:23 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1051.eqiad.wmnet
- 15:23 blake@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker1365.eqiad.wmnet
- 15:23 blake@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker1364.eqiad.wmnet
- 15:22 blake@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker1363.eqiad.wmnet
- 15:22 blake@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker1362.eqiad.wmnet
- 15:22 blake@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker1361.eqiad.wmnet
- 15:22 blake@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker1360.eqiad.wmnet
- 15:20 blake@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker1349.eqiad.wmnet
- 15:20 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1050.eqiad.wmnet
- 15:20 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1050.eqiad.wmnet
- 15:18 cdobbins@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3075.esams.wmnet with reason: host reimage
- 15:18 cdobbins@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3074.esams.wmnet with reason: host reimage
- 15:17 blake@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker1348.eqiad.wmnet
- 15:17 blake@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker1346.eqiad.wmnet
- 15:17 blake@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker1344.eqiad.wmnet
- 15:17 blake@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker1345.eqiad.wmnet
- 15:17 blake@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker1343.eqiad.wmnet
- 15:16 blake@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker1342.eqiad.wmnet
- 15:16 sukhe@cumin1003: cookbooks.sre.dns.roll-reboot begin reboot of dns2004.wikimedia.org
- 15:15 blake@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker1349.eqiad.wmnet
- 15:15 blake@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker1341.eqiad.wmnet
- 15:15 blake@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker1340.eqiad.wmnet
- 15:15 blake@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker1339.eqiad.wmnet
- 15:15 moritzm: imported jenkins 2.541.3 for bullseye/bookworm/trixie
- 15:14 blake@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker1338.eqiad.wmnet
- 15:14 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1016.eqiad.wmnet with OS bookworm
- 15:12 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1050.eqiad.wmnet
- 15:12 blake@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker1348.eqiad.wmnet
- 15:12 blake@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker1346.eqiad.wmnet
- 15:12 blake@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker1336.eqiad.wmnet
- 15:12 blake@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker1337.eqiad.wmnet
- 15:12 blake@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker1345.eqiad.wmnet
- 15:12 blake@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker1344.eqiad.wmnet
- 15:12 blake@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker1334.eqiad.wmnet
- 15:12 blake@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker1335.eqiad.wmnet
- 15:11 blake@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker1343.eqiad.wmnet
- 15:11 blake@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker1342.eqiad.wmnet
- 15:11 blake@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker1332.eqiad.wmnet
- 15:11 blake@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker1333.eqiad.wmnet
- 15:11 btullis@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Post reimage - btullis@cumin1003"
- 15:11 btullis@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Post reimage - btullis@cumin1003"
- 15:10 blake@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker1341.eqiad.wmnet
- 15:10 blake@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker1340.eqiad.wmnet
- 15:10 blake@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker1331.eqiad.wmnet
- 15:10 blake@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker1330.eqiad.wmnet
- 15:10 blake@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker1339.eqiad.wmnet
- 15:09 blake@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker1329.eqiad.wmnet
- 15:09 blake@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker1338.eqiad.wmnet
- 15:09 blake@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker1328.eqiad.wmnet
- 15:09 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1050.eqiad.wmnet
- 15:07 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1049.eqiad.wmnet
- 15:07 blake@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker1337.eqiad.wmnet
- 15:07 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1049.eqiad.wmnet
- 15:07 blake@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker1336.eqiad.wmnet
- 15:07 blake@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker1335.eqiad.wmnet
- 15:06 blake@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker1334.eqiad.wmnet
- 15:06 blake@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker1333.eqiad.wmnet
- 15:06 blake@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker1332.eqiad.wmnet
- 15:05 blake@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker1331.eqiad.wmnet
- 15:05 blake@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker1330.eqiad.wmnet
- 15:04 blake@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker1329.eqiad.wmnet
- 15:04 blake@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker1328.eqiad.wmnet
- 15:03 jiji@cumin1003: END (PASS) - Cookbook sre.memcached.roll-reboot-restart (exit_code=0) rolling reboot on A:memcached-eqiad
- 15:02 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs1033.eqiad.wmnet with OS trixie
- 15:01 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1049.eqiad.wmnet
- 15:01 sukhe@cumin1003: cookbooks.sre.dns.roll-reboot finished rebooting dns1006.wikimedia.org
- 14:58 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts durum4002.ulsfo.wmnet
- 14:58 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 14:58 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: durum4002.ulsfo.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
- 14:57 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: durum4002.ulsfo.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
- 14:57 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1049.eqiad.wmnet
- 14:54 cdobbins@cumin2002: START - Cookbook sre.hosts.reimage for host cp3075.esams.wmnet with OS trixie
- 14:54 cdobbins@cumin2002: conftool action : set/pooled=no; selector: name=cp3075.esams.wmnet [reason: trixie reimaging]
- 14:54 cdobbins@cumin2002: START - Cookbook sre.hosts.reimage for host cp3074.esams.wmnet with OS trixie
- 14:53 cdobbins@cumin2002: conftool action : set/pooled=no; selector: name=cp3074.esams.wmnet [reason: trixie reimaging]
- 14:52 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1047.eqiad.wmnet
- 14:52 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1047.eqiad.wmnet
- 14:50 jmm@cumin2002: START - Cookbook sre.dns.netbox
- 14:49 slyngshede@dns1004: END - running authdns-update
- 14:48 otto@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
- 14:48 otto@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
- 14:48 slyngshede@dns1004: START - running authdns-update
- 14:47 sukhe@cumin1003: cookbooks.sre.dns.roll-reboot begin reboot of dns1006.wikimedia.org
- 14:46 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1047.eqiad.wmnet
- 14:46 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts durum4002.ulsfo.wmnet
- 14:45 cgoubert@cumin1003: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling reboot on A:kafka-main-eqiad
- 14:44 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts durum4001.ulsfo.wmnet
- 14:44 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 14:44 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: durum4001.ulsfo.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
- 14:44 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: durum4001.ulsfo.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
- 14:44 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1047.eqiad.wmnet
- 14:44 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1046.eqiad.wmnet
- 14:43 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1046.eqiad.wmnet
- 14:40 jmm@cumin2002: START - Cookbook sre.dns.netbox
- 14:38 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1046.eqiad.wmnet
- 14:36 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts durum4001.ulsfo.wmnet
- 14:34 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host install4004.wikimedia.org with OS bookworm
- 14:33 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1046.eqiad.wmnet
- 14:32 sukhe@cumin1003: cookbooks.sre.dns.roll-reboot finished rebooting dns1005.wikimedia.org
- 14:32 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "inline pattern and pattern equivalence - oblivian@cumin1003"
- 14:32 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: inline pattern and pattern equivalence - oblivian@cumin1003
- 14:31 oblivian@cumin1003: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: inline pattern and pattern equivalence - oblivian@cumin1003
- 14:31 oblivian@cumin1003: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "inline pattern and pattern equivalence - oblivian@cumin1003"
- 14:30 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1045.eqiad.wmnet
- 14:30 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1045.eqiad.wmnet
- 14:25 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM install4004.wikimedia.org - jmm@cumin2002"
- 14:25 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM install4004.wikimedia.org - jmm@cumin2002"
- 14:25 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1045.eqiad.wmnet
- 14:25 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install4004.wikimedia.org on all recursors
- 14:24 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install4004.wikimedia.org on all recursors
- 14:21 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) install4004.wikimedia.org on all recursors
- 14:21 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install4004.wikimedia.org on all recursors
- 14:21 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 14:21 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install4004.wikimedia.org - jmm@cumin2002"
- 14:20 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install4004.wikimedia.org - jmm@cumin2002"
- 14:19 sukhe@cumin1003: cookbooks.sre.dns.roll-reboot begin reboot of dns1005.wikimedia.org
- 14:17 jforrester@deploy2002: Finished scap sync-world: Backport for Restore quotation-marks in ext.wikilambda.app messages (T420456) (duration: 06m 32s)
- 14:17 kevinbazira@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' .
- 14:17 jclark@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1373.eqiad.wmnet with OS bookworm
- 14:17 jclark@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1003"
- 14:16 jclark@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1003"
- 14:16 apine@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
- 14:16 jmm@cumin2002: START - Cookbook sre.dns.netbox
- 14:16 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install4004.wikimedia.org
- 14:15 apine@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
- 14:15 apine@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
- 14:14 apine@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
- 14:14 apine@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
- 14:14 jforrester@deploy2002: jforrester: Continuing with sync
- 14:13 apine@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
- 14:13 otto@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
- 14:13 jforrester@deploy2002: jforrester: Backport for Restore quotation-marks in ext.wikilambda.app messages (T420456) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 14:13 otto@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
- 14:11 jforrester@deploy2002: Started scap sync-world: Backport for Restore quotation-marks in ext.wikilambda.app messages (T420456)
- 14:08 apine@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
- 14:08 apine@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
- 14:07 apine@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
- 14:06 apine@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
- 14:05 XioNoX: set graceful-shutdown on EdgeUno transit sessions
- 14:05 apine@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
- 14:04 apine@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
- 14:04 sukhe@cumin1003: cookbooks.sre.dns.roll-reboot finished rebooting dns1004.wikimedia.org
- 14:02 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1045.eqiad.wmnet
- 14:01 klausman@cumin1003: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:ml-serve-worker-eqiad
- 14:01 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
- 13:59 jclark@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1373.eqiad.wmnet with reason: host reimage
- 13:57 Msz2001: UTC afternoon backport+config window done
- 13:56 mszwarc@deploy2002: Finished scap sync-world: Backport for Tweak configuration of external link aggregate usage analysis (T419837) (duration: 06m 41s)
- 13:55 jclark@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wdqs1033.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 13:55 jclark@cumin1003: START - Cookbook sre.hosts.provision for host wdqs1033.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 13:53 jclark@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1373.eqiad.wmnet with reason: host reimage
- 13:52 mszwarc@deploy2002: mszwarc: Continuing with sync
- 13:51 mszwarc@deploy2002: mszwarc: Backport for Tweak configuration of external link aggregate usage analysis (T419837) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 13:51 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
- 13:50 sukhe@cumin1003: cookbooks.sre.dns.roll-reboot begin reboot of dns1004.wikimedia.org
- 13:50 sukhe@cumin1003: START - Cookbook sre.dns.roll-reboot rolling reboot on A:dnsbox
- 13:49 mszwarc@deploy2002: Started scap sync-world: Backport for Tweak configuration of external link aggregate usage analysis (T419837)
- 13:49 mszwarc@deploy2002: Finished scap sync-world: Backport for Normalize external domain names in click analysis (T419837), Normalize external domain names in click analysis (T419837) (duration: 07m 23s)
- 13:46 mvernon@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1097.eqiad.wmnet
- 13:45 mszwarc@deploy2002: mszwarc: Continuing with sync
- 13:43 mszwarc@deploy2002: mszwarc: Backport for Normalize external domain names in click analysis (T419837), Normalize external domain names in click analysis (T419837) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 13:43 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2096.codfw.wmnet
- 13:41 jclark@cumin1003: START - Cookbook sre.hosts.reimage for host wdqs1033.eqiad.wmnet with OS trixie
- 13:41 jclark@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1373.eqiad.wmnet with OS bookworm
- 13:41 mszwarc@deploy2002: Started scap sync-world: Backport for Normalize external domain names in click analysis (T419837), Normalize external domain names in click analysis (T419837)
- 13:40 sgimeno@deploy2002: Finished scap sync-world: Backport for filebackend: Remove outdated comment, GrowthExperiments: increase edit and thanks query limit II (T341599) (duration: 08m 47s)
- 13:40 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1044.eqiad.wmnet
- 13:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1044.eqiad.wmnet
- 13:39 mvernon@cumin1003: START - Cookbook sre.hosts.reboot-single for host ms-be1097.eqiad.wmnet
- 13:39 mvernon@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1096.eqiad.wmnet
- 13:37 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2096.codfw.wmnet
- 13:36 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dse-k8s-worker1016.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 13:36 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2095.codfw.wmnet
- 13:36 sgimeno@deploy2002: matmarex, sgimeno: Continuing with sync
- 13:34 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1044.eqiad.wmnet
- 13:33 sgimeno@deploy2002: matmarex, sgimeno: Backport for filebackend: Remove outdated comment, GrowthExperiments: increase edit and thanks query limit II (T341599) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 13:31 mvernon@cumin1003: START - Cookbook sre.hosts.reboot-single for host ms-be1096.eqiad.wmnet
- 13:31 mvernon@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1095.eqiad.wmnet
- 13:31 sgimeno@deploy2002: Started scap sync-world: Backport for filebackend: Remove outdated comment, GrowthExperiments: increase edit and thanks query limit II (T341599)
- 13:30 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2095.codfw.wmnet
- 13:30 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2094.codfw.wmnet
- 13:28 sgimeno@deploy2002: Finished scap sync-world: Backport for loggedOutWarning: dont set the schema for experiment events (T420451), loggedOutWarning: dont set the schema for experiment events (T420451), Revert "SpecialPreferences: Use Language Select Widget in language field" (T419895), [[gerrit:1254890|Revert "SpecialPreferences: Use Language Select Widget in lan
- 13:28 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dse-k8s-worker1027.eqiad.wmnet with OS bookworm
- 13:28 btullis@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - btullis@cumin1003"
- 13:27 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dse-k8s-worker1026.eqiad.wmnet
- 13:27 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1044.eqiad.wmnet
- 13:26 btullis@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - btullis@cumin1003"
- 13:25 mvernon@cumin1003: START - Cookbook sre.hosts.reboot-single for host ms-be1095.eqiad.wmnet
- 13:25 mvernon@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1094.eqiad.wmnet
- 13:24 sgimeno@deploy2002: somerandomdeveloper, sgimeno: Continuing with sync
- 13:24 sgimeno@deploy2002: somerandomdeveloper, sgimeno: Backport for loggedOutWarning: dont set the schema for experiment events (T420451), loggedOutWarning: dont set the schema for experiment events (T420451), Revert "SpecialPreferences: Use Language Select Widget in language field" (T419895), [[gerrit:1254890|Revert "SpecialPreferences: Use Language Select Widget in
- 13:23 btullis@cumin1003: START - Cookbook sre.hosts.provision for host dse-k8s-worker1016.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 13:23 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2094.codfw.wmnet
- 13:23 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2093.codfw.wmnet
- 13:22 sgimeno@deploy2002: Started scap sync-world: Backport for loggedOutWarning: dont set the schema for experiment events (T420451), loggedOutWarning: dont set the schema for experiment events (T420451), Revert "SpecialPreferences: Use Language Select Widget in language field" (T419895), [[gerrit:1254890|Revert "SpecialPreferences: Use Language Select Widget in lang
- 13:21 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1043.eqiad.wmnet
- 13:21 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1043.eqiad.wmnet
- 13:21 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host dse-k8s-worker1026.eqiad.wmnet
- 13:20 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
- 13:16 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2093.codfw.wmnet
- 13:16 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2092.codfw.wmnet
- 13:16 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1043.eqiad.wmnet
- 13:16 mvernon@cumin1003: START - Cookbook sre.hosts.reboot-single for host ms-be1094.eqiad.wmnet
- 13:16 mvernon@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1093.eqiad.wmnet
- 13:15 jclark@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 13:15 jclark@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update network and mgmt - jclark@cumin1003"
- 13:15 jclark@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update network and mgmt - jclark@cumin1003"
- 13:12 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1043.eqiad.wmnet
- 13:10 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1042.eqiad.wmnet
- 13:10 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
- 13:10 jclark@cumin1003: START - Cookbook sre.dns.netbox
- 13:09 mvernon@cumin1003: START - Cookbook sre.hosts.reboot-single for host ms-be1093.eqiad.wmnet
- 13:09 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1042.eqiad.wmnet
- 13:09 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2092.codfw.wmnet
- 13:08 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dse-k8s-worker1027.eqiad.wmnet with reason: host reimage
- 13:08 btullis@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dse-k8s-worker1016
- 13:07 jclark@cumin1003: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
- 13:07 jclark@cumin1003: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update network and mgmt - jclark@cumin1003"
- 13:06 btullis@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host dse-k8s-worker1016
- 13:06 jclark@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update network and mgmt - jclark@cumin1003"
- 13:04 btullis@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on dse-k8s-worker1027.eqiad.wmnet with reason: host reimage
- 13:04 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1042.eqiad.wmnet
- 13:02 jclark@cumin1003: START - Cookbook sre.dns.netbox
- 13:00 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dse-k8s-worker1012.eqiad.wmnet
- 12:58 jclark@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dse-k8s-worker1020.eqiad.wmnet with OS bookworm
- 12:58 jclark@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1003"
- 12:57 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1042.eqiad.wmnet
- 12:56 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dse-k8s-worker1015.eqiad.wmnet
- 12:55 ayounsi@dns1004: END - running authdns-update
- 12:55 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
- 12:55 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
- 12:54 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host dse-k8s-worker1012.eqiad.wmnet
- 12:54 ayounsi@dns1004: START - running authdns-update
- 12:53 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dse-k8s-worker1012.eqiad.wmnet
- 12:53 ayounsi@cumin1003: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
- 12:51 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host dse-k8s-worker1015.eqiad.wmnet
- 12:50 jclark@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1003"
- 12:50 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1027.eqiad.wmnet with OS bookworm
- 12:49 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1041.eqiad.wmnet
- 12:49 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1041.eqiad.wmnet
- 12:47 cgoubert@cumin1003: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling reboot on A:kafka-main-codfw
- 12:43 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1041.eqiad.wmnet
- 12:42 btullis@cumin1003: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling reboot on A:kafka-jumbo-eqiad
- 12:38 daniel@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
- 12:37 ayounsi@cumin1003: START - Cookbook sre.dns.netbox
- 12:37 daniel@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
- 12:36 ayounsi@cumin1003: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
- 12:35 blake@cumin1003: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on P{wikikube-worker[1328-1372].eqiad.wmnet} and (A:wikikube-master-eqiad or A:wikikube-worker-eqiad)
- 12:33 ayounsi@cumin1003: START - Cookbook sre.dns.netbox
- 12:32 daniel@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
- 12:31 daniel@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
- 12:30 jclark@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dse-k8s-worker1020.eqiad.wmnet with reason: host reimage
- 12:27 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1041.eqiad.wmnet
- 12:25 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1040.eqiad.wmnet
- 12:25 jclark@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on dse-k8s-worker1020.eqiad.wmnet with reason: host reimage
- 12:25 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1040.eqiad.wmnet
- 12:25 btullis@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dse-k8s-worker1027.eqiad.wmnet with OS bookworm
- 12:25 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dse-k8s-worker1026.eqiad.wmnet with OS bookworm
- 12:25 btullis@cumin1003: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - btullis@cumin1003"
- 12:24 btullis@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Update for dse-k8s-worker1015 - btullis@cumin1003"
- 12:24 btullis@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Update for dse-k8s-worker1015 - btullis@cumin1003"
- 12:22 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dse-k8s-worker1015.eqiad.wmnet with OS bookworm
- 12:22 btullis@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - btullis@cumin1003"
- 12:21 daniel@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
- 12:20 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1040.eqiad.wmnet
- 12:19 daniel@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
- 12:19 btullis@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - btullis@cumin1003"
- 12:16 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host serpens.wikimedia.org
- 12:13 mszwarc@deploy2002: Finished scap sync-world: Backport for Enable autodemotion for 2FA-less CN admins and WMF T&S (T418580) (duration: 06m 21s)
- 12:13 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host serpens.wikimedia.org
- 12:10 daniel@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
- 12:10 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1040.eqiad.wmnet
- 12:10 daniel@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
- 12:10 jmm@cumin2002: END (PASS) - Cookbook sre.ldap.roll-restart-reboot-replica (exit_code=0) rolling reboot on A:ldap-replicas-eqiad
- 12:09 mszwarc@deploy2002: mszwarc: Continuing with sync
- 12:09 mszwarc@deploy2002: mszwarc: Backport for Enable autodemotion for 2FA-less CN admins and WMF T&S (T418580) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 12:07 mszwarc@deploy2002: Started scap sync-world: Backport for Enable autodemotion for 2FA-less CN admins and WMF T&S (T418580)
- 12:07 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1039.eqiad.wmnet
- 12:07 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1039.eqiad.wmnet
- 12:05 btullis@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - btullis@cumin1003"
- 12:05 jiji@cumin1003: START - Cookbook sre.memcached.roll-reboot-restart rolling reboot on A:memcached-eqiad
- 12:04 daniel@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
- 12:03 daniel@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
- 12:02 ladsgroup@deploy2002: Finished scap sync-world: Backport for Make it follow thumb steps (T402792 T414805), DjvuHandler: Make it follow thumb steps (T402792 T414805 T416620 T418178), Make it follow thumb steps (T402792 T414805), DjvuHandler: Make it follow thumb steps (T402792 T414805 T416620 T418178) (duration: 06m 48s)
- 12:02 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dse-k8s-worker1026.eqiad.wmnet with reason: host reimage
- 12:02 jmm@cumin2002: START - Cookbook sre.ldap.roll-restart-reboot-replica rolling reboot on A:ldap-replicas-eqiad
- 12:01 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1039.eqiad.wmnet
- 12:01 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1039.eqiad.wmnet
- 12:00 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1038.eqiad.wmnet
- 12:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1038.eqiad.wmnet
- 11:59 btullis@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on dse-k8s-worker1026.eqiad.wmnet with reason: host reimage
- 11:58 ladsgroup@deploy2002: ladsgroup: Continuing with sync
- 11:58 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host dse-k8s-worker1012.eqiad.wmnet
- 11:57 ladsgroup@deploy2002: ladsgroup: Backport for Make it follow thumb steps (T402792 T414805), DjvuHandler: Make it follow thumb steps (T402792 T414805 T416620 T418178), Make it follow thumb steps (T402792 T414805), DjvuHandler: Make it follow thumb steps (T402792 T414805 T416620 T418178) synced to the testservers (see https://wikitech.wikimedia.
- 11:57 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1027.eqiad.wmnet with OS bookworm
- 11:56 blake@cumin1003: START - Cookbook sre.k8s.reboot-nodes rolling reboot on P{wikikube-worker[1328-1372].eqiad.wmnet} and (A:wikikube-master-eqiad or A:wikikube-worker-eqiad)
- 11:56 cgoubert@cumin1003: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling reboot on A:kafka-main-codfw
- 11:55 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dse-k8s-worker1027.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 11:55 ladsgroup@deploy2002: Started scap sync-world: Backport for Make it follow thumb steps (T402792 T414805), DjvuHandler: Make it follow thumb steps (T402792 T414805 T416620 T418178), Make it follow thumb steps (T402792 T414805), DjvuHandler: Make it follow thumb steps (T402792 T414805 T416620 T418178)
- 11:54 btullis@cumin1003: START - Cookbook sre.hosts.provision for host dse-k8s-worker1027.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 11:54 btullis@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dse-k8s-worker1027.eqiad.wmnet with OS bookworm
- 11:51 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1038.eqiad.wmnet
- 11:50 daniel@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
- 11:50 btullis@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Updating for dse-k8s-worker1012 - btullis@cumin1003"
- 11:49 btullis@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Updating for dse-k8s-worker1012 - btullis@cumin1003"
- 11:49 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dse-k8s-worker1015.eqiad.wmnet with reason: host reimage
- 11:48 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1038.eqiad.wmnet
- 11:48 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1037.eqiad.wmnet
- 11:48 daniel@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
- 11:48 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1037.eqiad.wmnet
- 11:48 cgoubert@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker1307.eqiad.wmnet
- 11:48 cgoubert@cumin1003: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker1307.eqiad.wmnet
- 11:47 claime: sudo homer lsw1-e5-eqiad* commit 'wikikube-worker1307 to active'
- 11:47 cgoubert@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 11:46 btullis@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on dse-k8s-worker1015.eqiad.wmnet with reason: host reimage
- 11:46 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1027.eqiad.wmnet with OS bookworm
- 11:45 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1026.eqiad.wmnet with OS bookworm
- 11:44 cgoubert@cumin1003: START - Cookbook sre.dns.netbox
- 11:42 cgoubert@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 11:42 btullis@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dse-k8s-worker1026.eqiad.wmnet with OS bookworm
- 11:41 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host puppetboard1003.eqiad.wmnet
- 11:40 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1037.eqiad.wmnet
- 11:39 cgoubert@cumin1003: START - Cookbook sre.dns.netbox
- 11:37 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1037.eqiad.wmnet
- 11:37 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2091.codfw.wmnet
- 11:37 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1036.eqiad.wmnet
- 11:37 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1036.eqiad.wmnet
- 11:37 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host puppetboard1003.eqiad.wmnet
- 11:36 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-worker1347.eqiad.wmnet
- 11:34 jclark@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1020.eqiad.wmnet with OS bookworm
- 11:31 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1015.eqiad.wmnet with OS bookworm
- 11:31 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2091.codfw.wmnet
- 11:30 mvernon@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1092.eqiad.wmnet
- 11:30 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2090.codfw.wmnet
- 11:30 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-worker1347.eqiad.wmnet
- 11:30 jgiannelos@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
- 11:30 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host puppetboard2003.codfw.wmnet
- 11:30 btullis@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dse-k8s-worker1015.eqiad.wmnet with OS bookworm
- 11:30 jgiannelos@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
- 11:29 jgiannelos@deploy2002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
- 11:29 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1036.eqiad.wmnet
- 11:29 jgiannelos@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
- 11:28 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
- 11:28 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/mobileapps: apply
- 11:26 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host puppetboard2003.codfw.wmnet
- 11:25 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1036.eqiad.wmnet
- 11:24 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2090.codfw.wmnet
- 11:24 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2089.codfw.wmnet
- 11:23 mvernon@cumin1003: START - Cookbook sre.hosts.reboot-single for host ms-be1092.eqiad.wmnet
- 11:23 mvernon@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1091.eqiad.wmnet
- 11:22 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1035.eqiad.wmnet
- 11:22 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1035.eqiad.wmnet
- 11:20 btullis@cumin1003: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host dse-k8s-worker1015
- 11:20 btullis@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host dse-k8s-worker1015
- 11:18 mvernon@cumin1003: START - Cookbook sre.hosts.reboot-single for host ms-be1091.eqiad.wmnet
- 11:18 mvernon@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1090.eqiad.wmnet
- 11:18 vgutierrez@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool ulsfo [reason: no reason specified, no task ID specified]
- 11:18 vgutierrez@cumin1003: START - Cookbook sre.dns.admin DNS admin: pool ulsfo [reason: no reason specified, no task ID specified]
- 11:18 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ping2004.codfw.wmnet
- 11:17 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2089.codfw.wmnet
- 11:17 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2088.codfw.wmnet
- 11:15 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1035.eqiad.wmnet
- 11:14 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ping2004.codfw.wmnet
- 11:13 vgutierrez@dns1004: END - running authdns-update
- 11:12 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1035.eqiad.wmnet
- 11:11 vgutierrez@dns1004: START - running authdns-update
- 11:11 mvernon@cumin1003: START - Cookbook sre.hosts.reboot-single for host ms-be1090.eqiad.wmnet
- 11:11 mvernon@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1089.eqiad.wmnet
- 11:10 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2088.codfw.wmnet
- 11:10 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2087.codfw.wmnet
- 11:08 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1034.eqiad.wmnet
- 11:08 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1034.eqiad.wmnet
- 11:07 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1026.eqiad.wmnet with OS bookworm
- 11:05 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dse-k8s-worker1026.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 11:05 btullis@cumin1003: END (PASS) - Cookbook sre.hadoop.reboot-workers (exit_code=0) for Hadoop test cluster
- 11:04 mvernon@cumin1003: START - Cookbook sre.hosts.reboot-single for host ms-be1089.eqiad.wmnet
- 11:04 btullis@cumin1003: START - Cookbook sre.hosts.provision for host dse-k8s-worker1026.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 11:04 mvernon@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1088.eqiad.wmnet
- 11:03 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2087.codfw.wmnet
- 11:03 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2086.codfw.wmnet
- 11:03 vgutierrez@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 11:02 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1034.eqiad.wmnet
- 11:00 vgutierrez@cumin1003: START - Cookbook sre.dns.netbox
- 10:59 btullis@cumin1003: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling reboot on A:kafka-jumbo-eqiad
- 10:58 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1034.eqiad.wmnet
- 10:57 mvernon@cumin1003: START - Cookbook sre.hosts.reboot-single for host ms-be1088.eqiad.wmnet
- 10:57 mvernon@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1087.eqiad.wmnet
- 10:57 fabfur@cumin1003: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
- 10:57 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1033.eqiad.wmnet
- 10:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1033.eqiad.wmnet
- 10:56 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2086.codfw.wmnet
- 10:56 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2085.codfw.wmnet
- 10:56 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1015.eqiad.wmnet with OS bookworm
- 10:53 mvernon@cumin1003: START - Cookbook sre.hosts.reboot-single for host ms-be1087.eqiad.wmnet
- 10:53 mvernon@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1086.eqiad.wmnet
- 10:50 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1033.eqiad.wmnet
- 10:50 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1033.eqiad.wmnet
- 10:50 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1032.eqiad.wmnet
- 10:49 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2085.codfw.wmnet
- 10:49 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1032.eqiad.wmnet
- 10:49 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2084.codfw.wmnet
- 10:46 mvernon@cumin1003: START - Cookbook sre.hosts.reboot-single for host ms-be1086.eqiad.wmnet
- 10:46 mvernon@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1085.eqiad.wmnet
- 10:43 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1032.eqiad.wmnet
- 10:40 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1032.eqiad.wmnet
- 10:40 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2084.codfw.wmnet
- 10:40 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1031.eqiad.wmnet
- 10:40 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2083.codfw.wmnet
- 10:40 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1031.eqiad.wmnet
- 10:39 fabfur@cumin1003: START - Cookbook sre.dns.netbox
- 10:39 mvernon@cumin1003: START - Cookbook sre.hosts.reboot-single for host ms-be1085.eqiad.wmnet
- 10:39 mvernon@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1084.eqiad.wmnet
- 10:37 klausman@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-staging2001.codfw.wmnet
- 10:34 javiermonton@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
- 10:34 javiermonton@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
- 10:34 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1031.eqiad.wmnet
- 10:32 mvernon@cumin1003: START - Cookbook sre.hosts.reboot-single for host ms-be1084.eqiad.wmnet
- 10:32 mvernon@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1083.eqiad.wmnet
- 10:32 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2083.codfw.wmnet
- 10:32 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1031.eqiad.wmnet
- 10:32 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2082.codfw.wmnet
- 10:32 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1030.eqiad.wmnet
- 10:32 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1030.eqiad.wmnet
- 10:32 klausman@cumin1003: START - Cookbook sre.hosts.reboot-single for host ml-staging2001.codfw.wmnet
- 10:31 klausman@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-staging2002.codfw.wmnet
- 10:26 klausman@cumin1003: START - Cookbook sre.hosts.reboot-single for host ml-staging2002.codfw.wmnet
- 10:25 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1030.eqiad.wmnet
- 10:25 mvernon@cumin1003: START - Cookbook sre.hosts.reboot-single for host ms-be1083.eqiad.wmnet
- 10:25 mvernon@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1082.eqiad.wmnet
- 10:25 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2082.codfw.wmnet
- 10:25 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2081.codfw.wmnet
- 10:24 btullis@cumin1003: START - Cookbook sre.hadoop.reboot-workers for Hadoop test cluster
- 10:23 klausman@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-staging2003.codfw.wmnet
- 10:23 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1030.eqiad.wmnet
- 10:23 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1029.eqiad.wmnet
- 10:22 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1029.eqiad.wmnet
- 10:19 klausman@cumin1003: START - Cookbook sre.hosts.reboot-single for host ml-staging2003.codfw.wmnet
- 10:18 mvernon@cumin1003: START - Cookbook sre.hosts.reboot-single for host ms-be1082.eqiad.wmnet
- 10:17 vgutierrez@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: depool ulsfo [reason: no reason specified, no task ID specified]
- 10:17 vgutierrez@cumin1003: START - Cookbook sre.dns.admin DNS admin: depool ulsfo [reason: no reason specified, no task ID specified]
- 10:17 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2081.codfw.wmnet
- 10:17 klausman@cumin1003: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-staging-ctrl2001.codfw.wmnet
- 10:16 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1029.eqiad.wmnet
- 10:15 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1029.eqiad.wmnet
- 10:14 klausman@cumin1003: START - Cookbook sre.ganeti.reboot-vm for VM ml-staging-ctrl2001.codfw.wmnet
- 10:14 klausman@cumin1003: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-staging-ctrl2002.codfw.wmnet
- 10:13 mvernon@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1081.eqiad.wmnet
- 10:13 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1028.eqiad.wmnet
- 10:12 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1028.eqiad.wmnet
- 10:11 vgutierrez@dns1004: END - running authdns-update
- 10:10 klausman@cumin1003: START - Cookbook sre.ganeti.reboot-vm for VM ml-staging-ctrl2002.codfw.wmnet
- 10:10 klausman@cumin1003: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-serve-ctrl2001.codfw.wmnet
- 10:09 vgutierrez@dns1004: START - running authdns-update
- 10:07 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2080.codfw.wmnet
- 10:06 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1028.eqiad.wmnet
- 10:06 mvernon@cumin1003: START - Cookbook sre.hosts.reboot-single for host ms-be1081.eqiad.wmnet
- 10:06 mvernon@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1080.eqiad.wmnet
- 10:05 klausman@cumin1003: START - Cookbook sre.ganeti.reboot-vm for VM ml-serve-ctrl2001.codfw.wmnet
- 10:05 klausman@cumin1003: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-serve-ctrl2002.codfw.wmnet
- 10:04 slyngshede@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool ulsfo [reason: no reason specified, T418971]
- 10:04 slyngshede@cumin1003: START - Cookbook sre.dns.admin DNS admin: pool ulsfo [reason: no reason specified, T418971]
- 10:03 slyngshede@cumin1003: END (FAIL) - Cookbook sre.dns.admin (exit_code=99) DNS admin: pool ulsfo [reason: no reason specified, no task ID specified]
- 10:03 slyngshede@cumin1003: START - Cookbook sre.dns.admin DNS admin: pool ulsfo [reason: no reason specified, no task ID specified]
- 10:01 klausman@cumin1003: START - Cookbook sre.ganeti.reboot-vm for VM ml-serve-ctrl2002.codfw.wmnet
- 10:01 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 23 hosts
- 10:01 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1028.eqiad.wmnet
- 10:01 klausman@cumin1003: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-serve-ctrl1002.eqiad.wmnet
- 10:01 slyngshede@cumin1003: START - Cookbook sre.hosts.remove-downtime for 23 hosts
- 09:59 mvernon@cumin1003: START - Cookbook sre.hosts.reboot-single for host ms-be1080.eqiad.wmnet
- 09:59 mvernon@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1079.eqiad.wmnet
- 09:59 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2080.codfw.wmnet
- 09:59 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2079.codfw.wmnet
- 09:58 klausman@cumin1003: START - Cookbook sre.ganeti.reboot-vm for VM ml-serve-ctrl1002.eqiad.wmnet
- 09:57 klausman@cumin1003: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-serve-ctrl1001.eqiad.wmnet
- 09:56 kevinbazira@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' .
- 09:56 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1027.eqiad.wmnet
- 09:55 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1027.eqiad.wmnet
- 09:52 klausman@cumin1003: START - Cookbook sre.ganeti.reboot-vm for VM ml-serve-ctrl1001.eqiad.wmnet
- 09:51 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2079.codfw.wmnet
- 09:51 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2078.codfw.wmnet
- 09:51 mvernon@cumin1003: START - Cookbook sre.hosts.reboot-single for host ms-be1079.eqiad.wmnet
- 09:51 klausman@cumin1003: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-etcd1003.eqiad.wmnet
- 09:51 mvernon@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1078.eqiad.wmnet
- 09:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netboxdb1003.eqiad.wmnet
- 09:49 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1027.eqiad.wmnet
- 09:48 klausman@cumin1003: START - Cookbook sre.ganeti.reboot-vm for VM ml-etcd1003.eqiad.wmnet
- 09:48 klausman@cumin1003: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-etcd1002.eqiad.wmnet
- 09:46 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netboxdb1003.eqiad.wmnet
- 09:46 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1027.eqiad.wmnet
- 09:46 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1026.eqiad.wmnet
- 09:46 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1026.eqiad.wmnet
- 09:46 klausman@cumin1003: START - Cookbook sre.ganeti.reboot-vm for VM ml-etcd1002.eqiad.wmnet
- 09:46 klausman@cumin1003: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-etcd1001.eqiad.wmnet
- 09:45 moritzm: installing postgresql-15 security updates
- 09:45 slyngshede@cumin1003: END (PASS) - Cookbook sre.loadbalancer.upgrade (exit_code=0) restart A:lvs-secondary-ulsfo and A:liberica
- 09:45 slyngshede@cumin1003: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) pooling P{lvs4010.ulsfo.wmnet} and A:liberica
- 09:45 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2078.codfw.wmnet
- 09:45 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2077.codfw.wmnet
- 09:45 slyngshede@cumin1003: START - Cookbook sre.loadbalancer.admin pooling P{lvs4010.ulsfo.wmnet} and A:liberica
- 09:45 slyngshede@cumin1003: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) depooling P{lvs4010.ulsfo.wmnet} and A:liberica
- 09:44 slyngshede@cumin1003: START - Cookbook sre.loadbalancer.admin depooling P{lvs4010.ulsfo.wmnet} and A:liberica
- 09:44 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netboxdb2003.codfw.wmnet
- 09:44 slyngshede@cumin1003: START - Cookbook sre.loadbalancer.upgrade restart A:lvs-secondary-ulsfo and A:liberica
- 09:44 jayme: switched wikikube staging apiservers to IPIP and maglev in eqiad and codfw - T352956
- 09:43 mvernon@cumin1003: START - Cookbook sre.hosts.reboot-single for host ms-be1078.eqiad.wmnet
- 09:43 mvernon@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1077.eqiad.wmnet
- 09:42 klausman@cumin1003: START - Cookbook sre.ganeti.reboot-vm for VM ml-etcd1001.eqiad.wmnet
- 09:40 jayme@cumin1003: END (PASS) - Cookbook sre.loadbalancer.migrate-service-ipip (exit_code=0) for alias: wikikube-staging-master-eqiad@eqiad
- 09:40 jayme@cumin1003: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on (A:lvs-low-traffic-eqiad or A:lvs-secondary-eqiad) and A:bullseye and A:lvs
- 09:40 slyngshede@cumin1003: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) config_reloading A:lvs-secondary-ulsfo and A:liberica (T418971)
- 09:40 slyngshede@cumin1003: START - Cookbook sre.loadbalancer.admin config_reloading A:lvs-secondary-ulsfo and A:liberica (T418971)
- 09:40 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netboxdb2003.codfw.wmnet
- 09:40 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1026.eqiad.wmnet
- 09:39 jayme@cumin1003: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on (A:lvs-low-traffic-eqiad or A:lvs-secondary-eqiad) and A:bullseye and A:lvs
- 09:38 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2077.codfw.wmnet
- 09:37 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2076.codfw.wmnet
- 09:37 klausman@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-reboot (exit_code=0) rolling reboot on A:ml-cache-codfw
- 09:37 jayme@cumin1003: START - Cookbook sre.loadbalancer.migrate-service-ipip for alias: wikikube-staging-master-eqiad@eqiad
- 09:36 mvernon@cumin1003: START - Cookbook sre.hosts.reboot-single for host ms-be1077.eqiad.wmnet
- 09:36 mvernon@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1076.eqiad.wmnet
- 09:35 jayme@cumin1003: END (PASS) - Cookbook sre.loadbalancer.migrate-service-ipip (exit_code=0) for alias: wikikube-staging-master-codfw@codfw
- 09:35 jayme@cumin1003: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on (A:lvs-low-traffic-codfw or A:lvs-secondary-codfw) and A:bullseye and A:lvs
- 09:35 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1026.eqiad.wmnet
- 09:30 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2076.codfw.wmnet
- 09:30 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2075.codfw.wmnet
- 09:26 mvernon@cumin1003: START - Cookbook sre.hosts.reboot-single for host ms-be1076.eqiad.wmnet
- 09:26 mvernon@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1075.eqiad.wmnet
- 09:24 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1025.eqiad.wmnet
- 09:24 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1025.eqiad.wmnet
- 09:23 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2075.codfw.wmnet
- 09:23 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2074.codfw.wmnet
- 09:21 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netbox-dev2003.codfw.wmnet
- 09:19 jayme@cumin1003: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on (A:lvs-low-traffic-codfw or A:lvs-secondary-codfw) and A:bullseye and A:lvs
- 09:18 mvernon@cumin1003: START - Cookbook sre.hosts.reboot-single for host ms-be1075.eqiad.wmnet
- 09:18 mvernon@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1074.eqiad.wmnet
- 09:17 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netbox-dev2003.codfw.wmnet
- 09:15 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1025.eqiad.wmnet
- 09:15 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2074.codfw.wmnet
- 09:15 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2073.codfw.wmnet
- 09:14 jayme@cumin1003: START - Cookbook sre.loadbalancer.migrate-service-ipip for alias: wikikube-staging-master-codfw@codfw
- 09:14 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1025.eqiad.wmnet
- 09:13 klausman@cumin1003: START - Cookbook sre.cassandra.roll-reboot rolling reboot on A:ml-cache-codfw
- 09:12 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1024.eqiad.wmnet
- 09:12 klausman@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-reboot (exit_code=0) rolling reboot on A:ml-cache-eqiad
- 09:12 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1024.eqiad.wmnet
- 09:10 mvernon@cumin1003: START - Cookbook sre.hosts.reboot-single for host ms-be1074.eqiad.wmnet
- 09:10 mvernon@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1073.eqiad.wmnet
- 09:08 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2073.codfw.wmnet
- 09:08 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2072.codfw.wmnet
- 09:08 slyngshede@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 23 hosts with reason: Update ULSFO LVS service IPs
- 09:06 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1024.eqiad.wmnet
- 09:03 mvernon@cumin1003: START - Cookbook sre.hosts.reboot-single for host ms-be1073.eqiad.wmnet
- 09:03 mvernon@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1072.eqiad.wmnet
- 09:02 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2072.codfw.wmnet
- 09:02 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2071.codfw.wmnet
- 09:02 slyngshede@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: depool ulsfo [reason: no reason specified, T418971]
- 09:02 slyngshede@cumin1003: START - Cookbook sre.dns.admin DNS admin: depool ulsfo [reason: no reason specified, T418971]
- 09:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ping1004.eqiad.wmnet
- 09:00 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1024.eqiad.wmnet
- 08:58 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1023.eqiad.wmnet
- 08:58 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1023.eqiad.wmnet
- 08:57 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ping1004.eqiad.wmnet
- 08:56 mvernon@cumin1003: START - Cookbook sre.hosts.reboot-single for host ms-be1072.eqiad.wmnet
- 08:56 mvernon@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1071.eqiad.wmnet
- 08:55 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2071.codfw.wmnet
- 08:55 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2070.codfw.wmnet
- 08:52 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1023.eqiad.wmnet
- 08:50 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1023.eqiad.wmnet
- 08:48 mvernon@cumin1003: START - Cookbook sre.hosts.reboot-single for host ms-be1071.eqiad.wmnet
- 08:47 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2070.codfw.wmnet
- 08:46 klausman@cumin1003: START - Cookbook sre.cassandra.roll-reboot rolling reboot on A:ml-cache-eqiad
- 08:29 hashar: Restarting CI Jenkins for plugin upgrade # T420347
- 08:22 aklapper@deploy2002: rebuilt and synchronized wikiversions files: group1 to 1.46.0-wmf.20 refs T413811
- 07:45 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 32934
- 07:42 btullis@cumin1003: END (PASS) - Cookbook sre.hadoop.reboot-workers (exit_code=0) for Hadoop analytics cluster
- 07:35 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'configure' for AS: 32934
- 07:22 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply
- 07:16 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply
- 06:54 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply
- 06:38 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply
- 03:22 musikanimal@deploy2002: Finished scap sync-world: Backport for CM5: add more aggressive warnings about CM5 deprecation (T373720) (duration: 12m 22s)
- 03:18 musikanimal@deploy2002: musikanimal: Continuing with sync
- 03:11 musikanimal@deploy2002: musikanimal: Backport for CM5: add more aggressive warnings about CM5 deprecation (T373720) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 03:09 musikanimal@deploy2002: Started scap sync-world: Backport for CM5: add more aggressive warnings about CM5 deprecation (T373720)
- 02:08 mwpresync@deploy2002: Finished scap build-images: Publishing wmf/next image (duration: 07m 47s)
- 02:07 dani@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
- 02:07 dani@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply
- 02:07 dani@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
- 02:07 dani@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply
- 02:07 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply
- 02:07 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply
- 02:06 dani@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
- 02:05 dani@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply
- 02:05 dani@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
- 02:05 dani@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply
- 02:05 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply
- 02:04 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply
- 02:00 mwpresync@deploy2002: Started scap build-images: Publishing wmf/next image
- 01:38 denisse@deploy2002: Finished deploy [librenms/librenms@9bdfb73]: Upgrade LibreNMS to 26.3.1 (duration: 00m 19s)
- 01:38 denisse@deploy2002: Started deploy [librenms/librenms@9bdfb73]: Upgrade LibreNMS to 26.3.1
- 01:10 denisse@deploy2002: Finished deploy [librenms/librenms@d152b36]: Upgrade LibreNMS to 25.11.0 (duration: 00m 08s)
- 01:10 denisse@deploy2002: Started deploy [librenms/librenms@d152b36]: Upgrade LibreNMS to 25.11.0
2026-03-17
- 23:44 btullis@cumin1003: START - Cookbook sre.hadoop.reboot-workers for Hadoop analytics cluster
- 23:38 btullis@cumin1003: END (FAIL) - Cookbook sre.hadoop.reboot-workers (exit_code=99) for Hadoop analytics cluster
- 22:55 brett@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp3081.*
- 22:20 cdobbins@cumin2002: conftool action : set/pooled=yes; selector: name=cp3073.esams.wmnet [reason: trixie reimaging]
- 22:15 cdobbins@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3073.esams.wmnet with OS trixie
- 22:11 cdobbins@cumin2002: conftool action : set/pooled=yes; selector: name=cp3072.esams.wmnet [reason: trixie reimaging]
- 22:10 cdobbins@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3072.esams.wmnet with OS trixie
- 22:05 dzahn@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on releases1003.eqiad.wmnet with reason: T420246
- 22:05 dzahn@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on releases2003.codfw.wmnet with reason: T420246
- 21:48 cdobbins@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3073.esams.wmnet with reason: host reimage
- 21:44 cdobbins@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3072.esams.wmnet with reason: host reimage
- 21:41 cdobbins@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3073.esams.wmnet with reason: host reimage
- 21:39 cdobbins@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3072.esams.wmnet with reason: host reimage
- 21:38 ryankemper: T411568 Failed back HDFS NameNode from an-master1004 to an-master1003; cluster back to original active/standby configuration
- 21:15 cdobbins@cumin2002: START - Cookbook sre.hosts.reimage for host cp3073.esams.wmnet with OS trixie
- 21:14 cdobbins@cumin2002: conftool action : set/pooled=no; selector: name=cp3073.esams.wmnet [reason: trixie reimaging]
- 21:14 cdobbins@cumin2002: START - Cookbook sre.hosts.reimage for host cp3072.esams.wmnet with OS trixie
- 21:14 cdobbins@cumin2002: conftool action : set/pooled=no; selector: name=cp3072.esams.wmnet [reason: trixie reimaging]
- 21:13 cdobbins@cumin2002: conftool action : set/pooled=yes; selector: name=cp3070.esams.wmnet [reason: trixie reimaging]
- 21:13 cdobbins@cumin2002: conftool action : set/pooled=yes; selector: name=cp3071.esams.wmnet [reason: trixie reimaging]
- 21:09 cdobbins@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3070.esams.wmnet with OS trixie
- 21:05 cdobbins@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3071.esams.wmnet with OS trixie
- 20:59 alexsanford@deploy2002: Finished scap sync-world: Backport for Remove notice from login form in popup mode (T418534) (duration: 07m 32s)
- 20:56 alexsanford@deploy2002: alexsanford: Continuing with sync
- 20:54 alexsanford@deploy2002: alexsanford: Backport for Remove notice from login form in popup mode (T418534) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 20:52 alexsanford@deploy2002: Started scap sync-world: Backport for Remove notice from login form in popup mode (T418534)
- 20:48 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dse-k8s-worker1015.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 20:43 btullis@cumin1003: START - Cookbook sre.hosts.provision for host dse-k8s-worker1015.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 20:43 cdobbins@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3070.esams.wmnet with reason: host reimage
- 20:40 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-worker1015.eqiad.wmnet with OS bookworm
- 20:40 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1015.eqiad.wmnet with OS bookworm
- 20:38 ryankemper: T411568 failed over HDFS NameNode from an-master1003 to an-master1004, then rebooted `an-master1003`
- 20:38 ryankemper: T411568 rebooted `an-coord1003`, `an-coord1004`, `an-tool1007`, `an-tool1008`, `an-tool1011`, `an-web1001`
- 20:38 cdobbins@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3071.esams.wmnet with reason: host reimage
- 20:34 cdobbins@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3070.esams.wmnet with reason: host reimage
- 20:34 cdobbins@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3071.esams.wmnet with reason: host reimage
- 20:31 catrope@deploy2002: Finished scap sync-world: Backport for Passwordless login: Don't display conditional auth errors, Passwordless login: Don't display conditional auth errors (duration: 08m 56s)
- 20:27 catrope@deploy2002: catrope: Continuing with sync
- 20:24 catrope@deploy2002: catrope: Backport for Passwordless login: Don't display conditional auth errors, Passwordless login: Don't display conditional auth errors synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 20:22 catrope@deploy2002: Started scap sync-world: Backport for Passwordless login: Don't display conditional auth errors, Passwordless login: Don't display conditional auth errors
- 20:16 ryankemper: T411568 rebooted `an-test-master1002`, `an-test-master1003`, `an-test-master1004`, `archiva1002`
- 20:12 aude@deploy2002: Finished scap sync-world: Backport for Set wgReadingListsBetaDefaultForNewAccountsAfter for beta cluster (T419163) (duration: 08m 53s)
- 20:09 cdobbins@cumin2002: START - Cookbook sre.hosts.reimage for host cp3071.esams.wmnet with OS trixie
- 20:09 cdobbins@cumin2002: START - Cookbook sre.hosts.reimage for host cp3070.esams.wmnet with OS trixie
- 20:08 aude@deploy2002: aude: Continuing with sync
- 20:08 cdobbins@cumin2002: conftool action : set/pooled=no; selector: name=cp3070.esams.wmnet [reason: trixie reimaging]
- 20:08 cdobbins@cumin2002: conftool action : set/pooled=yes; selector: name=cp3068.esams.wmnet [reason: trixie reimaging]
- 20:07 cdobbins@cumin2002: conftool action : set/pooled=yes; selector: name=cp3069.esams.wmnet [reason: trixie reimaging]
- 20:06 aude@deploy2002: aude: Backport for Set wgReadingListsBetaDefaultForNewAccountsAfter for beta cluster (T419163) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 20:03 aude@deploy2002: Started scap sync-world: Backport for Set wgReadingListsBetaDefaultForNewAccountsAfter for beta cluster (T419163)
- 19:58 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3081.esams.wmnet with OS trixie
- 19:54 cdobbins@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3069.esams.wmnet with OS trixie
- 19:54 ryankemper: T411568 rebooted `an-test-client1002`, `an-test-ui1001`, `an-test-coord1001`, `an-test-master1001`
- 19:50 cdobbins@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3068.esams.wmnet with OS trixie
- 19:46 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-worker1015.eqiad.wmnet with OS bookworm
- 19:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host bast2003.wikimedia.org with OS trixie
- 19:32 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3081.esams.wmnet with reason: host reimage
- 19:28 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3081.esams.wmnet with reason: host reimage
- 19:28 dzahn@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on releases1003.eqiad.wmnet with reason: T420246
- 19:28 cdobbins@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3069.esams.wmnet with reason: host reimage
- 19:23 cdobbins@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3068.esams.wmnet with reason: host reimage
- 19:21 cdobbins@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3069.esams.wmnet with reason: host reimage
- 19:20 cdobbins@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3068.esams.wmnet with reason: host reimage
- 19:19 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on bast2003.wikimedia.org with reason: host reimage
- 19:11 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on bast2003.wikimedia.org with reason: host reimage
- 19:08 dzahn@dns1004: END - running authdns-update
- 19:07 dzahn@dns1004: START - running authdns-update
- 19:05 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp3081.esams.wmnet with OS trixie
- 19:05 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp3081.esams.wmnet with OS trixie
- 19:00 brett@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp3080.*
- 18:56 cdobbins@cumin2002: START - Cookbook sre.hosts.reimage for host cp3069.esams.wmnet with OS trixie
- 18:55 cdobbins@cumin2002: conftool action : set/pooled=no; selector: name=cp3069.esams.wmnet [reason: trixie reimaging]
- 18:55 cdobbins@cumin2002: START - Cookbook sre.hosts.reimage for host cp3068.esams.wmnet with OS trixie
- 18:55 btullis@cumin1003: END (PASS) - Cookbook sre.druid.reboot-workers (exit_code=0) for Druid public cluster: Reboot Druid nodes
- 18:54 cdobbins@cumin2002: conftool action : set/pooled=no; selector: name=cp3068.esams.wmnet [reason: trixie reimaging]
- 18:53 cdobbins@cumin2002: conftool action : set/pooled=yes; selector: name=cp3066.esams.wmnet [reason: trixie reimaging]
- 18:53 cdobbins@cumin2002: conftool action : set/pooled=yes; selector: name=cp3067.esams.wmnet [reason: trixie reimaging]
- 18:50 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host bast2003.wikimedia.org with OS trixie
- 18:49 swfrench-wmf: manually uncordoned wikikube-worker-exp1001.eqiad.wmnet after failed reboot
- 18:47 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3080.esams.wmnet with OS trixie
- 18:43 cdobbins@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3067.esams.wmnet with OS trixie
- 18:40 cdobbins@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp3066.esams.wmnet with OS trixie
- 18:32 dwisehaupt@dns1005: END - running authdns-update
- 18:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host bast2003.wikimedia.org with OS bookworm
- 18:31 dwisehaupt@dns1005: START - running authdns-update
- 18:25 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1015.eqiad.wmnet with OS bookworm
- 18:25 btullis@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dse-k8s-worker1015.eqiad.wmnet with OS bookworm
- 18:20 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3080.esams.wmnet with reason: host reimage
- 18:19 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-reboot (exit_code=0) rolling reboot on P{cp700[5-8].magru.wmnet} and A:cp
- 18:19 brett@cumin2002: cookbooks.sre.cdn.roll-reboot finished rebooting cp7008.magru.wmnet
- 18:17 akhatun@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
- 18:16 akhatun@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
- 18:16 cdobbins@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3067.esams.wmnet with reason: host reimage
- 18:16 kevinbazira@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' .
- 18:13 cdobbins@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp3066.esams.wmnet with reason: host reimage
- 18:09 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3080.esams.wmnet with reason: host reimage
- 18:09 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on bast2003.wikimedia.org with reason: host reimage
- 18:04 cdobbins@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3067.esams.wmnet with reason: host reimage
- 18:03 cdobbins@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp3066.esams.wmnet with reason: host reimage
- 18:03 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on bast2003.wikimedia.org with reason: host reimage
- 17:52 cgoubert@cumin1003: END (FAIL) - Cookbook sre.k8s.reboot-nodes (exit_code=1) rolling reboot on P{wikikube-worker[1312-1327].eqiad.wmnet,wikikube-worker-exp1001.eqiad.wmnet} and (A:wikikube-master-eqiad or A:wikikube-worker-eqiad)
- 17:52 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1015.eqiad.wmnet with OS bookworm
- 17:52 btullis@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dse-k8s-worker1015.eqiad.wmnet with OS bookworm
- 17:46 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp3080.esams.wmnet with OS trixie
- 17:44 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
- 17:43 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
- 17:43 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
- 17:42 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp3081.esams.wmnet with OS trixie
- 17:42 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp3081.esams.wmnet with OS trixie
- 17:42 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
- 17:41 btullis@cumin1003: START - Cookbook sre.druid.reboot-workers for Druid public cluster: Reboot Druid nodes
- 17:40 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
- 17:39 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-reboot (exit_code=0) rolling reboot on P{cp7013.magru.wmnet,cp701[5-6].magru.wmnet} and A:cp
- 17:39 brett@cumin2002: cookbooks.sre.cdn.roll-reboot finished rebooting cp7016.magru.wmnet
- 17:38 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-web: apply
- 17:37 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
- 17:37 brett@cumin2002: cookbooks.sre.cdn.roll-reboot finished rebooting cp7007.magru.wmnet
- 17:37 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-web: apply
- 17:34 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
- 17:33 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
- 17:33 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
- 17:33 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
- 17:32 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
- 17:31 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
- 17:29 cdobbins@cumin2002: START - Cookbook sre.hosts.reimage for host cp3067.esams.wmnet with OS trixie
- 17:29 cdobbins@cumin2002: conftool action : set/pooled=no; selector: name=cp3067.esams.wmnet [reason: trixie reimaging]
- 17:28 cdobbins@cumin2002: START - Cookbook sre.hosts.reimage for host cp3066.esams.wmnet with OS trixie
- 17:28 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
- 17:27 cdobbins@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp3066.esams.wmnet with OS trixie
- 17:27 cdobbins@cumin2002: START - Cookbook sre.hosts.reimage for host cp3066.esams.wmnet with OS trixie
- 17:26 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
- 17:26 cdobbins@cumin2002: conftool action : set/pooled=no; selector: name=cp3066.esams.wmnet [reason: trixie reimaging]
- 17:21 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
- 17:20 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
- 17:19 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
- 17:19 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp3081.esams.wmnet with OS trixie
- 17:19 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
- 17:16 mvernon@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1071.eqiad.wmnet
- 17:16 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
- 17:16 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2069.codfw.wmnet
- 17:15 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dse-k8s-worker1012.eqiad.wmnet with OS bookworm
- 17:14 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
- 17:14 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
- 17:14 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
- 17:13 javiermonton@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
- 17:13 javiermonton@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
- 17:10 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
- 17:09 brett@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp7014.*
- 17:09 mvernon@cumin1003: START - Cookbook sre.hosts.reboot-single for host ms-be1071.eqiad.wmnet
- 17:09 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2069.codfw.wmnet
- 17:09 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
- 17:08 mvernon@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1070.eqiad.wmnet
- 17:08 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2068.codfw.wmnet
- 17:07 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
- 17:06 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
- 17:06 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host bast2003.wikimedia.org with OS bookworm
- 17:06 cgoubert@cumin1003: START - Cookbook sre.k8s.reboot-nodes rolling reboot on P{wikikube-worker[1312-1327].eqiad.wmnet,wikikube-worker-exp1001.eqiad.wmnet} and (A:wikikube-master-eqiad or A:wikikube-worker-eqiad)
- 17:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['bast2003']
- 17:02 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2068.codfw.wmnet
- 17:02 mvernon@cumin1003: START - Cookbook sre.hosts.reboot-single for host ms-be1070.eqiad.wmnet
- 17:02 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2067.codfw.wmnet
- 17:01 mvernon@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1069.eqiad.wmnet
- 17:01 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1015.eqiad.wmnet with OS bookworm
- 17:00 cgoubert@cumin1003: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
- 16:59 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp7014.magru.wmnet with OS trixie
- 16:58 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-worker1015.eqiad.wmnet with OS bookworm
- 16:58 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1015.eqiad.wmnet with OS bookworm
- 16:58 btullis@cumin1003: END (PASS) - Cookbook sre.kafka.roll-restart-mirror-maker (exit_code=0) restart MirrorMaker for Kafka A:kafka-mirror-maker-jumbo-eqiad cluster: Roll restart of jvm daemons.
- 16:57 brett@cumin2002: cookbooks.sre.cdn.roll-reboot finished rebooting cp7015.magru.wmnet
- 16:56 brett@cumin2002: cookbooks.sre.cdn.roll-reboot finished rebooting cp7006.magru.wmnet
- 16:55 mvernon@cumin1003: START - Cookbook sre.hosts.reboot-single for host ms-be1069.eqiad.wmnet
- 16:55 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2067.codfw.wmnet
- 16:53 mvernon@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1068.eqiad.wmnet
- 16:52 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2066.codfw.wmnet
- 16:47 kevinbazira@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' .
- 16:47 samtar@deploy2002: mwscript-k8s job started: foreachwikiindblist all cleanupWatchlistLabelMember.php # T420328
- 16:46 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-worker1015.eqiad.wmnet with OS bookworm
- 16:46 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['bast2003']
- 16:45 mvernon@cumin1003: START - Cookbook sre.hosts.reboot-single for host ms-be1068.eqiad.wmnet
- 16:45 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2066.codfw.wmnet
- 16:44 mvernon@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1067.eqiad.wmnet
- 16:44 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2065.codfw.wmnet
- 16:42 btullis@cumin1003: END (PASS) - Cookbook sre.druid.reboot-workers (exit_code=0) for Druid analytics cluster: Reboot Druid nodes
- 16:40 btullis@cumin1003: START - Cookbook sre.kafka.roll-restart-mirror-maker restart MirrorMaker for Kafka A:kafka-mirror-maker-jumbo-eqiad cluster: Roll restart of jvm daemons.
- 16:37 cgoubert@cumin1003: START - Cookbook sre.dns.netbox
- 16:36 mvernon@cumin1003: START - Cookbook sre.hosts.reboot-single for host ms-be1067.eqiad.wmnet
- 16:36 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2065.codfw.wmnet
- 16:36 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2064.codfw.wmnet
- 16:35 mvernon@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1066.eqiad.wmnet
- 16:34 samtar@deploy2002: mwscript-k8s job started: foreachwikiindblist group2 cleanupWatchlistLabelMember.php # T420328
- 16:33 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host zuul2003.codfw.wmnet with OS trixie
- 16:33 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp7014.magru.wmnet with reason: host reimage
- 16:32 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for wikikube-worker[1306,1308-1311].eqiad.wmnet
- 16:32 cgoubert@cumin1003: START - Cookbook sre.hosts.remove-downtime for wikikube-worker[1306,1308-1311].eqiad.wmnet
- 16:28 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp7014.magru.wmnet with reason: host reimage
- 16:28 mvernon@cumin1003: START - Cookbook sre.hosts.reboot-single for host ms-be1066.eqiad.wmnet
- 16:28 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2064.codfw.wmnet
- 16:25 dzahn@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on releases2003.codfw.wmnet with reason: T420246
- 16:25 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2063.codfw.wmnet
- 16:25 mvernon@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1065.eqiad.wmnet
- 16:25 cgoubert@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker[1306,1308-1311].eqiad.wmnet
- 16:25 cgoubert@cumin1003: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker[1306,1308-1311].eqiad.wmnet
- 16:18 mvernon@cumin1003: START - Cookbook sre.hosts.reboot-single for host ms-be1065.eqiad.wmnet
- 16:18 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2063.codfw.wmnet
- 16:18 btullis@cumin1003: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-jumbo-eqiad
- 16:17 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2062.codfw.wmnet
- 16:17 mvernon@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1064.eqiad.wmnet
- 16:15 brett@cumin2002: cookbooks.sre.cdn.roll-reboot finished rebooting cp7013.magru.wmnet
- 16:15 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on zuul2003.codfw.wmnet with reason: host reimage
- 16:14 brett@cumin2002: cookbooks.sre.cdn.roll-reboot finished rebooting cp7005.magru.wmnet
- 16:10 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2062.codfw.wmnet
- 16:08 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on zuul2003.codfw.wmnet with reason: host reimage
- 16:07 mvernon@cumin1003: START - Cookbook sre.hosts.reboot-single for host ms-be1064.eqiad.wmnet
- 16:05 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp7014.magru.wmnet with OS trixie
- 16:05 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp7014.magru.wmnet with OS trixie
- 16:03 brett@cumin2002: START - Cookbook sre.cdn.roll-reboot rolling reboot on P{cp7013.magru.wmnet,cp701[5-6].magru.wmnet} and A:cp
- 16:03 brett@cumin2002: START - Cookbook sre.cdn.roll-reboot rolling reboot on P{cp700[5-8].magru.wmnet} and A:cp
- 15:54 mvernon@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be1009.eqiad.wmnet
- 15:54 mutante: zuul2003 - reimaging with trixie
- 15:52 samtar@deploy2002: mwscript-k8s job started: foreachwikiindblist group1 cleanupWatchlistLabelMember.php # T420328
- 15:51 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2033.codfw.wmnet
- 15:51 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2033.codfw.wmnet
- 15:46 dzahn@cumin2002: START - Cookbook sre.hosts.reimage for host zuul2003.codfw.wmnet with OS trixie
- 15:45 mvernon@cumin1003: START - Cookbook sre.hosts.reboot-single for host thanos-be1009.eqiad.wmnet
- 15:45 mvernon@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be1008.eqiad.wmnet
- 15:44 samtar@deploy2002: mwscript-k8s job started: foreachwikiindblist group0 cleanupWatchlistLabelMember.php # T420328
- 15:43 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2033.codfw.wmnet
- 15:38 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2033.codfw.wmnet
- 15:38 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2048.codfw.wmnet
- 15:37 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2048.codfw.wmnet
- 15:36 mvernon@cumin1003: START - Cookbook sre.hosts.reboot-single for host thanos-be1008.eqiad.wmnet
- 15:36 mvernon@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be1007.eqiad.wmnet
- 15:34 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2048.codfw.wmnet
- 15:34 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dse-k8s-worker1012.eqiad.wmnet with reason: host reimage
- 15:33 samtar@deploy2002: mwscript-k8s job started: foreachwikiindblist testwikis cleanupWatchlistLabelMember.php # T420328
- 15:32 btullis@cumin1003: START - Cookbook sre.druid.reboot-workers for Druid analytics cluster: Reboot Druid nodes
- 15:28 mvernon@cumin1003: START - Cookbook sre.hosts.reboot-single for host thanos-be1007.eqiad.wmnet
- 15:28 mvernon@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be1006.eqiad.wmnet
- 15:27 btullis@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on dse-k8s-worker1012.eqiad.wmnet with reason: host reimage
- 15:27 samtar@deploy2002: mwscript-k8s job started: cleanupWatchlistLabelMember.php --wiki=testwiki # T420328
- 15:27 filippo@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudnet2008-dev.codfw.wmnet
- 15:25 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1015.eqiad.wmnet with OS bookworm
- 15:23 jmm@deploy2002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
- 15:22 btullis@cumin1003: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-jumbo-eqiad
- 15:21 jmm@deploy2002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
- 15:20 filippo@cumin1003: START - Cookbook sre.hosts.reboot-single for host cloudnet2008-dev.codfw.wmnet
- 15:20 mvernon@cumin1003: START - Cookbook sre.hosts.reboot-single for host thanos-be1006.eqiad.wmnet
- 15:20 mvernon@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be1005.eqiad.wmnet
- 15:18 jmm@deploy2002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
- 15:18 urbanecm@deploy2002: Finished scap sync-world: Backport for cleanup: Growth: Remove temporary GrowthMentorList overrides (T418518) (duration: 06m 32s)
- 15:16 jmm@deploy2002: helmfile [codfw] START helmfile.d/services/thumbor: apply
- 15:16 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 16509
- 15:14 filippo@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudservices2005-dev.codfw.wmnet
- 15:14 urbanecm@deploy2002: urbanecm: Continuing with sync
- 15:13 urbanecm@deploy2002: urbanecm: Backport for cleanup: Growth: Remove temporary GrowthMentorList overrides (T418518) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 15:13 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1012.eqiad.wmnet with OS bookworm
- 15:12 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2048.codfw.wmnet
- 15:11 urbanecm@deploy2002: Started scap sync-world: Backport for cleanup: Growth: Remove temporary GrowthMentorList overrides (T418518)
- 15:10 brennen@deploy2002: Finished deploy [phabricator/deployment@e845707]: deploy phab1004 for T420366 (duration: 01m 02s)
- 15:09 mvernon@cumin1003: START - Cookbook sre.hosts.reboot-single for host thanos-be1005.eqiad.wmnet
- 15:09 brennen@deploy2002: Started deploy [phabricator/deployment@e845707]: deploy phab1004 for T420366
- 15:09 dreamyjazz@deploy2002: Finished scap sync-world: Backport for Create dblists for wikis where CheckUser and AbuseFilter are disabled (T420063 T420062) (duration: 06m 38s)
- 15:09 brennen@deploy2002: Finished deploy [phabricator/deployment@e845707]: deploy phab2002 for T420366 (duration: 00m 35s)
- 15:08 filippo@cumin1003: START - Cookbook sre.hosts.reboot-single for host cloudservices2005-dev.codfw.wmnet
- 15:08 filippo@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudservices2004-dev.codfw.wmnet
- 15:08 brennen@deploy2002: Started deploy [phabricator/deployment@e845707]: deploy phab2002 for T420366
- 15:08 mvernon@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be2009.codfw.wmnet
- 15:05 jelto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on phab2002.codfw.wmnet with reason: Phabricator deploy
- 15:05 dreamyjazz@deploy2002: dreamyjazz: Continuing with sync
- 15:05 jelto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on phab1004.eqiad.wmnet with reason: Phabricator deploy
- 15:04 dreamyjazz@deploy2002: dreamyjazz: Backport for Create dblists for wikis where CheckUser and AbuseFilter are disabled (T420063 T420062) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 15:04 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp7014.magru.wmnet with OS trixie
- 15:03 jelto@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host phab1004.eqiad.wmnet
- 15:02 dreamyjazz@deploy2002: Started scap sync-world: Backport for Create dblists for wikis where CheckUser and AbuseFilter are disabled (T420063 T420062)
- 15:02 topranks: reset BGP session to ssw1-d8-eqiad from lsw1-d4-eqiad T420180
- 15:02 kevinbazira@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' .
- 15:02 filippo@cumin1003: START - Cookbook sre.hosts.reboot-single for host cloudservices2004-dev.codfw.wmnet
- 15:02 filippo@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudrabbit2003-dev.codfw.wmnet
- 15:00 mvernon@cumin1003: START - Cookbook sre.hosts.reboot-single for host thanos-be2009.codfw.wmnet
- 15:00 mvernon@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be2008.codfw.wmnet
- 14:57 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2034.codfw.wmnet
- 14:57 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2034.codfw.wmnet
- 14:57 jelto@cumin1003: START - Cookbook sre.hosts.reboot-single for host phab1004.eqiad.wmnet
- 14:55 filippo@cumin1003: START - Cookbook sre.hosts.reboot-single for host cloudrabbit2003-dev.codfw.wmnet
- 14:55 filippo@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudrabbit2002-dev.codfw.wmnet
- 14:54 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host durum4004.ulsfo.wmnet
- 14:53 jmm@deploy2002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
- 14:53 jmm@deploy2002: helmfile [staging] START helmfile.d/services/thumbor: apply
- 14:52 mvernon@cumin1003: START - Cookbook sre.hosts.reboot-single for host thanos-be2008.codfw.wmnet
- 14:52 mvernon@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be2007.codfw.wmnet
- 14:51 topranks: stop accepting routes on ssw1-d8-eqiad from external peers (cr2-eqiad, other spines) T420351
- 14:51 jelto@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host gitlab2002.wikimedia.org
- 14:50 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host durum4004.ulsfo.wmnet
- 14:50 topranks: stop announcing routes from ssw1-d8-eqiad to external peers (cr2-eqiad, other spines) T420351
- 14:49 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2034.codfw.wmnet
- 14:48 filippo@cumin1003: START - Cookbook sre.hosts.reboot-single for host cloudrabbit2002-dev.codfw.wmnet
- 14:48 filippo@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudrabbit2001-dev.codfw.wmnet
- 14:46 mvernon@cumin1003: START - Cookbook sre.hosts.reboot-single for host thanos-be2007.codfw.wmnet
- 14:46 mvernon@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be2006.codfw.wmnet
- 14:45 jelto@cumin1003: START - Cookbook sre.hosts.reboot-single for host gitlab2002.wikimedia.org
- 14:44 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2034.codfw.wmnet
- 14:44 taavi: deploying cr firewall changes from https://gerrit.wikimedia.org/r/c/operations/homer/public/+/1254211
- 14:44 topranks: stop announcing "direct" routes to ssw1-d8-eqiad from cr2-eqiad T420351
- 14:44 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti2034.codfw.wmnet
- 14:44 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2034.codfw.wmnet
- 14:43 moritzm: failover Ganeti master in codfw to ganeti2047
- 14:42 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2050.codfw.wmnet
- 14:42 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2050.codfw.wmnet
- 14:41 filippo@cumin1003: START - Cookbook sre.hosts.reboot-single for host cloudrabbit2001-dev.codfw.wmnet
- 14:41 filippo@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudnet2007-dev.codfw.wmnet
- 14:40 topranks: disabling EVPN IBGP peering from ssw1-d8-eqiad to ssw1-d1-eqiad to stop them reflecting routes T420351
- 14:40 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1006.eqiad.wmnet
- 14:39 jelto@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host gitlab2002.wikimedia.org
- 14:38 inflatador: bking@requestctl remove `wdqs_highest_error_rate_ever_seen` requestctl rule as it is no longer needed
- 14:38 mvernon@cumin1003: START - Cookbook sre.hosts.reboot-single for host thanos-be2006.codfw.wmnet
- 14:37 mvernon@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be2005.codfw.wmnet
- 14:36 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2050.codfw.wmnet
- 14:35 filippo@cumin1003: START - Cookbook sre.hosts.reboot-single for host cloudnet2007-dev.codfw.wmnet
- 14:35 filippo@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudnet2006-dev.codfw.wmnet
- 14:34 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1006.eqiad.wmnet
- 14:34 Daimona: Creating ce_event_goals DB table for the CampaignEvents extension in x1.testwiki, x1.test2wiki, x1.officewiki, and x1.wikishared # T411433
- 14:34 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2050.codfw.wmnet
- 14:33 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2049.codfw.wmnet
- 14:33 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2049.codfw.wmnet
- 14:31 jelto@cumin1003: START - Cookbook sre.hosts.reboot-single for host gitlab2002.wikimedia.org
- 14:30 mvernon@cumin1003: START - Cookbook sre.hosts.reboot-single for host thanos-be2005.codfw.wmnet
- 14:27 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2049.codfw.wmnet
- 14:27 topranks: de-pref internet circuits landing on cr2-eqiad to shift traffic to cr1 T420351
- 14:27 filippo@cumin1003: START - Cookbook sre.hosts.reboot-single for host cloudnet2006-dev.codfw.wmnet
- 14:27 filippo@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudnet2005-dev.codfw.wmnet
- 14:25 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2049.codfw.wmnet
- 14:25 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2047.codfw.wmnet
- 14:25 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2047.codfw.wmnet
- 14:24 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-test-presto1001.eqiad.wmnet
- 14:20 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host an-test-presto1001.eqiad.wmnet
- 14:19 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2047.codfw.wmnet
- 14:19 filippo@cumin1003: START - Cookbook sre.hosts.reboot-single for host cloudnet2005-dev.codfw.wmnet
- 14:19 filippo@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudlb2004-dev.codfw.wmnet
- 14:17 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host idp-test1005.wikimedia.org
- 14:16 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2047.codfw.wmnet
- 14:14 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2046.codfw.wmnet
- 14:14 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2046.codfw.wmnet
- 14:13 topranks: disable VRRP on cr2-eqiad interfaces facing ssw1-d8-eqiad T420351
- 14:13 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host idp-test1005.wikimedia.org
- 14:11 moritzm: powercycling ganeti2046 (stuck on reboot)
- 14:11 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host people2004.codfw.wmnet
- 14:10 filippo@cumin1003: START - Cookbook sre.hosts.reboot-single for host cloudlb2004-dev.codfw.wmnet
- 14:10 filippo@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudlb2003-dev.codfw.wmnet
- 14:07 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host people2004.codfw.wmnet
- 14:06 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cumin2003.codfw.wmnet
- 14:05 topranks: setting cr1-eqiad as VRRP master for all vlans T420351
- 14:01 filippo@cumin1003: START - Cookbook sre.hosts.reboot-single for host cloudlb2003-dev.codfw.wmnet
- 14:00 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host cumin2003.codfw.wmnet
- 13:59 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2046.codfw.wmnet
- 13:57 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti6002.drmrs.wmnet
- 13:57 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti6002.drmrs.wmnet
- 13:57 aklapper@deploy2002: rebuilt and synchronized wikiversions files: group0 to 1.46.0-wmf.20 refs T413811
- 13:57 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2046.codfw.wmnet
- 13:56 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2045.codfw.wmnet
- 13:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2045.codfw.wmnet
- 13:52 filippo@cumin1003: START - Cookbook sre.hosts.reboot-single for host cloudlb2002-dev.codfw.wmnet
- 13:50 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti6002.drmrs.wmnet
- 13:50 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2045.codfw.wmnet
- 13:45 esanders@deploy2002: Finished scap sync-world: Backport for TitleWidget: Prioritise namespace prefix over interwiki prefix (T420288), TitleWidget: Prioritise namespace prefix over interwiki prefix (T420288) (duration: 08m 10s)
- 13:44 mvernon@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host moss-be2003.codfw.wmnet
- 13:42 esanders@deploy2002: esanders: Continuing with sync
- 13:39 esanders@deploy2002: esanders: Backport for TitleWidget: Prioritise namespace prefix over interwiki prefix (T420288), TitleWidget: Prioritise namespace prefix over interwiki prefix (T420288) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 13:38 mvernon@cumin1003: START - Cookbook sre.hosts.reboot-single for host moss-be2003.codfw.wmnet
- 13:38 mvernon@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host apus-be2004.codfw.wmnet
- 13:37 esanders@deploy2002: Started scap sync-world: Backport for TitleWidget: Prioritise namespace prefix over interwiki prefix (T420288), TitleWidget: Prioritise namespace prefix over interwiki prefix (T420288)
- 13:35 jmm@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on logstash2023.codfw.wmnet with reason: ganeti reboot
- 13:33 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2045.codfw.wmnet
- 13:32 mvernon@cumin1003: START - Cookbook sre.hosts.reboot-single for host apus-be2004.codfw.wmnet
- 13:32 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti2045.codfw.wmnet
- 13:32 mvernon@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host moss-be2002.codfw.wmnet
- 13:31 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti6002.drmrs.wmnet
- 13:30 cscott@deploy2002: Finished scap sync-world: Backport for Turn on postprocessing cache for all Parsoid parses (T348255) (duration: 10m 31s)
- 13:28 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2045.codfw.wmnet
- 13:26 mvernon@cumin1003: START - Cookbook sre.hosts.reboot-single for host moss-be2002.codfw.wmnet
- 13:26 cscott@deploy2002: cscott: Continuing with sync
- 13:26 mvernon@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host moss-be2001.codfw.wmnet
- 13:25 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti6001.drmrs.wmnet
- 13:25 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti6001.drmrs.wmnet
- 13:22 cscott@deploy2002: cscott: Backport for Turn on postprocessing cache for all Parsoid parses (T348255) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 13:21 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2044.codfw.wmnet
- 13:21 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2044.codfw.wmnet
- 13:20 mvernon@cumin1003: START - Cookbook sre.hosts.reboot-single for host moss-be2001.codfw.wmnet
- 13:20 cgoubert@cumin1003: END (FAIL) - Cookbook sre.k8s.reboot-nodes (exit_code=1) rolling reboot on P{wikikube-worker13[00-47].eqiad.wmnet} and (A:wikikube-master-eqiad or A:wikikube-worker-eqiad)
- 13:20 cscott@deploy2002: Started scap sync-world: Backport for Turn on postprocessing cache for all Parsoid parses (T348255)
- 13:20 javiermonton@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
- 13:19 javiermonton@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
- 13:19 jayme@cumin1003: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on P{wikikube-worker[2280-2331].codfw.wmnet} and (A:wikikube-master-codfw or A:wikikube-worker-codfw)
- 13:19 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti6001.drmrs.wmnet
- 13:16 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti6001.drmrs.wmnet
- 13:16 btullis@cumin1003: END (PASS) - Cookbook sre.presto.reboot-workers (exit_code=0) for Presto an-presto cluster: Reboot Presto nodes
- 13:15 mvernon@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host moss-be1003.eqiad.wmnet
- 13:15 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2044.codfw.wmnet
- 13:15 aklapper@deploy2002: Finished scap sync-world: Backport for Remove misplaced readonly from CategoryViewer::$query (T420315) (duration: 06m 31s)
- 13:15 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ldap-maint1001.eqiad.wmnet
- 13:11 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ldap-maint1001.eqiad.wmnet
- 13:11 aklapper@deploy2002: zabe, aklapper: Continuing with sync
- 13:11 aklapper@deploy2002: zabe, aklapper: Backport for Remove misplaced readonly from CategoryViewer::$query (T420315) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 13:10 otto@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
- 13:10 otto@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
- 13:10 mvernon@cumin1003: START - Cookbook sre.hosts.reboot-single for host moss-be1003.eqiad.wmnet
- 13:10 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'email' for AS: 16509
- 13:09 mvernon@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host apus-be1004.eqiad.wmnet
- 13:09 aklapper@deploy2002: Started scap sync-world: Backport for Remove misplaced readonly from CategoryViewer::$query (T420315)
- 13:08 kevinbazira@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' .
- 13:08 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dse-k8s-worker1015.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 13:07 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ldap-maint2001.codfw.wmnet
- 13:04 mvernon@cumin1003: START - Cookbook sre.hosts.reboot-single for host apus-be1004.eqiad.wmnet
- 13:04 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ldap-maint2001.codfw.wmnet
- 13:02 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2044.codfw.wmnet
- 13:02 mvernon@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host moss-be1002.eqiad.wmnet
- 13:01 moritzm: failover Ganeti masters in drmrs to ganeti6003/6004
- 13:00 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti6004.drmrs.wmnet
- 13:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti6004.drmrs.wmnet
- 12:59 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2043.codfw.wmnet
- 12:59 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2043.codfw.wmnet
- 12:57 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 214657
- 12:56 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'email' for AS: 214657
- 12:56 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 56308
- 12:55 jiji@cumin1003: END (PASS) - Cookbook sre.memcached.roll-reboot-restart (exit_code=0) rolling reboot on A:memcached-eqiad
- 12:55 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'email' for AS: 56308
- 12:55 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 28788
- 12:55 mvernon@cumin1003: START - Cookbook sre.hosts.reboot-single for host moss-be1002.eqiad.wmnet
- 12:55 mvernon@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host moss-be1001.eqiad.wmnet
- 12:54 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'email' for AS: 28788
- 12:54 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti6004.drmrs.wmnet
- 12:54 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2043.codfw.wmnet
- 12:53 ayounsi@cumin1003: END (FAIL) - Cookbook sre.network.peering (exit_code=99) with action 'email' for AS: 28788
- 12:53 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'email' for AS: 28788
- 12:53 btullis@cumin1003: START - Cookbook sre.hosts.provision for host dse-k8s-worker1015.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 12:52 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 9269
- 12:52 btullis@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dse-k8s-worker1012
- 12:52 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dse-k8s-worker1015.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 12:51 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'email' for AS: 9269
- 12:51 btullis@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host dse-k8s-worker1012
- 12:51 btullis@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dse-k8s-worker1012.eqiad.wmnet with OS bookworm
- 12:51 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-e8-eqiad
- 12:51 ayounsi@cumin1003: START - Cookbook sre.network.tls for network device lsw1-e8-eqiad
- 12:50 btullis@cumin1003: START - Cookbook sre.hosts.provision for host dse-k8s-worker1015.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 12:48 btullis@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dse-k8s-worker1015
- 12:48 mvernon@cumin1003: START - Cookbook sre.hosts.reboot-single for host moss-be1001.eqiad.wmnet
- 12:45 btullis@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host dse-k8s-worker1015
- 12:44 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti6004.drmrs.wmnet
- 12:44 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1012.eqiad.wmnet with OS bookworm
- 12:44 btullis@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dse-k8s-worker1012.eqiad.wmnet with OS bookworm
- 12:44 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2043.codfw.wmnet
- 12:42 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti6003.drmrs.wmnet
- 12:42 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti6003.drmrs.wmnet
- 12:40 mvernon@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-cluster (exit_code=0)
- 12:39 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2042.codfw.wmnet
- 12:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2042.codfw.wmnet
- 12:38 moritzm: powercycling ganeti2042 (stuck on reboot)
- 12:35 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti6003.drmrs.wmnet
- 12:34 moritzm: powercycling ganeti2041 (stuck on reboot)
- 12:31 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1012.eqiad.wmnet with OS bookworm
- 12:24 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti6003.drmrs.wmnet
- 12:22 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install1005.wikimedia.org
- 12:22 btullis@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dse-k8s-worker1012.eqiad.wmnet with OS bookworm
- 12:21 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2042.codfw.wmnet
- 12:20 mvernon@cumin1003: START - Cookbook sre.hosts.reboot-cluster
- 12:20 Emperor: roll-reboot apus frontends (codfw) for March reboots
- 12:16 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host install1005.wikimedia.org
- 12:13 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install2005.wikimedia.org
- 12:13 topranks: restart BGP announcements from ssw1-d1-eqiad following change T420180
- 12:10 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2042.codfw.wmnet
- 12:08 jayme@cumin1003: START - Cookbook sre.k8s.reboot-nodes rolling reboot on P{wikikube-worker[2280-2331].codfw.wmnet} and (A:wikikube-master-codfw or A:wikikube-worker-codfw)
- 12:07 dbrant@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifeeds: apply
- 12:06 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host install2005.wikimedia.org
- 12:06 dbrant@deploy2002: helmfile [codfw] START helmfile.d/services/wikifeeds: apply
- 12:06 dbrant@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifeeds: apply
- 12:06 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install3004.wikimedia.org
- 12:06 dbrant@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifeeds: apply
- 12:05 jayme@cumin1003: conftool action : set/pooled=yes; selector: service=docker-registry,name=(registry1005.eqiad.wmnet|registry2005.codfw.wmnet)
- 12:05 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host registry2005.codfw.wmnet
- 12:05 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host registry1005.eqiad.wmnet
- 12:04 dbrant@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifeeds: apply
- 12:04 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host install3004.wikimedia.org
- 12:04 dbrant@deploy2002: helmfile [staging] START helmfile.d/services/wikifeeds: apply
- 12:03 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install4003.wikimedia.org
- 12:03 topranks: reset BGP session to ssw1-d1-eqiad from lsw1-c7-eqiad T420180
- 12:01 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2041.codfw.wmnet
- 12:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2041.codfw.wmnet
- 12:01 jayme@cumin1003: START - Cookbook sre.hosts.reboot-single for host registry1005.eqiad.wmnet
- 12:01 jayme@cumin1003: START - Cookbook sre.hosts.reboot-single for host registry2005.codfw.wmnet
- 12:00 jayme@cumin1003: conftool action : set/pooled=no; selector: service=docker-registry,name=(registry1005.eqiad.wmnet|registry2005.codfw.wmnet)
- 12:00 topranks: reset BGP session to ssw1-d1-eqiad from lsw1-c6-eqiad T420180
- 12:00 jayme@cumin1003: conftool action : set/pooled=yes; selector: service=docker-registry,name=(registry1004.eqiad.wmnet|registry2004.codfw.wmnet)
- 11:59 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host registry1004.eqiad.wmnet
- 11:59 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host registry2004.codfw.wmnet
- 11:59 topranks: reset BGP session to ssw1-d1-eqiad from lsw1-c4-eqiad T420180
- 11:58 topranks: reset BGP session to ssw1-d1-eqiad from lsw1-c3-eqiad T420180
- 11:56 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host install4003.wikimedia.org
- 11:56 topranks: reset BGP session to ssw1-d1-eqiad from lsw1-c2-eqiad T420180
- 11:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install5003.wikimedia.org
- 11:55 jayme@cumin1003: START - Cookbook sre.hosts.reboot-single for host registry1004.eqiad.wmnet
- 11:55 jayme@cumin1003: START - Cookbook sre.hosts.reboot-single for host registry2004.codfw.wmnet
- 11:54 jayme@cumin1003: conftool action : set/pooled=no; selector: service=docker-registry,name=(registry1004.eqiad.wmnet|registry2004.codfw.wmnet)
- 11:54 topranks: reset BGP session to ssw1-d1-eqiad from lsw1-d3-eqiad T420180
- 11:53 topranks: reset BGP session to ssw1-d1-eqiad from lsw1-d1-eqiad T420180
- 11:52 mvernon@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-cluster (exit_code=0)
- 11:52 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti5007.eqsin.wmnet
- 11:51 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti5007.eqsin.wmnet
- 11:49 btullis@cumin1003: START - Cookbook sre.hadoop.reboot-workers for Hadoop analytics cluster
- 11:49 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host install5003.wikimedia.org
- 11:48 jiji@cumin1003: START - Cookbook sre.memcached.roll-reboot-restart rolling reboot on A:memcached-eqiad
- 11:48 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install6003.wikimedia.org
- 11:47 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1012.eqiad.wmnet with OS bookworm
- 11:46 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host install6003.wikimedia.org
- 11:45 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2041.codfw.wmnet
- 11:43 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti5007.eqsin.wmnet
- 11:43 btullis@cumin1003: START - Cookbook sre.presto.reboot-workers for Presto an-presto cluster: Reboot Presto nodes
- 11:41 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti5007.eqsin.wmnet
- 11:41 cgoubert@cumin1003: START - Cookbook sre.k8s.reboot-nodes rolling reboot on P{wikikube-worker13[00-47].eqiad.wmnet} and (A:wikikube-master-eqiad or A:wikikube-worker-eqiad)
- 11:39 topranks: stop accepting external routes on ssw1-d1-eqiad from cr1-eqiad T420180
- 11:36 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install7002.wikimedia.org
- 11:33 mvernon@cumin1003: START - Cookbook sre.hosts.reboot-cluster
- 11:33 Emperor: roll-reboot apus frontends (eqiad) for March reboots
- 11:29 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host install7002.wikimedia.org
- 11:28 moritzm: failover Ganeti master in eqsin to ganeti5004
- 11:27 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2041.codfw.wmnet
- 11:24 topranks: reduce local-preference for BGP routes learnt from servers on cr1-eqiad T420180
- 11:22 fceratto@deploy2002: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 11:18 kevinbazira@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' .
- 11:16 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2040.codfw.wmnet
- 11:16 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2040.codfw.wmnet
- 11:15 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti5006.eqsin.wmnet
- 11:14 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti5006.eqsin.wmnet
- 11:11 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2040.codfw.wmnet
- 11:06 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti5006.eqsin.wmnet
- 11:05 daniel@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
- 11:05 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dse-k8s-worker1020.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 11:04 daniel@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
- 11:04 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti5006.eqsin.wmnet
- 11:03 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2040.codfw.wmnet
- 11:01 daniel@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
- 11:01 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti5005.eqsin.wmnet
- 11:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti5005.eqsin.wmnet
- 11:00 daniel@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
- 10:59 elukey@cumin1003: START - Cookbook sre.hosts.provision for host dse-k8s-worker1020.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 10:58 daniel@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
- 10:58 topranks: prepend external BGP announcements from cr1-eqiad T420180
- 10:57 daniel@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
- 10:56 daniel@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
- 10:56 daniel@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
- 10:54 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2039.codfw.wmnet
- 10:54 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2039.codfw.wmnet
- 10:53 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti5005.eqsin.wmnet
- 10:52 jiji@cumin1003: END (PASS) - Cookbook sre.memcached.roll-reboot-restart (exit_code=0) rolling reboot on A:memcached-eqiad
- 10:51 fceratto@deploy2002: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 10:49 daniel@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
- 10:49 daniel@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
- 10:49 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti5005.eqsin.wmnet
- 10:48 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2039.codfw.wmnet
- 10:46 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti5004.eqsin.wmnet
- 10:45 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti5004.eqsin.wmnet
- 10:45 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2039.codfw.wmnet
- 10:45 javiermonton@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
- 10:45 javiermonton@deploy2002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
- 10:43 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2038.codfw.wmnet
- 10:43 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2038.codfw.wmnet
- 10:43 javiermonton@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
- 10:43 javiermonton@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
- 10:42 topranks: cease announcing routed networks from ssw1-d1-eqiad to cr1-eqiad in BGP T420180
- 10:41 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dse-k8s-worker1020.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 10:41 elukey@cumin1003: START - Cookbook sre.hosts.provision for host dse-k8s-worker1020.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 10:40 fceratto@deploy2002: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 10:39 javiermonton@deploy2002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
- 10:39 javiermonton@deploy2002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
- 10:38 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2038.codfw.wmnet
- 10:37 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti5004.eqsin.wmnet
- 10:37 fceratto@deploy2002: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 10:33 filippo@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudgw2004-dev.codfw.wmnet
- 10:31 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2038.codfw.wmnet
- 10:29 topranks: stop announcing directly connected routes to L3 switches from cr1-eqiad T420180
- 10:28 fceratto@deploy2002: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 10:27 filippo@cumin1003: START - Cookbook sre.hosts.reboot-single for host cloudgw2004-dev.codfw.wmnet
- 10:27 filippo@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudgw2003-dev.codfw.wmnet
- 10:26 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
- 10:25 topranks: disable EVPN IBGP peering between ssw1-d1-eqiad and ssw1-d8-eqiad T420180
- 10:24 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
- 10:21 javiermonton@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-content-history-reconcile-enrich: apply
- 10:21 javiermonton@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-content-history-reconcile-enrich: apply
- 10:20 filippo@cumin1003: START - Cookbook sre.hosts.reboot-single for host cloudgw2003-dev.codfw.wmnet
- 10:20 fceratto@deploy2002: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 10:19 urbanecm: Delete `job/growthexperiments-listtaskcounts-29513771` from mw-cron (job stuck for more than a month)
- 10:19 fceratto@deploy2002: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 10:16 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host failoid1003.eqiad.wmnet
- 10:16 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti5004.eqsin.wmnet
- 10:15 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2037.codfw.wmnet
- 10:14 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2037.codfw.wmnet
- 10:12 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host failoid1003.eqiad.wmnet
- 10:12 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host build2002.codfw.wmnet
- 10:09 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2037.codfw.wmnet
- 10:08 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host build2002.codfw.wmnet
- 10:06 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-wf2002.codfw.wmnet
- 10:05 topranks: disabling VRRP for et-1/0/5 sub-interfaces on cr1-eqiad T420180
- 10:03 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
- 10:03 fceratto@deploy2002: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 10:02 fceratto@deploy2002: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 10:01 fceratto@deploy2002: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 10:01 fceratto@deploy2002: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 10:01 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
- 10:00 jiji@cumin1003: START - Cookbook sre.hosts.reboot-single for host mc-wf2002.codfw.wmnet
- 10:00 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-wf2001.codfw.wmnet
- 09:59 fceratto@deploy2002: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 09:59 fceratto@deploy2002: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 09:59 fceratto@deploy2002: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 09:59 fceratto@deploy2002: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 09:58 fceratto@deploy2002: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 09:58 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host failoid2003.codfw.wmnet
- 09:57 fceratto@deploy2002: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 09:57 javiermonton@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-content-history-reconcile-enrich-next: apply
- 09:57 javiermonton@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-content-history-reconcile-enrich-next: apply
- 09:56 topranks: shift traffic from codfw to eqiad off Arelion CCT to Lumen
- 09:56 mvernon@cumin1003: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies (exit_code=0) rolling reboot on A:thanos-fe
- 09:54 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host failoid2003.codfw.wmnet
- 09:54 jiji@cumin1003: START - Cookbook sre.hosts.reboot-single for host mc-wf2001.codfw.wmnet
- 09:54 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-wf1002.eqiad.wmnet
- 09:53 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2037.codfw.wmnet
- 09:53 kevinbazira@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' .
- 09:52 fceratto@deploy2002: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 09:50 fceratto@deploy2002: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 09:47 jiji@cumin1003: START - Cookbook sre.hosts.reboot-single for host mc-wf1002.eqiad.wmnet
- 09:47 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-wf1001.eqiad.wmnet
- 09:42 fceratto@deploy2002: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 09:42 jiji@cumin1003: START - Cookbook sre.hosts.reboot-single for host mc-wf1001.eqiad.wmnet
- 09:40 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-misc2002.codfw.wmnet
- 09:38 moritzm: installing openssl bugfix updates on trixie hosts
- 09:31 jiji@cumin1003: START - Cookbook sre.hosts.reboot-single for host mc-misc2002.codfw.wmnet
- 09:31 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-misc2001.codfw.wmnet
- 09:25 jiji@cumin1003: START - Cookbook sre.hosts.reboot-single for host mc-misc2001.codfw.wmnet
- 09:25 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-misc1002.eqiad.wmnet
- 09:21 jiji@cumin1003: START - Cookbook sre.memcached.roll-reboot-restart rolling reboot on A:memcached-eqiad
- 09:20 jiji@cumin1003: START - Cookbook sre.hosts.reboot-single for host mc-misc1002.eqiad.wmnet
- 09:20 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-misc1001.eqiad.wmnet
- 09:15 jiji@cumin1003: START - Cookbook sre.hosts.reboot-single for host mc-misc1001.eqiad.wmnet
- 09:10 kharlan@deploy2002: Finished scap sync-world: Backport for Revert "hcaptcha: Enforce hCaptcha on API edits coming from the MobileFrontend" (T419125) (duration: 12m 36s)
- 09:06 topranks: increase VRRP priority on eqiad vlans on CR2 to shift active gateway to cr2-eqiad T420180
- 09:05 mvernon@cumin1003: START - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies rolling reboot on A:thanos-fe
- 09:03 kharlan@deploy2002: kharlan: Continuing with sync
- 09:02 kharlan@deploy2002: kharlan: Backport for Revert "hcaptcha: Enforce hCaptcha on API edits coming from the MobileFrontend" (T419125) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 08:58 jiji@cumin1003: END (PASS) - Cookbook sre.memcached.roll-reboot-restart (exit_code=0) rolling reboot on A:memcached-canary
- 08:57 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2036.codfw.wmnet
- 08:57 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2036.codfw.wmnet
- 08:57 kharlan@deploy2002: Started scap sync-world: Backport for Revert "hcaptcha: Enforce hCaptcha on API edits coming from the MobileFrontend" (T419125)
- 08:57 moritzm: rebuilt the trixie d-i image for the 13.4 point release T420240
- 08:54 kharlan@deploy2002: Sync cancelled.
- 08:52 jiji@cumin1003: START - Cookbook sre.memcached.roll-reboot-restart rolling reboot on A:memcached-canary
- 08:52 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2036.codfw.wmnet
- 08:49 kharlan@deploy2002: harroyo-wmf, kharlan: Backport for hcaptcha: Enforce hCaptcha on API edits coming from the MobileFrontend (T419125) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 08:45 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2036.codfw.wmnet
- 08:44 jmm@cumin2002: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host bast2003.wikimedia.org
- 08:43 kharlan@deploy2002: Started scap sync-world: Backport for hcaptcha: Enforce hCaptcha on API edits coming from the MobileFrontend (T419125)
- 08:42 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dse-k8s-worker1020.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 08:42 elukey@cumin1003: START - Cookbook sre.hosts.provision for host dse-k8s-worker1020.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 08:35 arnaudb@cumin1003: END (PASS) - Cookbook sre.gerrit.restart-gerrit (exit_code=0) Restarting Gerrit on gerrit2002
- 08:34 arnaudb@cumin1003: START - Cookbook sre.gerrit.restart-gerrit Restarting Gerrit on gerrit2002
- 08:34 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-misc1001.eqiad.wmnet
- 08:34 arnaudb@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host contint1002.wikimedia.org
- 08:32 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dse-k8s-worker1020.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 08:32 elukey@cumin1003: START - Cookbook sre.hosts.provision for host dse-k8s-worker1020.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 08:28 jiji@cumin1003: START - Cookbook sre.hosts.reboot-single for host mc-misc1001.eqiad.wmnet
- 08:27 arnaudb@cumin1003: START - Cookbook sre.hosts.reboot-single for host contint1002.wikimedia.org
- 08:24 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2035.codfw.wmnet
- 08:24 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2035.codfw.wmnet
- 08:20 jiji@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host parsoidtest1001.eqiad.wmnet
- 08:18 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2035.codfw.wmnet
- 08:15 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti3005.esams.wmnet to cluster esams03 and group B
- 08:14 moritzm: powercycling bast2003 (stuck on reboot)
- 08:14 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti3005.esams.wmnet to cluster esams03 and group B
- 08:14 jiji@cumin1003: START - Cookbook sre.hosts.reboot-single for host parsoidtest1001.eqiad.wmnet
- 08:12 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti3005.esams.wmnet
- 08:09 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dse-k8s-worker1020.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 08:08 elukey@cumin1003: START - Cookbook sre.hosts.provision for host dse-k8s-worker1020.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 08:02 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti3005.esams.wmnet
- 07:57 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host bast2003.wikimedia.org
- 07:52 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast5004.wikimedia.org
- 07:52 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti3005.esams.wmnet with OS bookworm
- 07:46 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host bast5004.wikimedia.org
- 07:37 jiji@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker[1285-1289,1291-1299].eqiad.wmnet
- 07:37 jiji@cumin1003: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker[1285-1289,1291-1299].eqiad.wmnet
- 07:34 arnaudb@cumin1003: END (PASS) - Cookbook sre.gerrit.restart-gerrit (exit_code=0) Restarting Gerrit on gerrit2003
- 07:32 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2035.codfw.wmnet
- 07:32 arnaudb@cumin1003: START - Cookbook sre.gerrit.restart-gerrit Restarting Gerrit on gerrit2003
- 07:32 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti2033.codfw.wmnet
- 07:32 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2033.codfw.wmnet
- 07:31 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2032.codfw.wmnet
- 07:31 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2032.codfw.wmnet
- 07:29 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti3005.esams.wmnet with reason: host reimage
- 07:25 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti3005.esams.wmnet with reason: host reimage
- 07:25 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2032.codfw.wmnet
- 07:23 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2032.codfw.wmnet
- 07:00 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti3005.esams.wmnet with OS bookworm
- 06:08 kart_: Updated cxserver to 2026-03-16-071247-production (T420004)
- 06:07 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
- 06:06 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
- 06:05 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
- 06:04 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/cxserver: apply
- 05:58 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
- 05:58 kartik@deploy2002: helmfile [staging] START helmfile.d/services/cxserver: apply
- 04:41 dwisehaupt@dns1005: END - running authdns-update
- 04:39 dwisehaupt@dns1005: START - running authdns-update
- 04:01 mwpresync@deploy2002: Pruned MediaWiki: 1.46.0-wmf.17 (duration: 01m 17s)
- 03:43 mwpresync@deploy2002: Finished scap sync-world: testwikis to 1.46.0-wmf.20 refs T413811 (duration: 39m 34s)
- 03:03 mwpresync@deploy2002: Started scap sync-world: testwikis to 1.46.0-wmf.20 refs T413811
- 02:09 mwpresync@deploy2002: Finished scap build-images: Publishing wmf/next image (duration: 08m 10s)
- 02:00 mwpresync@deploy2002: Started scap build-images: Publishing wmf/next image
- 00:26 brett@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp6009.*
- 00:25 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6009.drmrs.wmnet with OS trixie
- 00:07 jdlrobson@deploy2002: Finished scap sync-world: Backport for Enable languages in main menu on Russian Wikipedia (T419730) (duration: 06m 57s)
- 00:03 jdlrobson@deploy2002: jdlrobson: Continuing with sync
- 00:02 jdlrobson@deploy2002: jdlrobson: Backport for Enable languages in main menu on Russian Wikipedia (T419730) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 00:00 jdlrobson@deploy2002: Started scap sync-world: Backport for Enable languages in main menu on Russian Wikipedia (T419730)
2026-03-16
- 23:59 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6009.drmrs.wmnet with reason: host reimage
- 23:56 jdlrobson@deploy2002: Finished scap sync-world: Backport for Don't output language HTML when no languages present (T419730), Support duplication of languages in header and main menu (T419730) (duration: 06m 44s)
- 23:56 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6009.drmrs.wmnet with reason: host reimage
- 23:52 jdlrobson@deploy2002: jdlrobson: Continuing with sync
- 23:51 jdlrobson@deploy2002: jdlrobson: Backport for Don't output language HTML when no languages present (T419730), Support duplication of languages in header and main menu (T419730) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 23:50 jdlrobson@deploy2002: Started scap sync-world: Backport for Don't output language HTML when no languages present (T419730), Support duplication of languages in header and main menu (T419730)
- 23:36 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp6009.drmrs.wmnet with OS trixie
- 23:32 brett@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp601(0|1).*
- 22:54 cdobbins@cumin2002: conftool action : set/pooled=yes; selector: name=cp6008.drmrs.wmnet [reason: trixie reimaging]
- 22:49 cdobbins@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6008.drmrs.wmnet with OS trixie
- 22:37 jforrester@deploy2002: Finished scap sync-world: T411807 (duration: 11m 10s)
- 22:35 jforrester@deploy2002: jforrester: Continuing with sync
- 22:32 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6010.drmrs.wmnet with OS trixie
- 22:31 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-reboot (exit_code=0) rolling reboot on P{cp70[09-12].magru.wmnet} and A:cp
- 22:31 brett@cumin2002: cookbooks.sre.cdn.roll-reboot finished rebooting cp7012.magru.wmnet
- 22:31 blake@cumin1003: END (FAIL) - Cookbook sre.k8s.pool-depool-node (exit_code=99) pool for host wikikube-worker[1285-1289,1291-1299].eqiad.wmnet
- 22:30 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-reboot (exit_code=0) rolling reboot on P{cp700[1-4].magru.wmnet} and A:cp
- 22:30 brett@cumin2002: cookbooks.sre.cdn.roll-reboot finished rebooting cp7004.magru.wmnet
- 22:30 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6011.drmrs.wmnet with OS trixie
- 22:28 jforrester@deploy2002: jforrester: T411807 synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 22:27 jforrester@deploy2002: Started scap sync-world: T411807
- 22:27 blake@cumin1003: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker[1285-1289,1291-1299].eqiad.wmnet
- 22:24 cdobbins@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6008.drmrs.wmnet with reason: host reimage
- 22:20 cdobbins@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6008.drmrs.wmnet with reason: host reimage
- 22:17 blake@cumin1003: END (FAIL) - Cookbook sre.k8s.reboot-nodes (exit_code=1) rolling reboot on P{wikikube-worker[1020-1327].eqiad.wmnet} and (A:wikikube-master-eqiad or A:wikikube-worker-eqiad)
- 22:07 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6010.drmrs.wmnet with reason: host reimage
- 22:05 cdobbins@cumin2002: conftool action : set/pooled=yes; selector: name=cp6007.drmrs.wmnet [reason: trixie reimaging]
- 22:04 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6011.drmrs.wmnet with reason: host reimage
- 22:03 cdobbins@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6007.drmrs.wmnet with OS trixie
- 22:02 cdobbins@cumin2002: START - Cookbook sre.hosts.reimage for host cp6008.drmrs.wmnet with OS trixie
- 21:59 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6010.drmrs.wmnet with reason: host reimage
- 21:58 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6011.drmrs.wmnet with reason: host reimage
- 21:58 cdobbins@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp6008.drmrs.wmnet with OS trixie
- 21:52 brett@cumin2002: cookbooks.sre.cdn.roll-reboot finished rebooting cp7011.magru.wmnet
- 21:49 brett@cumin2002: cookbooks.sre.cdn.roll-reboot finished rebooting cp7003.magru.wmnet
- 21:42 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host zuul1003.eqiad.wmnet with OS trixie
- 21:42 cdobbins@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6008.drmrs.wmnet with reason: host reimage
- 21:40 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp6010.drmrs.wmnet with OS trixie
- 21:40 brett@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp6012.*
- 21:39 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6012.drmrs.wmnet with OS trixie
- 21:39 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp6011.drmrs.wmnet with OS trixie
- 21:38 cdobbins@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6007.drmrs.wmnet with reason: host reimage
- 21:36 brett@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp6013.*
- 21:36 cdobbins@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6008.drmrs.wmnet with reason: host reimage
- 21:35 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6013.drmrs.wmnet with OS trixie
- 21:32 cdobbins@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6007.drmrs.wmnet with reason: host reimage
- 21:28 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on zuul1003.eqiad.wmnet with reason: host reimage
- 21:22 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on zuul1003.eqiad.wmnet with reason: host reimage
- 21:19 Dreamy_Jazz: Evening UTC backport window done
- 21:18 dreamyjazz@deploy2002: Finished scap sync-world: Backport for Disable CheckUser on closed wikis where no checks were ever made (T420062), Uninstall SecurePoll from closed wikis (T420062), DiscussionTools: Uninstall wikis closed before permalinks were deployed (T420052) (duration: 06m 10s)
- 21:17 cdobbins@cumin2002: START - Cookbook sre.hosts.reimage for host cp6008.drmrs.wmnet with OS trixie
- 21:16 cdobbins@cumin2002: conftool action : set/pooled=no; selector: name=cp6008.drmrs.wmnet [reason: trixie reimaging]
- 21:15 cdobbins@cumin2002: conftool action : set/pooled=yes; selector: name=cp6006.drmrs.wmnet [reason: trixie reimaging]
- 21:15 cdobbins@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6006.drmrs.wmnet with OS trixie
- 21:14 dreamyjazz@deploy2002: dreamyjazz: Continuing with sync
- 21:14 dreamyjazz@deploy2002: dreamyjazz: Backport for Disable CheckUser on closed wikis where no checks were ever made (T420062), Uninstall SecurePoll from closed wikis (T420062), DiscussionTools: Uninstall wikis closed before permalinks were deployed (T420052) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 21:13 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6012.drmrs.wmnet with reason: host reimage
- 21:12 cdobbins@cumin2002: START - Cookbook sre.hosts.reimage for host cp6007.drmrs.wmnet with OS trixie
- 21:12 dreamyjazz@deploy2002: Started scap sync-world: Backport for Disable CheckUser on closed wikis where no checks were ever made (T420062), Uninstall SecurePoll from closed wikis (T420062), DiscussionTools: Uninstall wikis closed before permalinks were deployed (T420052)
- 21:12 cdobbins@cumin2002: conftool action : set/pooled=no; selector: name=cp6007.drmrs.wmnet [reason: trixie reimaging]
- 21:11 cdobbins@cumin2002: conftool action : set/pooled=yes; selector: name=cp6005.drmrs.wmnet [reason: trixie reimaging]
- 21:10 cdobbins@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6005.drmrs.wmnet with OS trixie
- 21:10 brett@cumin2002: cookbooks.sre.cdn.roll-reboot finished rebooting cp7010.magru.wmnet
- 21:10 brett@cumin2002: cookbooks.sre.cdn.roll-reboot finished rebooting cp7002.magru.wmnet
- 21:09 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6013.drmrs.wmnet with reason: host reimage
- 21:08 dzahn@cumin2002: START - Cookbook sre.hosts.reimage for host zuul1003.eqiad.wmnet with OS trixie
- 21:07 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6012.drmrs.wmnet with reason: host reimage
- 21:06 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6013.drmrs.wmnet with reason: host reimage
- 21:05 catrope@deploy2002: Finished scap sync-world: Backport for Fix client credentials access tokens (T417278 T419921), Enable $wgTrackMediaRequestProvenance on testwikis and beta cluster (T414338), Configure $wgApiClientErrorSampleRate (T418957) (duration: 08m 06s)
- 21:01 catrope@deploy2002: matmarex, catrope: Continuing with sync
- 20:59 catrope@deploy2002: matmarex, catrope: Backport for Fix client credentials access tokens (T417278 T419921), Enable $wgTrackMediaRequestProvenance on testwikis and beta cluster (T414338), Configure $wgApiClientErrorSampleRate (T418957) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 20:57 catrope@deploy2002: Started scap sync-world: Backport for Fix client credentials access tokens (T417278 T419921), Enable $wgTrackMediaRequestProvenance on testwikis and beta cluster (T414338), Configure $wgApiClientErrorSampleRate (T418957)
- 20:54 xcollazo@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-content-history-reconcile-enrich: apply
- 20:54 xcollazo@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-content-history-reconcile-enrich: apply
- 20:50 brett@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cp[2027-2040].codfw.wmnet
- 20:50 brett@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 20:50 brett@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cp[2027-2040].codfw.wmnet decommissioned, removing all IPs except the asset tag one - brett@cumin2002"
- 20:50 brett@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cp[2027-2040].codfw.wmnet decommissioned, removing all IPs except the asset tag one - brett@cumin2002"
- 20:49 cdobbins@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6006.drmrs.wmnet with reason: host reimage
- 20:48 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp6012.drmrs.wmnet with OS trixie
- 20:46 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp6013.drmrs.wmnet with OS trixie
- 20:45 brett@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on cp2042.codfw.wmnet with reason: Testing hosts - not for production
- 20:45 cdobbins@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6005.drmrs.wmnet with reason: host reimage
- 20:44 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp2042.codfw.wmnet with OS trixie
- 20:44 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephmon2007-dev.codfw.wmnet with OS bookworm
- 20:44 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 20:44 kharlan@deploy2002: Finished scap sync-world: Backport for Configure external link aggregate usage on 12 wikis for top domains (T419837) (duration: 06m 59s)
- 20:43 brett@cumin2002: START - Cookbook sre.dns.netbox
- 20:41 cdobbins@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6006.drmrs.wmnet with reason: host reimage
- 20:41 brett@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on cp2041.codfw.wmnet with reason: Testing hosts - not for production
- 20:41 cdobbins@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6005.drmrs.wmnet with reason: host reimage
- 20:40 kharlan@deploy2002: kharlan, mszwarc: Continuing with sync
- 20:39 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp2041.codfw.wmnet with OS trixie
- 20:39 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 20:38 kharlan@deploy2002: kharlan, mszwarc: Backport for Configure external link aggregate usage on 12 wikis for top domains (T419837) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 20:37 kharlan@deploy2002: Started scap sync-world: Backport for Configure external link aggregate usage on 12 wikis for top domains (T419837)
- 20:34 brett@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp6014.*
- 20:33 xcollazo@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-content-history-reconcile-enrich: apply
- 20:33 xcollazo@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-content-history-reconcile-enrich: apply
- 20:32 cscott@deploy2002: Finished scap sync-world: Backport for Fix double post-processing in legacy preview case (T419908) (duration: 06m 52s)
- 20:29 brett@cumin2002: cookbooks.sre.cdn.roll-reboot finished rebooting cp7009.magru.wmnet
- 20:28 cscott@deploy2002: cscott: Continuing with sync
- 20:28 brett@cumin2002: cookbooks.sre.cdn.roll-reboot finished rebooting cp7001.magru.wmnet
- 20:27 cscott@deploy2002: cscott: Backport for Fix double post-processing in legacy preview case (T419908) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 20:26 cscott@deploy2002: Started scap sync-world: Backport for Fix double post-processing in legacy preview case (T419908)
- 20:22 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp2042.codfw.wmnet with reason: host reimage
- 20:22 cdobbins@cumin2002: START - Cookbook sre.hosts.reimage for host cp6006.drmrs.wmnet with OS trixie
- 20:21 cdobbins@cumin2002: conftool action : set/pooled=no; selector: name=cp6006.drmrs.wmnet [reason: trixie reimaging]
- 20:21 cdobbins@cumin2002: conftool action : set/pooled=yes; selector: name=cp6004.drmrs.wmnet [reason: trixie reimaging]
- 20:21 cdobbins@cumin2002: START - Cookbook sre.hosts.reimage for host cp6005.drmrs.wmnet with OS trixie
- 20:20 cdobbins@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6004.drmrs.wmnet with OS trixie
- 20:20 cdobbins@cumin2002: conftool action : set/pooled=no; selector: name=cp6005.drmrs.wmnet [reason: trixie reimaging]
- 20:19 cdobbins@cumin2002: conftool action : set/pooled=yes; selector: name=cp6003.drmrs.wmnet [reason: trixie reimaging]
- 20:19 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephmon2007-dev.codfw.wmnet with reason: host reimage
- 20:19 brett@cumin2002: START - Cookbook sre.cdn.roll-reboot rolling reboot on P{cp70[09-12].magru.wmnet} and A:cp
- 20:18 brett@cumin2002: START - Cookbook sre.cdn.roll-reboot rolling reboot on P{cp700[1-4].magru.wmnet} and A:cp
- 20:17 catrope@deploy2002: Finished scap sync-world: Backport for Enable passwordless login in production (T419198), Instrument clicks on external links to selected domains (T419837) (duration: 06m 43s)
- 20:16 xcollazo@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-content-history-reconcile-enrich: apply
- 20:15 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp2041.codfw.wmnet with reason: host reimage
- 20:15 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephmon2007-dev.codfw.wmnet with reason: host reimage
- 20:15 cdobbins@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6003.drmrs.wmnet with OS trixie
- 20:13 catrope@deploy2002: kharlan, catrope: Continuing with sync
- 20:12 xcollazo@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-content-history-reconcile-enrich: apply
- 20:12 catrope@deploy2002: kharlan, catrope: Backport for Enable passwordless login in production (T419198), Instrument clicks on external links to selected domains (T419837) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 20:12 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp2042.codfw.wmnet with reason: host reimage
- 20:11 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp2041.codfw.wmnet with reason: host reimage
- 20:10 catrope@deploy2002: Started scap sync-world: Backport for Enable passwordless login in production (T419198), Instrument clicks on external links to selected domains (T419837)
- 20:09 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6014.drmrs.wmnet with OS trixie
- 20:03 brett@cumin2002: START - Cookbook sre.hosts.decommission for hosts cp[2027-2040].codfw.wmnet
- 20:01 dreamyjazz@deploy2002: Finished scap sync-world: Backport for Uninstall GlobalBlocking from closed wikis (T420062) (duration: 08m 20s)
- 19:57 dreamyjazz@deploy2002: dreamyjazz: Continuing with sync
- 19:55 cdobbins@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6004.drmrs.wmnet with reason: host reimage
- 19:55 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcephmon2007-dev.codfw.wmnet with OS bookworm
- 19:54 dreamyjazz@deploy2002: dreamyjazz: Backport for Uninstall GlobalBlocking from closed wikis (T420062) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 19:54 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp2042.codfw.wmnet with OS trixie
- 19:53 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcephmon2007-dev.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 19:53 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp2041.codfw.wmnet with OS trixie
- 19:52 dreamyjazz@deploy2002: Started scap sync-world: Backport for Uninstall GlobalBlocking from closed wikis (T420062)
- 19:52 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host cloudcephmon2007-dev.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 19:51 dreamyjazz@deploy2002: Finished scap sync-world: Backport for Uninstall AbuseFilter from closed wikis with no AbuseFilter logs (T420063) (duration: 09m 26s)
- 19:51 cdobbins@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6003.drmrs.wmnet with reason: host reimage
- 19:47 dreamyjazz@deploy2002: dreamyjazz: Continuing with sync
- 19:47 mutante: releases2003 - rm rsync-srv-org-wikimedia-releases-releases2003.* - alerts flapping since server reboot - puppet code needs to be improved to ensure units are removed when primary server is switched (T420246)
- 19:47 cdobbins@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6004.drmrs.wmnet with reason: host reimage
- 19:46 cdobbins@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6003.drmrs.wmnet with reason: host reimage
- 19:44 dreamyjazz@deploy2002: dreamyjazz: Backport for Uninstall AbuseFilter from closed wikis with no AbuseFilter logs (T420063) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 19:43 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6014.drmrs.wmnet with reason: host reimage
- 19:42 dreamyjazz@deploy2002: Started scap sync-world: Backport for Uninstall AbuseFilter from closed wikis with no AbuseFilter logs (T420063)
- 19:41 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcephmon2007-dev
- 19:41 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cloudcephmon2007-dev
- 19:40 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 19:40 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: updating cloudcephmon2007-dev in codfw - jhancock@cumin2002"
- 19:40 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6014.drmrs.wmnet with reason: host reimage
- 19:39 ladsgroup@deploy2002: Finished scap sync-world: Backport for Revert "Media: Use previous step for non-standard width between steps and original" (T419927) (duration: 07m 10s)
- 19:35 ladsgroup@deploy2002: ladsgroup: Continuing with sync
- 19:34 ladsgroup@deploy2002: ladsgroup: Backport for Revert "Media: Use previous step for non-standard width between steps and original" (T419927) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 19:32 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: updating cloudcephmon2007-dev in codfw - jhancock@cumin2002"
- 19:32 ladsgroup@deploy2002: Started scap sync-world: Backport for Revert "Media: Use previous step for non-standard width between steps and original" (T419927)
- 19:28 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-reboot (exit_code=0) rolling reboot on P{cp404[5-6].ulsfo.wmnet} and A:cp
- 19:28 brett@cumin2002: cookbooks.sre.cdn.roll-reboot finished rebooting cp4046.ulsfo.wmnet
- 19:27 cdobbins@cumin2002: START - Cookbook sre.hosts.reimage for host cp6004.drmrs.wmnet with OS trixie
- 19:27 jhancock@cumin2002: START - Cookbook sre.dns.netbox
- 19:27 cdobbins@cumin2002: conftool action : set/pooled=no; selector: name=cp6004.drmrs.wmnet [reason: trixie reimaging]
- 19:27 cdobbins@cumin2002: START - Cookbook sre.hosts.reimage for host cp6003.drmrs.wmnet with OS trixie
- 19:26 cdobbins@cumin2002: conftool action : set/pooled=no; selector: name=cp6003.drmrs.wmnet [reason: trixie reimaging]
- 19:25 cdobbins@cumin2002: conftool action : set/pooled=yes; selector: name=cp6002.drmrs.wmnet [reason: trixie reimaging]
- 19:25 cdobbins@cumin2002: conftool action : set/pooled=yes; selector: name=cp6001.drmrs.wmnet [reason: trixie reimaging]
- 19:21 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp6014.drmrs.wmnet with OS trixie
- 19:17 cdobbins@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6002.drmrs.wmnet with OS trixie
- 19:17 fabfur@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on cp2042.codfw.wmnet with reason: Testing hosts - not for production
- 19:16 fabfur@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on cp2041.codfw.wmnet with reason: Testing hosts - not for production
- 19:15 akhatun@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
- 19:15 akhatun@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
- 19:12 cdobbins@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6001.drmrs.wmnet with OS trixie
- 19:02 akhatun@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-edit-type-enrich-next: apply
- 19:02 akhatun@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-edit-type-enrich-next: apply
- 18:57 cdobbins@cumin2002: END (PASS) - Cookbook sre.cdn.roll-reboot (exit_code=0) rolling reboot on P{cp4046.ulsfo.wmnet} and A:cp
- 18:57 cdobbins@cumin2002: cookbooks.sre.cdn.roll-reboot finished rebooting cp4046.ulsfo.wmnet
- 18:52 cdobbins@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6002.drmrs.wmnet with reason: host reimage
- 18:49 brett@cumin2002: cookbooks.sre.cdn.roll-reboot finished rebooting cp4045.ulsfo.wmnet
- 18:49 cdobbins@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6001.drmrs.wmnet with reason: host reimage
- 18:47 cdobbins@cumin2002: START - Cookbook sre.cdn.roll-reboot rolling reboot on P{cp4046.ulsfo.wmnet} and A:cp
- 18:47 cdobbins@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6002.drmrs.wmnet with reason: host reimage
- 18:45 cdobbins@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6001.drmrs.wmnet with reason: host reimage
- 18:39 brett@cumin2002: START - Cookbook sre.cdn.roll-reboot rolling reboot on P{cp404[5-6].ulsfo.wmnet} and A:cp
- 18:38 brett@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp6015.*
- 18:38 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-reboot (exit_code=0) rolling reboot on P{cp403[7-9].ulsfo.wmnet} and A:cp
- 18:38 brett@cumin2002: cookbooks.sre.cdn.roll-reboot finished rebooting cp4039.ulsfo.wmnet
- 18:35 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6015.drmrs.wmnet with OS trixie
- 18:27 cdobbins@cumin2002: START - Cookbook sre.hosts.reimage for host cp6002.drmrs.wmnet with OS trixie
- 18:26 cdobbins@cumin2002: conftool action : set/pooled=no; selector: name=cp6002.drmrs.wmnet [reason: trixie reimaging]
- 18:26 cdobbins@cumin2002: START - Cookbook sre.hosts.reimage for host cp6001.drmrs.wmnet with OS trixie
- 18:24 cdobbins@cumin2002: conftool action : set/pooled=no; selector: name=cp6001.drmrs.wmnet [reason: trixie reimaging]
- 18:03 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6015.drmrs.wmnet with reason: host reimage
- 17:59 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6015.drmrs.wmnet with reason: host reimage
- 17:58 brett@cumin2002: cookbooks.sre.cdn.roll-reboot finished rebooting cp4038.ulsfo.wmnet
- 17:39 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp6015.drmrs.wmnet with OS trixie
- 17:37 brett@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp6016.*
- 17:32 fabfur@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp2042.codfw.wmnet with OS trixie
- 17:18 brett@cumin2002: cookbooks.sre.cdn.roll-reboot finished rebooting cp4037.ulsfo.wmnet
- 17:08 fabfur@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp2042.codfw.wmnet with reason: host reimage
- 17:06 brett@cumin2002: START - Cookbook sre.cdn.roll-reboot rolling reboot on P{cp403[7-9].ulsfo.wmnet} and A:cp
- 17:03 fabfur@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp2042.codfw.wmnet with reason: host reimage
- 17:02 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp6016.drmrs.wmnet with OS trixie
- 16:57 mutante: contint2002 - rebooting
- 16:47 mutante: phab2002 - rebooting
- 16:45 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dse-k8s-worker1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 16:44 mszwarc@deploy2002: Finished scap sync-world: Backport for Add APCOND_OATH_HAS2FA to UserRequirementsPrivateConditions (duration: 06m 15s)
- 16:42 mutante: rebooting backends of releases.wikimedia.org
- 16:42 fabfur@cumin1003: START - Cookbook sre.hosts.reimage for host cp2042.codfw.wmnet with OS trixie
- 16:41 fabfur: reimage cp2042 for HAProxy testing (T419825)
- 16:41 mszwarc@deploy2002: mszwarc: Continuing with sync
- 16:40 mszwarc@deploy2002: mszwarc: Backport for Add APCOND_OATH_HAS2FA to UserRequirementsPrivateConditions synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 16:39 fabfur@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp2041.codfw.wmnet with OS trixie
- 16:38 mszwarc@deploy2002: Started scap sync-world: Backport for Add APCOND_OATH_HAS2FA to UserRequirementsPrivateConditions
- 16:37 blake@cumin1003: START - Cookbook sre.k8s.reboot-nodes rolling reboot on P{wikikube-worker[1020-1327].eqiad.wmnet} and (A:wikikube-master-eqiad or A:wikikube-worker-eqiad)
- 16:35 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp6016.drmrs.wmnet with reason: host reimage
- 16:32 milimetric: my bad, accidentally merged https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/1250249, will read docs on config deployment better
- 16:31 btullis@cumin1003: START - Cookbook sre.hosts.provision for host dse-k8s-worker1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 16:29 btullis@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dse-k8s-worker1012
- 16:27 btullis@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host dse-k8s-worker1012
- 16:27 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp6016.drmrs.wmnet with reason: host reimage
- 16:20 ladsgroup@deploy2002: Finished scap sync-world: Backport for Revert "mediawiki.util: Prefer prev step over non-standard in adjustThumbWidthForSteps" (T419927) (duration: 07m 28s)
- 16:17 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dragonfly-supernode2001.codfw.wmnet
- 16:16 ladsgroup@deploy2002: ladsgroup: Continuing with sync
- 16:14 ladsgroup@deploy2002: ladsgroup: Backport for Revert "mediawiki.util: Prefer prev step over non-standard in adjustThumbWidthForSteps" (T419927) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 16:13 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-ctrl2005.codfw.wmnet
- 16:12 ladsgroup@deploy2002: Started scap sync-world: Backport for Revert "mediawiki.util: Prefer prev step over non-standard in adjustThumbWidthForSteps" (T419927)
- 16:12 jayme@cumin1003: START - Cookbook sre.hosts.reboot-single for host dragonfly-supernode2001.codfw.wmnet
- 16:11 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dragonfly-supernode1001.eqiad.wmnet
- 16:11 jayme@cumin1003: conftool action : set/pooled=true; selector: dnsdisc=helm-charts,name=codfw
- 16:11 fabfur@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp2041.codfw.wmnet with reason: host reimage
- 16:10 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host chartmuseum2001.codfw.wmnet
- 16:09 fnegri@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for clouddb1024.eqiad.wmnet
- 16:09 fnegri@cumin1003: START - Cookbook sre.hosts.remove-downtime for clouddb1024.eqiad.wmnet
- 16:09 fnegri@cumin1003: conftool action : set/pooled=yes; selector: name=clouddb1024.eqiad.wmnet
- 16:07 blake@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker[1004-1007,1011-1012,1015-1016,1019-1021,1029-1031,1034-1168,1240-1289,1291-1327].eqiad.wmnet
- 16:06 blake@cumin1003: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker[1004-1007,1011-1012,1015-1016,1019-1021,1029-1031,1034-1168,1240-1289,1291-1327].eqiad.wmnet
- 16:06 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp6016.drmrs.wmnet with OS trixie
- 16:06 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-ctrl2005.codfw.wmnet
- 16:06 jayme@cumin1003: START - Cookbook sre.hosts.reboot-single for host dragonfly-supernode1001.eqiad.wmnet
- 16:05 dwisehaupt@dns1006: END - running authdns-update
- 16:05 jayme@cumin1003: START - Cookbook sre.hosts.reboot-single for host chartmuseum2001.codfw.wmnet
- 16:05 fabfur@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp2041.codfw.wmnet with reason: host reimage
- 16:04 jayme@cumin1003: conftool action : set/pooled=false; selector: dnsdisc=helm-charts,name=codfw
- 16:04 dwisehaupt@dns1006: START - running authdns-update
- 16:04 jayme@cumin1003: conftool action : set/pooled=true; selector: dnsdisc=helm-charts,name=eqiad
- 16:00 blake@cumin1003: END (ERROR) - Cookbook sre.k8s.reboot-nodes (exit_code=97) rolling reboot on P{wikikube-worker[1004-1327].eqiad.wmnet} and (A:wikikube-master-eqiad or A:wikikube-worker-eqiad)
- 15:59 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host chartmuseum1001.eqiad.wmnet
- 15:59 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti2031.codfw.wmnet
- 15:59 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host ganeti2031.codfw.wmnet
- 15:54 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-ctrl2004.codfw.wmnet
- 15:53 jayme@cumin1003: START - Cookbook sre.hosts.reboot-single for host chartmuseum1001.eqiad.wmnet
- 15:52 jayme@cumin1003: conftool action : set/pooled=false; selector: dnsdisc=helm-charts,name=eqiad
- 15:52 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ncmonitor1001.eqiad.wmnet
- 15:47 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-ctrl2004.codfw.wmnet
- 15:47 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2031.codfw.wmnet
- 15:47 brett@cumin2002: START - Cookbook sre.hosts.reboot-single for host ncmonitor1001.eqiad.wmnet
- 15:46 fnegri@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on clouddb1024.eqiad.wmnet with reason: Rebooting clouddb1024 T419960
- 15:44 fnegri@cumin1003: conftool action : set/pooled=no; selector: name=clouddb1024.eqiad.wmnet
- 15:43 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2031.codfw.wmnet
- 15:43 fnegri@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for clouddb1023.eqiad.wmnet
- 15:43 fnegri@cumin1003: START - Cookbook sre.hosts.remove-downtime for clouddb1023.eqiad.wmnet
- 15:43 fabfur@cumin1003: START - Cookbook sre.hosts.reimage for host cp2041.codfw.wmnet with OS trixie
- 15:43 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2030.codfw.wmnet
- 15:43 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2030.codfw.wmnet
- 15:42 fabfur: reimage cp2041 for HAProxy testing (T419825)
- 15:42 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/postgresql-airflow-sre: apply
- 15:41 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/postgresql-airflow-sre: apply
- 15:41 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-ctrl2003.codfw.wmnet
- 15:39 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/postgresql-airflow-sre: apply
- 15:38 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/postgresql-airflow-sre: apply
- 15:38 jmm@cumin2002: END (PASS) - Cookbook sre.ldap.roll-restart-reboot-replica (exit_code=0) rolling reboot on A:ldap-replicas-codfw
- 15:38 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/postgresql-airflow-sre: apply
- 15:38 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/postgresql-airflow-sre: apply
- 15:37 fnegri@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on clouddb1023.eqiad.wmnet with reason: Rebooting clouddb1023 T419960
- 15:35 fnegri@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for clouddb1022.eqiad.wmnet
- 15:35 fnegri@cumin1003: START - Cookbook sre.hosts.remove-downtime for clouddb1022.eqiad.wmnet
- 15:35 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2030.codfw.wmnet
- 15:32 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-ctrl2003.codfw.wmnet
- 15:32 dwisehaupt@dns1006: END - running authdns-update
- 15:32 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-ctrl2002.codfw.wmnet
- 15:31 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2030.codfw.wmnet
- 15:31 dwisehaupt@dns1006: START - running authdns-update
- 15:27 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling reboot on A:swift-fe-codfw
- 15:26 jmm@cumin2002: START - Cookbook sre.ldap.roll-restart-reboot-replica rolling reboot on A:ldap-replicas-codfw
- 15:26 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti2029.codfw.wmnet
- 15:26 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host ganeti2029.codfw.wmnet
- 15:26 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/postgresql-airflow-sre: apply
- 15:26 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/postgresql-airflow-sre: apply
- 15:24 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/postgresql-airflow-sre: apply
- 15:24 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/postgresql-airflow-sre: apply
- 15:22 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-ctrl2002.codfw.wmnet
- 15:21 fnegri@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on clouddb1023.eqiad.wmnet with reason: Rebooting clouddb1023 T419960
- 15:20 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-ctrl2001.codfw.wmnet
- 15:20 ladsgroup@deploy2002: Started scap sync-world: Backport for Revert "mediawiki.util: Prefer prev step over non-standard in adjustThumbWidthForSteps" (T419927)
- 15:16 fnegri@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on clouddb1022.eqiad.wmnet with reason: Rebooting clouddb1022 T419960
- 15:13 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2029.codfw.wmnet
- 15:11 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.roll-restart-reboot-durum (exit_code=0) rolling reboot on A:durum and A:durum
- 15:09 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2029.codfw.wmnet
- 15:04 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-ctrl2001.codfw.wmnet
- 15:03 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dse-k8s-worker1020.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 15:03 elukey@cumin1003: START - Cookbook sre.hosts.provision for host dse-k8s-worker1020.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 15:02 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.roll-restart-reboot-wikimedia-dns (exit_code=0) rolling reboot on A:wikidough
- 15:01 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudgw1004.eqiad.wmnet
- 15:01 ladsgroup@deploy2002: ladsgroup: Continuing with sync
- 15:00 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2028.codfw.wmnet
- 15:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2028.codfw.wmnet
- 14:56 ladsgroup@deploy2002: ladsgroup: Backport for Revert "mediawiki.util: Prefer prev step over non-standard in adjustThumbWidthForSteps" (T419927) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 14:55 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/postgresql-airflow-sre: apply
- 14:55 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/postgresql-airflow-sre: apply
- 14:54 ladsgroup@deploy2002: Started scap sync-world: Backport for Revert "mediawiki.util: Prefer prev step over non-standard in adjustThumbWidthForSteps" (T419927)
- 14:53 andrew@cumin2002: START - Cookbook sre.hosts.reboot-single for host cloudgw1004.eqiad.wmnet
- 14:53 jiji@cumin1003: END (PASS) - Cookbook sre.memcached.roll-reboot-restart (exit_code=0) rolling reboot on A:memcached-gutter-codfw
- 14:52 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2028.codfw.wmnet
- 14:51 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/postgresql-airflow-sre: apply
- 14:51 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2028.codfw.wmnet
- 14:51 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/postgresql-airflow-sre: apply
- 14:50 mvernon@cumin1003: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling reboot on A:swift-fe-eqiad
- 14:34 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudgw1003.eqiad.wmnet with OS trixie
- 14:31 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dse-k8s-worker1020.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 14:31 elukey@cumin1003: START - Cookbook sre.hosts.provision for host dse-k8s-worker1020.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 14:30 fceratto@deploy2002: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 14:28 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/postgresql-airflow-sre: apply
- 14:28 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/postgresql-airflow-sre: apply
- 14:26 fceratto@deploy2002: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 14:22 blake@cumin1003: START - Cookbook sre.k8s.reboot-nodes rolling reboot on P{wikikube-worker[1002-1327].eqiad.wmnet} and (A:wikikube-master-eqiad or A:wikikube-worker-eqiad)
- 14:22 blake@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker[1002-1003].eqiad.wmnet
- 14:22 blake@cumin1003: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker[1002-1003].eqiad.wmnet
- 14:21 blake@cumin1003: END (ERROR) - Cookbook sre.k8s.reboot-nodes (exit_code=97) rolling reboot on P{wikikube-worker[1002-1327].eqiad.wmnet} and (A:wikikube-master-eqiad or A:wikikube-worker-eqiad)
- 14:21 blake@cumin1003: START - Cookbook sre.k8s.reboot-nodes rolling reboot on P{wikikube-worker[1002-1327].eqiad.wmnet} and (A:wikikube-master-eqiad or A:wikikube-worker-eqiad)
- 14:20 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/postgresql-airflow-sre: apply
- 14:20 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/postgresql-airflow-sre: apply
- 14:20 fceratto@deploy2002: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 14:18 sgimeno@deploy2002: Finished scap sync-world: Backport for fix(anon warning): remove wring type=signup param (T415160), AccountCreation: track account registrations for WE1.8 experiments (T416100) (duration: 09m 16s)
- 14:17 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/postgresql-airflow-sre: apply
- 14:17 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/postgresql-airflow-sre: apply
- 14:14 sgimeno@deploy2002: sgimeno: Continuing with sync
- 14:13 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/postgresql-airflow-sre: apply
- 14:13 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/postgresql-airflow-sre: apply
- 14:11 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/postgresql-airflow-sre: apply
- 14:11 sgimeno@deploy2002: sgimeno: Backport for fix(anon warning): remove wring type=signup param (T415160), AccountCreation: track account registrations for WE1.8 experiments (T416100) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 14:11 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/postgresql-airflow-sre: apply
- 14:10 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudgw1003.eqiad.wmnet with reason: host reimage
- 14:09 sgimeno@deploy2002: Started scap sync-world: Backport for fix(anon warning): remove wring type=signup param (T415160), AccountCreation: track account registrations for WE1.8 experiments (T416100)
- 14:08 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2026.codfw.wmnet
- 14:08 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2026.codfw.wmnet
- 14:08 fceratto@deploy2002: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 14:04 arnaudb@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on gerrit2002.wikimedia.org with reason: testing
- 14:03 andrew@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudgw1003.eqiad.wmnet with reason: host reimage
- 14:02 arnaudb@cumin1003: DONE (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 2:00:00 on gerrit2002.wikimedia.org with reason: T418256
- 14:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow1003.eqiad.wmnet
- 13:58 fceratto@deploy2002: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 13:54 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2026.codfw.wmnet
- 13:54 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow1003.eqiad.wmnet
- 13:45 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow1002.eqiad.wmnet
- 13:45 mszwarc@deploy2002: Finished scap sync-world: Backport for bowiki: update logos (T419268) (duration: 06m 17s)
- 13:45 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudgw1003.eqiad.wmnet with OS trixie
- 13:43 jiji@cumin1003: START - Cookbook sre.memcached.roll-reboot-restart rolling reboot on A:memcached-gutter-codfw
- 13:43 jiji@cumin1003: END (PASS) - Cookbook sre.memcached.roll-reboot-restart (exit_code=0) rolling reboot on A:memcached-gutter-eqiad
- 13:41 mszwarc@deploy2002: mszwarc, anzx: Continuing with sync
- 13:41 mszwarc@deploy2002: mszwarc, anzx: Backport for bowiki: update logos (T419268) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 13:39 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow1002.eqiad.wmnet
- 13:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow2004.codfw.wmnet
- 13:39 mszwarc@deploy2002: Started scap sync-world: Backport for bowiki: update logos (T419268)
- 13:38 mszwarc@deploy2002: Finished scap sync-world: Backport for Always use external actor for interwiki rights logs on target wiki (T6055) (duration: 08m 53s)
- 13:34 mszwarc@deploy2002: mszwarc: Continuing with sync
- 13:32 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow2004.codfw.wmnet
- 13:32 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2026.codfw.wmnet
- 13:31 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2025.codfw.wmnet
- 13:31 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2025.codfw.wmnet
- 13:31 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow2003.codfw.wmnet
- 13:31 mszwarc@deploy2002: mszwarc: Backport for Always use external actor for interwiki rights logs on target wiki (T6055) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 13:30 jiji@cumin1003: START - Cookbook sre.memcached.roll-reboot-restart rolling reboot on A:memcached-gutter-eqiad
- 13:29 mszwarc@deploy2002: Started scap sync-world: Backport for Always use external actor for interwiki rights logs on target wiki (T6055)
- 13:28 jiji@cumin1003: END (PASS) - Cookbook sre.memcached.roll-reboot-restart (exit_code=0) rolling reboot on A:memcached-gutter-eqiad
- 13:25 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow2003.codfw.wmnet
- 13:25 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow3004.esams.wmnet
- 13:25 sukhe@cumin1003: START - Cookbook sre.dns.roll-restart-reboot-wikimedia-dns rolling reboot on A:wikidough
- 13:23 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2025.codfw.wmnet
- 13:22 sukhe@cumin1003: START - Cookbook sre.dns.roll-restart-reboot-durum rolling reboot on A:durum and A:durum
- 13:21 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2025.codfw.wmnet
- 13:21 XioNoX: drain edgeuno transit for optic replacement - T415743
- 13:19 cgoubert@cumin1003: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host wikikube-ctrl1004.eqiad.wmnet
- 13:19 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow3004.esams.wmnet
- 13:14 jforrester@deploy2002: Finished scap sync-world: Backport for Replace direct BagOStuff with WANObjectCache (T419666) (duration: 11m 25s)
- 13:11 bwojtowicz@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' .
- 13:11 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow4003.ulsfo.wmnet
- 13:09 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti3005.esams.wmnet
- 13:09 jmm@cumin2002: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host ganeti3005.esams.wmnet
- 13:07 jforrester@deploy2002: jforrester: Continuing with sync
- 13:06 jforrester@deploy2002: jforrester: Backport for Replace direct BagOStuff with WANObjectCache (T419666) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 13:05 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-ctrl1004.eqiad.wmnet
- 13:04 jiji@cumin1003: START - Cookbook sre.memcached.roll-reboot-restart rolling reboot on A:memcached-gutter-eqiad
- 13:04 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ncredir4002.ulsfo.wmnet
- 13:04 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 13:04 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ncredir4002.ulsfo.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
- 13:03 jiji@cumin1003: END (ERROR) - Cookbook sre.memcached.roll-reboot-restart (exit_code=97) rolling reboot on A:memcached-gutter-eqiad
- 13:03 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ncredir4002.ulsfo.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
- 13:03 jforrester@deploy2002: Started scap sync-world: Backport for Replace direct BagOStuff with WANObjectCache (T419666)
- 13:02 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow4003.ulsfo.wmnet
- 12:57 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow5002.eqsin.wmnet
- 12:51 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-ctrl1003.eqiad.wmnet
- 12:50 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow5002.eqsin.wmnet
- 12:48 jiji@cumin1003: START - Cookbook sre.memcached.roll-reboot-restart rolling reboot on A:memcached-gutter-eqiad
- 12:48 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti3005.esams.wmnet
- 12:45 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow6001.drmrs.wmnet
- 12:44 jmm@cumin2002: START - Cookbook sre.dns.netbox
- 12:43 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti3005.esams.wmnet
- 12:42 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-ctrl1003.eqiad.wmnet
- 12:41 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wikikube-ctrl1002.eqiad.wmnet
- 12:40 aikochou@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revise-tone-task-generator' for release 'main' .
- 12:39 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow6001.drmrs.wmnet
- 12:37 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ncredir4002.ulsfo.wmnet
- 12:37 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ncredir4001.ulsfo.wmnet
- 12:37 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 12:37 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ncredir4001.ulsfo.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
- 12:35 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
- 12:34 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
- 12:33 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
- 12:33 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow7002.magru.wmnet
- 12:32 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
- 12:32 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
- 12:32 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
- 12:28 jclark@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dse-k8s-worker1017
- 12:27 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ncredir4001.ulsfo.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
- 12:27 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow7002.magru.wmnet
- 12:27 jclark@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host dse-k8s-worker1017
- 12:25 aikochou@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revise-tone-task-generator' for release 'main' .
- 12:25 cgoubert@cumin1003: START - Cookbook sre.hosts.reboot-single for host wikikube-ctrl1002.eqiad.wmnet
- 12:22 jmm@cumin2002: START - Cookbook sre.dns.netbox
- 12:20 moritzm: failover Ganeti master in esams to ganeti3008
- 12:20 moritzm: failover Ganeti master in esams to ganeti3005
- 12:15 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dse-k8s-worker1020.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 12:14 elukey@cumin1003: START - Cookbook sre.hosts.provision for host dse-k8s-worker1020.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 12:10 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ncredir4001.ulsfo.wmnet
- 12:00 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti3006.esams.wmnet
- 12:00 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host ganeti3006.esams.wmnet
- 11:57 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for druid[1009-1013].eqiad.wmnet
- 11:57 btullis@cumin1003: START - Cookbook sre.hosts.remove-downtime for druid[1009-1013].eqiad.wmnet
- 11:57 btullis@cumin1003: END (ERROR) - Cookbook sre.hosts.remove-downtime (exit_code=97) for druid[1009-1013].eqiad.wmnet
- 11:57 btullis@cumin1003: START - Cookbook sre.hosts.remove-downtime for druid[1009-1013].eqiad.wmnet
- 11:46 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host druid1009.eqiad.wmnet with OS bookworm
- 11:45 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti3006.esams.wmnet
- 11:43 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host druid1010.eqiad.wmnet with OS bookworm
- 11:37 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host druid1011.eqiad.wmnet with OS bookworm
- 11:31 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host druid1012.eqiad.wmnet with OS bookworm
- 11:30 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti3006.esams.wmnet
- 11:29 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host druid1013.eqiad.wmnet with OS bookworm
- 11:24 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dse-k8s-worker1020.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 11:24 elukey@cumin1003: START - Cookbook sre.hosts.provision for host dse-k8s-worker1020.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 11:22 btullis@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6 days, 0:00:00 on dse-k8s-worker[1012,1015-1017].eqiad.wmnet with reason: Adding 10 Gbps NIC
- 11:19 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on druid1009.eqiad.wmnet with reason: host reimage
- 11:15 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on druid1010.eqiad.wmnet with reason: host reimage
- 11:15 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dse-k8s-worker1020.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 11:14 elukey@cumin1003: START - Cookbook sre.hosts.provision for host dse-k8s-worker1020.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 11:12 mvernon@cumin1003: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling reboot on A:swift-fe-eqiad
- 11:12 mvernon@cumin2002: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling reboot on A:swift-fe-codfw
- 11:12 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on druid1011.eqiad.wmnet with reason: host reimage
- 11:10 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dse-k8s-worker1020.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 11:10 elukey@cumin1003: START - Cookbook sre.hosts.provision for host dse-k8s-worker1020.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 11:09 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on druid1012.eqiad.wmnet with reason: host reimage
- 11:07 arnaudb@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host gerrit2003.wikimedia.org
- 11:06 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on druid1013.eqiad.wmnet with reason: host reimage
- 11:06 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/postgresql-airflow-sre: apply
- 11:06 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/postgresql-airflow-sre: apply
- 11:04 btullis@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on druid1010.eqiad.wmnet with reason: host reimage
- 11:02 btullis@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on druid1011.eqiad.wmnet with reason: host reimage
- 11:02 btullis@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on druid1009.eqiad.wmnet with reason: host reimage
- 11:01 btullis@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on druid1012.eqiad.wmnet with reason: host reimage
- 11:00 arnaudb@cumin1003: START - Cookbook sre.hosts.reboot-single for host gerrit2003.wikimedia.org
- 10:57 btullis@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on druid1013.eqiad.wmnet with reason: host reimage
- 10:47 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest2010.codfw.wmnet
- 10:47 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host druid1013.eqiad.wmnet with OS bookworm
- 10:46 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host druid1012.eqiad.wmnet with OS bookworm
- 10:46 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host druid1011.eqiad.wmnet with OS bookworm
- 10:46 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host druid1010.eqiad.wmnet with OS bookworm
- 10:46 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host druid1009.eqiad.wmnet with OS bookworm
- 10:39 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti3007.esams.wmnet
- 10:39 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest2010.codfw.wmnet
- 10:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti3007.esams.wmnet
- 10:31 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti3007.esams.wmnet
- 10:29 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti3007.esams.wmnet
- 10:28 javiermonton@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
- 10:28 javiermonton@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
- 10:25 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti3008.esams.wmnet
- 10:25 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti3008.esams.wmnet
- 10:25 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest2009.codfw.wmnet
- 10:24 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dse-k8s-worker1020.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 10:24 elukey@cumin1003: START - Cookbook sre.hosts.provision for host dse-k8s-worker1020.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 10:23 jclark@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 10:20 jclark@cumin1003: START - Cookbook sre.dns.netbox
- 10:20 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest2009.codfw.wmnet
- 10:17 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti3008.esams.wmnet
- 10:14 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti3008.esams.wmnet
- 10:09 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dse-k8s-worker1020.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 10:08 elukey@cumin1003: START - Cookbook sre.hosts.provision for host dse-k8s-worker1020.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 10:07 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest2004.codfw.wmnet
- 10:02 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti7001.magru.wmnet
- 10:02 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti7001.magru.wmnet
- 10:01 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest2004.codfw.wmnet
- 09:56 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dse-k8s-worker1020.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 09:56 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti7001.magru.wmnet
- 09:54 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts tcp-proxy4002.ulsfo.wmnet
- 09:54 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 09:54 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: tcp-proxy4002.ulsfo.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
- 09:54 elukey@cumin1003: START - Cookbook sre.hosts.provision for host dse-k8s-worker1020.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 09:54 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: tcp-proxy4002.ulsfo.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
- 09:51 elukey@deploy2002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
- 09:51 elukey@deploy2002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
- 09:51 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti7001.magru.wmnet
- 09:51 jmm@cumin2002: START - Cookbook sre.dns.netbox
- 09:46 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts tcp-proxy4002.ulsfo.wmnet
- 09:43 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 09:43 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: decom tcp-proxy4001 - jmm@cumin2002"
- 09:43 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: decom tcp-proxy4001 - jmm@cumin2002"
- 09:43 slyngshede@cumin1003: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM idm2001.wikimedia.org
- 09:39 slyngshede@cumin1003: START - Cookbook sre.ganeti.reboot-vm for VM idm2001.wikimedia.org
- 09:38 jclark@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 09:38 jclark@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update network and mgmt - jclark@cumin1003"
- 09:38 jclark@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update network and mgmt - jclark@cumin1003"
- 09:36 jmm@cumin2002: START - Cookbook sre.dns.netbox
- 09:35 slyngshede@dns1004: END - running authdns-update
- 09:34 elukey@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
- 09:34 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
- 09:34 elukey@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
- 09:33 slyngshede@dns1004: START - running authdns-update
- 09:32 jmm@cumin2002: START - Cookbook sre.dns.netbox
- 09:30 slyngshede@cumin1003: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM idm1001.wikimedia.org
- 09:26 jclark@cumin1003: START - Cookbook sre.dns.netbox
- 09:26 slyngshede@cumin1003: START - Cookbook sre.ganeti.reboot-vm for VM idm1001.wikimedia.org
- 09:24 slyngshede@cumin1003: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM idm-test1001.wikimedia.org
- 09:22 moritzm: failover Ganeti master in magru to ganeti7004
- 09:21 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts tcp-proxy4001.ulsfo.wmnet
- 09:21 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
- 09:20 jmm@cumin2002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling reboot on A:kafka-test-eqiad
- 09:20 slyngshede@cumin1003: START - Cookbook sre.ganeti.reboot-vm for VM idm-test1001.wikimedia.org
- 09:19 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host testvm2002.codfw.wmnet
- 09:18 jmm@cumin2002: START - Cookbook sre.dns.netbox
- 09:15 slyngshede@cumin1003: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM cloudidp2001-dev.codfw.wmnet
- 09:15 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host testvm2002.codfw.wmnet
- 09:15 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti7002.magru.wmnet
- 09:15 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti7002.magru.wmnet
- 09:14 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host build2001.codfw.wmnet
- 09:13 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts tcp-proxy4001.ulsfo.wmnet
- 09:11 slyngshede@cumin1003: START - Cookbook sre.ganeti.reboot-vm for VM cloudidp2001-dev.codfw.wmnet
- 09:09 slyngshede@cumin1003: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM idp2005.wikimedia.org
- 09:08 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host build2001.codfw.wmnet
- 09:06 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti7002.magru.wmnet
- 09:05 slyngshede@cumin1003: START - Cookbook sre.ganeti.reboot-vm for VM idp2005.wikimedia.org
- 09:03 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti7002.magru.wmnet
- 09:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host pki-root1002.eqiad.wmnet
- 08:59 slyngshede@dns1004: END - running authdns-update
- 08:58 slyngshede@dns1004: START - running authdns-update
- 08:56 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti7003.magru.wmnet
- 08:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti7003.magru.wmnet
- 08:54 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host pki-root1002.eqiad.wmnet
- 08:49 kevinbazira@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' .
- 08:48 slyngshede@cumin1003: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM idp1005.wikimedia.org
- 08:48 jclark@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 08:48 jclark@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update network and mgmt - jclark@cumin1003"
- 08:48 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti7003.magru.wmnet
- 08:48 jclark@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update network and mgmt - jclark@cumin1003"
- 08:47 kevinbazira@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
- 08:44 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti7003.magru.wmnet
- 08:44 slyngshede@cumin1003: START - Cookbook sre.ganeti.reboot-vm for VM idp1005.wikimedia.org
- 08:44 jmm@cumin2002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling reboot on A:kafka-test-eqiad
- 08:44 jclark@cumin1003: START - Cookbook sre.dns.netbox
- 08:39 slyngshede@cumin1003: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM idp-test1005.wikimedia.org
- 08:35 slyngshede@cumin1003: START - Cookbook sre.ganeti.reboot-vm for VM idp-test1005.wikimedia.org
- 08:33 slyngshede@cumin1003: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM idp-test2005.wikimedia.org
- 08:29 slyngshede@cumin1003: START - Cookbook sre.ganeti.reboot-vm for VM idp-test2005.wikimedia.org
- 08:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti7004.magru.wmnet
- 08:27 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti7004.magru.wmnet
- 08:22 taavi@cumin1003: conftool action : set/pooled=no; selector: name=clouddb1013.eqiad.wmnet,service=s3
- 08:18 kgraessle@deploy2002: Finished scap sync-world: Backport for Fix broken survey links on PersonalDashboard (T419950) (duration: 32m 09s)
- 08:17 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti7004.magru.wmnet
- 08:10 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti7004.magru.wmnet
- 08:06 jelto@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host gitlab1004.wikimedia.org
- 08:05 kgraessle@deploy2002: kgraessle: Continuing with sync
- 08:04 kgraessle@deploy2002: kgraessle: Backport for Fix broken survey links on PersonalDashboard (T419950) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 07:59 jelto@cumin1003: START - Cookbook sre.hosts.reboot-single for host gitlab1004.wikimedia.org
- 07:52 moritzm: installing Linux 5.10.251 on Bullseye hosts
- 07:45 kgraessle@deploy2002: Started scap sync-world: Backport for Fix broken survey links on PersonalDashboard (T419950)
- 07:37 arnaudb@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host stewards1001.eqiad.wmnet
- 07:33 arnaudb@cumin1003: START - Cookbook sre.hosts.reboot-single for host stewards1001.eqiad.wmnet
- 07:33 arnaudb@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host vrts1003.eqiad.wmnet
- 07:26 arnaudb@cumin1003: START - Cookbook sre.hosts.reboot-single for host vrts1003.eqiad.wmnet
- 07:25 arnaudb@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host aphlict1002.eqiad.wmnet
- 07:21 arnaudb@cumin1003: START - Cookbook sre.hosts.reboot-single for host aphlict1002.eqiad.wmnet
- 07:10 arnaudb@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host doc2003.codfw.wmnet
- 07:06 arnaudb@cumin1003: START - Cookbook sre.hosts.reboot-single for host doc2003.codfw.wmnet
- 07:02 arnaudb@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lists1004.wikimedia.org
- 06:55 arnaudb@cumin1003: START - Cookbook sre.hosts.reboot-single for host lists1004.wikimedia.org
- 05:25 aikochou@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revise-tone-task-generator' for release 'main' .
- 02:08 mwpresync@deploy2002: Finished scap build-images: Publishing wmf/next image (duration: 07m 52s)
- 02:00 mwpresync@deploy2002: Started scap build-images: Publishing wmf/next image
2026-03-15
- 02:08 mwpresync@deploy2002: Finished scap build-images: Publishing wmf/next image (duration: 07m 52s)
- 02:00 mwpresync@deploy2002: Started scap build-images: Publishing wmf/next image
2026-03-14
- 14:16 reedy@deploy2002: Finished scap sync-world: Backport for CommonSettings: Set class in $wgCentralAuthRC (duration: 06m 17s)
- 14:12 reedy@deploy2002: reedy: Continuing with sync
- 14:11 reedy@deploy2002: reedy: Backport for CommonSettings: Set class in $wgCentralAuthRC synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 14:10 reedy@deploy2002: Started scap sync-world: Backport for CommonSettings: Set class in $wgCentralAuthRC
- 12:51 reedy@deploy2002: Finished scap sync-world: Backport for CommonSettings: Specify class in IRC RCFeed setup (duration: 06m 19s)
- 12:47 reedy@deploy2002: reedy, lcawte: Continuing with sync
- 12:46 reedy@deploy2002: reedy, lcawte: Backport for CommonSettings: Specify class in IRC RCFeed setup synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 12:44 reedy@deploy2002: Started scap sync-world: Backport for CommonSettings: Specify class in IRC RCFeed setup
- 02:08 mwpresync@deploy2002: Finished scap build-images: Publishing wmf/next image (duration: 08m 00s)
- 02:00 mwpresync@deploy2002: Started scap build-images: Publishing wmf/next image
2026-03-13
- 22:52 taavi: taavi@deploy2002 ~ $ mwscript CentralAuth:attachAccount.php --wiki=metawiki --userlist backfiller.txt # unify unified Special:CentralAuth/MediaWikiAccountBackfiller on meta
- 20:07 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudservices2004-dev.codfw.wmnet
- 20:01 andrew@cumin2002: START - Cookbook sre.hosts.reboot-single for host cloudservices2004-dev.codfw.wmnet
- 20:01 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudservices2004-dev.codfw.wmnet
- 19:55 brett@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp4052.*
- 19:54 andrew@cumin2002: START - Cookbook sre.hosts.reboot-single for host cloudservices2004-dev.codfw.wmnet
- 19:54 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4052.ulsfo.wmnet with OS trixie
- 19:53 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudservices2005-dev.codfw.wmnet
- 19:46 andrew@cumin2002: START - Cookbook sre.hosts.reboot-single for host cloudservices2005-dev.codfw.wmnet
- 19:46 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudservices2004-dev.codfw.wmnet
- 19:40 brett@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp4050.*
- 19:40 andrew@cumin2002: START - Cookbook sre.hosts.reboot-single for host cloudservices2004-dev.codfw.wmnet
- 19:29 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4052.ulsfo.wmnet with reason: host reimage
- 19:24 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4052.ulsfo.wmnet with reason: host reimage
- 19:23 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp4050.ulsfo.wmnet
- 19:19 jclark@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs1035.eqiad.wmnet with OS trixie
- 19:19 jclark@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs1034.eqiad.wmnet with OS trixie
- 19:18 jclark@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1003"
- 19:18 jclark@cumin1003: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1003"
- 19:18 jclark@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1003"
- 19:16 brett@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp4051.*
- 19:15 brett@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp4050.ulsfo.wmnet
- 19:14 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp4050.ulsfo.wmnet
- 19:13 jclark@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1003"
- 19:11 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4051.ulsfo.wmnet with OS trixie
- 19:07 brett@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp4050.ulsfo.wmnet
- 19:02 jclark@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1035.eqiad.wmnet with reason: host reimage
- 19:00 jclark@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1374.eqiad.wmnet with OS bookworm
- 19:00 jclark@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1003"
- 19:00 jclark@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1003"
- 18:58 jclark@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1034.eqiad.wmnet with reason: host reimage
- 18:58 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp4052.ulsfo.wmnet with OS trixie
- 18:57 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4052.ulsfo.wmnet with OS trixie
- 18:55 jclark@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1035.eqiad.wmnet with reason: host reimage
- 18:55 jclark@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1034.eqiad.wmnet with reason: host reimage
- 18:47 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4051.ulsfo.wmnet with reason: host reimage
- 18:43 jclark@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1374.eqiad.wmnet with reason: host reimage
- 18:41 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4051.ulsfo.wmnet with reason: host reimage
- 18:40 jclark@cumin1003: START - Cookbook sre.hosts.reimage for host wdqs1035.eqiad.wmnet with OS trixie
- 18:40 jclark@cumin1003: START - Cookbook sre.hosts.reimage for host wdqs1034.eqiad.wmnet with OS trixie
- 18:40 jclark@cumin1003: START - Cookbook sre.hosts.reimage for host wdqs1033.eqiad.wmnet with OS trixie
- 18:36 jclark@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1374.eqiad.wmnet with reason: host reimage
- 18:35 brett@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on cp4050.ulsfo.wmnet with reason: firmware updates
- 18:34 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp4052.ulsfo.wmnet with OS trixie
- 18:24 brett@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cp4050.ulsfo.wmnet
- 18:24 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4050.ulsfo.wmnet with OS trixie
- 18:22 jclark@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1374.eqiad.wmnet with OS bookworm
- 18:21 jclark@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1374.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 18:21 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp4051.ulsfo.wmnet with OS trixie
- 18:21 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4051.ulsfo.wmnet with OS trixie
- 18:12 jclark@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1373.eqiad.wmnet with OS bookworm
- 18:10 jclark@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-worker1374.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 18:10 jclark@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 18:10 jclark@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update network and mgmt - jclark@cumin1003"
- 18:10 jclark@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update network and mgmt - jclark@cumin1003"
- 18:10 elukey@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db1253.eqiad.wmnet with reason: Host went down and paged, depooled
- 18:06 cgoubert@cumin1003: dbctl commit (dc=all): 'Depool db1253', diff saved to https://phabricator.wikimedia.org/P89856 and previous config saved to /var/cache/conftool/dbconfig/20260313-180640-cgoubert.json
- 18:06 jclark@cumin1003: START - Cookbook sre.dns.netbox
- 18:05 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp4051.ulsfo.wmnet with OS trixie
- 18:03 elukey: powercycle db1253 - host not reachable via ssh, no events logged in racadm getsel, no console com2 available (blank screen)
- 17:59 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4050.ulsfo.wmnet with reason: host reimage
- 17:56 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4050.ulsfo.wmnet with reason: host reimage
- 17:49 brett@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp4049.*
- 17:46 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4049.ulsfo.wmnet with OS trixie
- 17:37 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
- 17:37 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
- 17:36 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp4050.ulsfo.wmnet with OS trixie
- 17:35 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4050.ulsfo.wmnet with OS trixie
- 17:35 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
- 17:34 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
- 17:27 jforrester@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply
- 17:26 jforrester@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply
- 17:26 jforrester@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-experimental: apply
- 17:26 jforrester@deploy2002: helmfile [codfw] START helmfile.d/services/mw-experimental: apply
- 17:20 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4049.ulsfo.wmnet with reason: host reimage
- 17:17 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp4050.ulsfo.wmnet with OS trixie
- 17:17 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
- 17:16 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
- 17:16 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4049.ulsfo.wmnet with reason: host reimage
- 17:12 fnegri@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for clouddb1016.eqiad.wmnet
- 17:12 fnegri@cumin1003: START - Cookbook sre.hosts.remove-downtime for clouddb1016.eqiad.wmnet
- 17:11 fnegri@cumin1003: conftool action : set/pooled=yes; selector: name=clouddb1016.eqiad.wmnet
- 17:11 brett@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp4048.*
- 17:10 dhinus: (relogging failed sal) conftool action : set/pooled=no; selector: name=clouddb1016.eqiad.wmnet
- 17:10 dhinus: (relogging failed sal) DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on clouddb1016.eqiad.wmnet with reason: Rebooting clouddb1016 T419960
- 17:09 dhinus: (relogging failed sal) END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for clouddb1015.eqiad.wmnet
- 17:08 dhinus: (relogging failed sal) START - Cookbook sre.hosts.remove-downtime for clouddb1015.eqiad.wmnet
- 17:08 jforrester@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply
- 17:07 jforrester@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply
- 17:07 dhinus: fnegri@cumin1003 conftool action : set/pooled=yes; selector: name=clouddb1015.eqiad.wmnet
- 17:07 jforrester@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-experimental: apply
- 17:07 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4048.ulsfo.wmnet with OS trixie
- 17:06 jforrester@deploy2002: helmfile [codfw] START helmfile.d/services/mw-experimental: apply
- 16:40 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp4049.ulsfo.wmnet with OS trixie
- 16:39 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4048.ulsfo.wmnet with reason: host reimage
- 16:36 fnegri@cumin1003: conftool action : set/pooled=no; selector: name=clouddb1015.eqiad.wmnet
- 16:35 fnegri@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on clouddb1015.eqiad.wmnet with reason: Rebooting clouddb1015 T419960
- 16:34 fnegri@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for clouddb1014.eqiad.wmnet
- 16:34 fnegri@cumin1003: START - Cookbook sre.hosts.remove-downtime for clouddb1014.eqiad.wmnet
- 16:34 fnegri@cumin1003: conftool action : set/pooled=yes; selector: name=clouddb1014.eqiad.wmnet
- 16:29 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host apt-staging2001.codfw.wmnet
- 16:28 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudweb1003.wikimedia.org
- 16:25 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host apt-staging2001.codfw.wmnet
- 16:22 andrew@cumin2002: START - Cookbook sre.hosts.reboot-single for host cloudweb1003.wikimedia.org
- 16:21 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudweb1004.wikimedia.org
- 16:20 fnegri@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on clouddb1014.eqiad.wmnet with reason: Rebooting clouddb1014 T419960
- 16:20 fnegri@cumin1003: conftool action : set/pooled=no; selector: name=clouddb1014.eqiad.wmnet
- 16:19 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host debmonitor-dev2001.codfw.wmnet
- 16:19 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp4048.ulsfo.wmnet with OS trixie
- 16:18 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4048.ulsfo.wmnet with OS trixie
- 16:16 andrew@cumin2002: START - Cookbook sre.hosts.reboot-single for host cloudweb1004.wikimedia.org
- 16:15 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host debmonitor-dev2001.codfw.wmnet
- 16:14 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host debmonitor2003.codfw.wmnet
- 16:10 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host debmonitor2003.codfw.wmnet
- 16:00 javiermonton@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
- 16:00 javiermonton@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
- 16:00 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp4048.ulsfo.wmnet with OS trixie
- 15:43 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcontrol2010-dev.codfw.wmnet
- 15:38 vgutierrez@cumin1003: END (PASS) - Cookbook sre.loadbalancer.check-ipip (exit_code=0)
- 15:38 vgutierrez@cumin1003: START - Cookbook sre.loadbalancer.check-ipip
- 15:37 daniel@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
- 15:37 vgutierrez@cumin1003: END (FAIL) - Cookbook sre.loadbalancer.check-ipip (exit_code=99)
- 15:37 vgutierrez@cumin1003: START - Cookbook sre.loadbalancer.check-ipip
- 15:36 vgutierrez@cumin1003: END (FAIL) - Cookbook sre.loadbalancer.check-ipip (exit_code=99)
- 15:36 vgutierrez@cumin1003: START - Cookbook sre.loadbalancer.check-ipip
- 15:36 andrew@cumin2002: START - Cookbook sre.hosts.reboot-single for host cloudcontrol2010-dev.codfw.wmnet
- 15:35 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcontrol2005-dev.codfw.wmnet
- 15:35 vgutierrez@cumin1003: END (FAIL) - Cookbook sre.loadbalancer.check-ipip (exit_code=99)
- 15:35 vgutierrez@cumin1003: START - Cookbook sre.loadbalancer.check-ipip
- 15:28 andrew@cumin2002: START - Cookbook sre.hosts.reboot-single for host cloudcontrol2005-dev.codfw.wmnet
- 15:26 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dse-k8s-worker1020.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 15:25 elukey@cumin1003: START - Cookbook sre.hosts.provision for host dse-k8s-worker1020.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 15:23 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dse-k8s-worker1020.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 15:22 daniel@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
- 15:22 daniel@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
- 15:22 daniel@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
- 15:19 elukey@cumin1003: START - Cookbook sre.hosts.provision for host dse-k8s-worker1020.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 15:19 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dse-k8s-worker1020.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 15:19 elukey@cumin1003: START - Cookbook sre.hosts.provision for host dse-k8s-worker1020.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 15:16 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcontrol2006-dev.codfw.wmnet
- 15:12 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudidp2001-dev.codfw.wmnet
- 15:08 andrew@cumin2002: START - Cookbook sre.hosts.reboot-single for host cloudidp2001-dev.codfw.wmnet
- 15:07 andrew@cumin2002: START - Cookbook sre.hosts.reboot-single for host cloudcontrol2006-dev.codfw.wmnet
- 14:58 fnegri@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for clouddb1013.eqiad.wmnet
- 14:58 fnegri@cumin1003: START - Cookbook sre.hosts.remove-downtime for clouddb1013.eqiad.wmnet
- 14:57 fnegri@cumin1003: conftool action : set/pooled=yes; selector: name=clouddb1013.eqiad.wmnet,service=s3
- 14:57 fnegri@cumin1003: conftool action : set/pooled=yes; selector: name=clouddb1013.eqiad.wmnet,service=s1
- 14:48 jynus@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host backup1015.eqiad.wmnet
- 14:46 jclark@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wdqs1035.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 14:46 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcontrol2005-dev.codfw.wmnet
- 14:45 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti-test2003.codfw.wmnet
- 14:45 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-test2003.codfw.wmnet
- 14:44 jclark@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1373.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 14:43 jynus@cumin1003: START - Cookbook sre.hosts.reboot-single for host backup1015.eqiad.wmnet
- 14:43 jclark@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wdqs1034.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 14:42 jclark@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wdqs1033.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 14:40 jclark@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dse-k8s-worker1023
- 14:40 jclark@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host dse-k8s-worker1023
- 14:40 jclark@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dse-k8s-worker1022
- 14:40 jclark@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host dse-k8s-worker1022
- 14:40 jclark@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dse-k8s-worker1021
- 14:39 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudbackup2004.codfw.wmnet
- 14:39 jclark@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host dse-k8s-worker1021
- 14:38 andrew@cumin2002: START - Cookbook sre.hosts.reboot-single for host cloudcontrol2005-dev.codfw.wmnet
- 14:37 jclark@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dse-k8s-worker1020
- 14:36 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-test2003.codfw.wmnet
- 14:36 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti-test2003.codfw.wmnet
- 14:35 jclark@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host dse-k8s-worker1020
- 14:35 fnegri@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on clouddb1013.eqiad.wmnet with reason: Rebooting clouddb1013 T419960
- 14:33 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dse-k8s-worker1020.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 14:33 jclark@cumin1003: START - Cookbook sre.hosts.provision for host wdqs1035.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 14:33 fnegri@cumin1003: conftool action : set/pooled=no; selector: name=clouddb1013.eqiad.wmnet
- 14:32 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wdqs1035.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 14:32 jclark@cumin1003: START - Cookbook sre.hosts.provision for host wdqs1035.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 14:32 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti-test2003.codfw.wmnet
- 14:32 jclark@cumin1003: START - Cookbook sre.hosts.provision for host wdqs1034.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 14:31 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti-test2003.codfw.wmnet
- 14:31 jclark@cumin1003: START - Cookbook sre.hosts.provision for host wdqs1033.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 14:31 jclark@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-worker1373.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 14:31 jclark@cumin1003: START - Cookbook sre.hosts.provision for host dse-k8s-worker1020.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 14:29 jclark@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 14:29 jclark@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt - jclark@cumin1003"
- 14:29 jclark@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt - jclark@cumin1003"
- 14:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti-test2002.codfw.wmnet
- 14:28 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-test2002.codfw.wmnet
- 14:27 andrew@cumin2002: START - Cookbook sre.hosts.reboot-single for host cloudbackup2004.codfw.wmnet
- 14:27 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cuminunpriv1001.eqiad.wmnet
- 14:25 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudbackup2003.codfw.wmnet
- 14:25 jclark@cumin1003: START - Cookbook sre.dns.netbox
- 14:25 javiermonton@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply
- 14:24 javiermonton@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply
- 14:23 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host cuminunpriv1001.eqiad.wmnet
- 14:22 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-test2002.codfw.wmnet
- 14:22 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudbackup1004.eqiad.wmnet
- 14:22 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti-test2002.codfw.wmnet
- 14:14 andrew@cumin2002: START - Cookbook sre.hosts.reboot-single for host cloudbackup2003.codfw.wmnet
- 14:13 andrew@cumin2002: START - Cookbook sre.hosts.reboot-single for host cloudbackup1004.eqiad.wmnet
- 14:09 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudbackup1003.eqiad.wmnet
- 14:01 andrew@cumin2002: START - Cookbook sre.hosts.reboot-single for host cloudbackup1003.eqiad.wmnet
- 13:59 arnaudb@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host gerrit1003.wikimedia.org
- 13:53 arnaudb@cumin1003: START - Cookbook sre.hosts.reboot-single for host gerrit1003.wikimedia.org
- 13:49 aokoth@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lists2001.wikimedia.org
- 13:48 arnaudb@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host vrts1004.eqiad.wmnet
- 13:46 jelto@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host etherpad1004.eqiad.wmnet
- 13:45 javiermonton@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
- 13:45 javiermonton@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
- 13:44 arnaudb@cumin1003: START - Cookbook sre.hosts.reboot-single for host vrts1004.eqiad.wmnet
- 13:42 aokoth@cumin1003: START - Cookbook sre.hosts.reboot-single for host lists2001.wikimedia.org
- 13:42 jelto@cumin1003: START - Cookbook sre.hosts.reboot-single for host etherpad1004.eqiad.wmnet
- 13:37 jelto@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host etherpad2002.codfw.wmnet
- 13:36 arnaudb@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host gerrit2002.wikimedia.org
- 13:33 jelto@cumin1003: START - Cookbook sre.hosts.reboot-single for host etherpad2002.codfw.wmnet
- 13:32 jelto@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host gitlab2003.wikimedia.org
- 13:30 arnaudb@cumin1003: START - Cookbook sre.hosts.reboot-single for host gerrit2002.wikimedia.org
- 13:26 jelto@cumin1003: START - Cookbook sre.hosts.reboot-single for host gitlab2003.wikimedia.org
- 13:26 jelto@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host gitlab2002.wikimedia.org
- 13:24 jynus@cumin1003: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host backup2020.codfw.wmnet
- 13:23 jynus@cumin1003: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host backup2019.codfw.wmnet
- 13:19 jelto@cumin1003: START - Cookbook sre.hosts.reboot-single for host gitlab2002.wikimedia.org
- 13:19 jelto@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host gitlab1003.wikimedia.org
- 13:13 jynus@cumin1003: START - Cookbook sre.hosts.reboot-single for host backup2020.codfw.wmnet
- 13:13 jelto@cumin1003: START - Cookbook sre.hosts.reboot-single for host gitlab1003.wikimedia.org
- 13:12 jynus@cumin1003: START - Cookbook sre.hosts.reboot-single for host backup2019.codfw.wmnet
- 13:11 jelto@cumin1003: END (PASS) - Cookbook sre.gitlab.reboot-runner (exit_code=0) rolling reboot on A:gitlab-runner
- 13:05 jynus@cumin1003: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host backup2018.codfw.wmnet
- 13:05 jynus@cumin1003: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host backup1020.eqiad.wmnet
- 12:54 jynus@cumin1003: START - Cookbook sre.hosts.reboot-single for host backup2018.codfw.wmnet
- 12:54 jynus@cumin1003: START - Cookbook sre.hosts.reboot-single for host backup1020.eqiad.wmnet
- 12:54 jynus@cumin1003: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host backup2017.codfw.wmnet
- 12:54 jynus@cumin1003: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host backup1019.eqiad.wmnet
- 12:53 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
- 12:50 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
- 12:50 moritzm: powercycle pki1002
- 12:48 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
- 12:47 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
- 12:44 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
- 12:44 mutante: rebooted phab1005 - waiting for it to come back
- 12:44 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
- 12:43 jynus@cumin1003: START - Cookbook sre.hosts.reboot-single for host backup2017.codfw.wmnet
- 12:43 jynus@cumin1003: START - Cookbook sre.hosts.reboot-single for host backup1019.eqiad.wmnet
- 12:42 bwojtowicz@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main'.
- 12:40 jynus@cumin1003: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host backup1018.eqiad.wmnet
- 12:39 jynus@cumin1003: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host backup2016.codfw.wmnet
- 12:31 jelto@cumin1003: START - Cookbook sre.gitlab.reboot-runner rolling reboot on A:gitlab-runner
- 12:29 jynus@cumin1003: START - Cookbook sre.hosts.reboot-single for host backup1018.eqiad.wmnet
- 12:29 jynus@cumin1003: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host backup1017.eqiad.wmnet
- 12:28 jynus@cumin1003: START - Cookbook sre.hosts.reboot-single for host backup2016.codfw.wmnet
- 12:27 jynus@cumin1003: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host backup2015.codfw.wmnet
- 12:24 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host bast1004.wikimedia.org
- 12:18 aokoth@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host doc1004.eqiad.wmnet
- 12:18 jynus@cumin1003: START - Cookbook sre.hosts.reboot-single for host backup1017.eqiad.wmnet
- 12:17 daniel@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
- 12:17 daniel@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
- 12:15 jynus@cumin1003: START - Cookbook sre.hosts.reboot-single for host backup2015.codfw.wmnet
- 12:15 daniel@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
- 12:15 daniel@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
- 12:14 aokoth@cumin1003: START - Cookbook sre.hosts.reboot-single for host doc1004.eqiad.wmnet
- 12:13 aokoth@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host aphlict2001.codfw.wmnet
- 12:10 aokoth@cumin1003: START - Cookbook sre.hosts.reboot-single for host aphlict2001.codfw.wmnet
- 12:10 jmm@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 6 hosts with reason: reboot
- 12:10 aokoth@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host vrts2002.codfw.wmnet
- 12:07 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
- 12:07 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
- 12:03 aokoth@cumin1003: START - Cookbook sre.hosts.reboot-single for host vrts2002.codfw.wmnet
- 12:02 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
- 12:02 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
- 12:01 jynus@cumin1003: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host backup1016.eqiad.wmnet
- 12:01 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dse-k8s-worker1019.eqiad.wmnet
- 11:59 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dse-k8s-worker1018.eqiad.wmnet
- 11:59 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
- 11:59 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
- 11:54 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host dse-k8s-worker1019.eqiad.wmnet
- 11:54 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host dse-k8s-worker1018.eqiad.wmnet
- 11:51 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
- 11:51 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
- 11:50 jynus@cumin1003: START - Cookbook sre.hosts.reboot-single for host backup1016.eqiad.wmnet
- 11:49 jynus@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-backup2004.codfw.wmnet
- 11:43 jynus@cumin1003: START - Cookbook sre.hosts.reboot-single for host ms-backup2004.codfw.wmnet
- 11:43 jynus@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-backup1004.eqiad.wmnet
- 11:37 jynus@cumin1003: START - Cookbook sre.hosts.reboot-single for host ms-backup1004.eqiad.wmnet
- 11:36 jynus@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-backup2003.codfw.wmnet
- 11:34 jynus@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-backup1003.eqiad.wmnet
- 11:32 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dse-k8s-worker1018.eqiad.wmnet with OS bookworm
- 11:32 btullis@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - btullis@cumin1003"
- 11:30 jynus@cumin1003: START - Cookbook sre.hosts.reboot-single for host ms-backup2003.codfw.wmnet
- 11:28 jynus@cumin1003: START - Cookbook sre.hosts.reboot-single for host ms-backup1003.eqiad.wmnet
- 11:27 btullis@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - btullis@cumin1003"
- 11:26 jelto@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host tcp-proxy1001.eqiad.wmnet
- 11:21 arnaudb@cumin1003: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host contint1003.wikimedia.org
- 11:21 jelto@cumin1003: START - Cookbook sre.hosts.reboot-single for host tcp-proxy1001.eqiad.wmnet
- 11:21 jelto@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host tcp-proxy1002.eqiad.wmnet
- 11:16 jelto@cumin1003: START - Cookbook sre.hosts.reboot-single for host tcp-proxy1002.eqiad.wmnet
- 11:16 jelto@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host tcp-proxy2001.codfw.wmnet
- 11:16 arnaudb@cumin1003: START - Cookbook sre.hosts.reboot-single for host contint1003.wikimedia.org
- 11:12 jayme@cumin1003: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on A:wikikube-staging-master-codfw
- 11:12 arnaudb@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host zuul1001.eqiad.wmnet
- 11:11 jelto@cumin1003: START - Cookbook sre.hosts.reboot-single for host tcp-proxy2001.codfw.wmnet
- 11:11 jelto@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host tcp-proxy2002.codfw.wmnet
- 11:09 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dse-k8s-worker1018.eqiad.wmnet with reason: host reimage
- 11:09 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dse-k8s-worker1019.eqiad.wmnet with OS bookworm
- 11:09 btullis@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - btullis@cumin1003"
- 11:09 jayme@cumin1003: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on A:wikikube-staging-master-eqiad
- 11:08 btullis@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - btullis@cumin1003"
- 11:08 arnaudb@cumin1003: START - Cookbook sre.hosts.reboot-single for host zuul1001.eqiad.wmnet
- 11:08 filippo@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcontrol1008-dev.eqiad.wmnet
- 11:07 jelto@cumin1003: START - Cookbook sre.hosts.reboot-single for host tcp-proxy2002.codfw.wmnet
- 11:06 jelto@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host tcp-proxy3001.esams.wmnet
- 11:05 btullis@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on dse-k8s-worker1018.eqiad.wmnet with reason: host reimage
- 11:01 jelto@cumin1003: START - Cookbook sre.hosts.reboot-single for host tcp-proxy3001.esams.wmnet
- 11:01 filippo@cumin1003: START - Cookbook sre.hosts.reboot-single for host cloudcontrol1008-dev.eqiad.wmnet
- 11:01 filippo@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudbackup1002-dev.eqiad.wmnet
- 11:01 jelto@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host tcp-proxy3002.esams.wmnet
- 10:59 mvernon@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 22:00:00 on db1258.eqiad.wmnet with reason: depooled, likely to flap over the weekend
- 10:57 filippo@cumin1003: START - Cookbook sre.hosts.reboot-single for host cloudbackup1002-dev.eqiad.wmnet
- 10:57 filippo@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudbackup1001-dev.eqiad.wmnet
- 10:56 jelto@cumin1003: START - Cookbook sre.hosts.reboot-single for host tcp-proxy3002.esams.wmnet
- 10:56 jayme@cumin1003: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:wikikube-staging-master-codfw
- 10:55 jelto@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host tcp-proxy4001.ulsfo.wmnet
- 10:55 jayme@cumin1003: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on A:wikikube-staging-worker-codfw
- 10:54 filippo@cumin1003: START - Cookbook sre.hosts.reboot-single for host cloudbackup1001-dev.eqiad.wmnet
- 10:52 jayme@cumin1003: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:wikikube-staging-master-eqiad
- 10:50 jayme@cumin1003: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on A:wikikube-staging-worker-eqiad
- 10:50 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1018.eqiad.wmnet with OS bookworm
- 10:50 jelto@cumin1003: START - Cookbook sre.hosts.reboot-single for host tcp-proxy4001.ulsfo.wmnet
- 10:50 jelto@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host tcp-proxy4002.ulsfo.wmnet
- 10:50 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dse-k8s-worker1019.eqiad.wmnet with reason: host reimage
- 10:46 btullis@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on dse-k8s-worker1019.eqiad.wmnet with reason: host reimage
- 10:45 jelto@cumin1003: START - Cookbook sre.hosts.reboot-single for host tcp-proxy4002.ulsfo.wmnet
- 10:45 jelto@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host tcp-proxy5001.eqsin.wmnet
- 10:40 jelto@cumin1003: START - Cookbook sre.hosts.reboot-single for host tcp-proxy5001.eqsin.wmnet
- 10:39 jelto@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host tcp-proxy5002.eqsin.wmnet
- 10:37 arnaudb@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host zuul2001.codfw.wmnet
- 10:37 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depool', diff saved to https://phabricator.wikimedia.org/P89852 and previous config saved to /var/cache/conftool/dbconfig/20260313-103719-ladsgroup.json
- 10:33 arnaudb@cumin1003: START - Cookbook sre.hosts.reboot-single for host zuul2001.codfw.wmnet
- 10:32 jelto@cumin1003: START - Cookbook sre.hosts.reboot-single for host tcp-proxy5002.eqsin.wmnet
- 10:31 filippo@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudweb2002-dev.wikimedia.org
- 10:31 arnaudb@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host zuul1002.eqiad.wmnet
- 10:31 jelto@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host tcp-proxy6001.drmrs.wmnet
- 10:29 brouberol@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/aux-k8s-services/kafka-mirrormaker: apply
- 10:29 brouberol@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/aux-k8s-services/kafka-mirrormaker: apply
- 10:28 brouberol@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/aux-k8s-services/kafka-mirrormaker: apply
- 10:28 brouberol@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/aux-k8s-services/kafka-mirrormaker: apply
- 10:28 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1019.eqiad.wmnet with OS bookworm
- 10:27 btullis@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dse-k8s-worker1019.eqiad.wmnet with OS bookworm
- 10:27 brouberol@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/aux-k8s-services/kafka-mirrormaker: apply
- 10:27 arnaudb@cumin1003: START - Cookbook sre.hosts.reboot-single for host zuul1002.eqiad.wmnet
- 10:27 brouberol@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/aux-k8s-services/kafka-mirrormaker: apply
- 10:26 jelto@cumin1003: START - Cookbook sre.hosts.reboot-single for host tcp-proxy6001.drmrs.wmnet
- 10:24 filippo@cumin1003: START - Cookbook sre.hosts.reboot-single for host cloudweb2002-dev.wikimedia.org
- 10:23 jelto@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host tcp-proxy6002.drmrs.wmnet
- 10:22 arnaudb@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host zuul2002.codfw.wmnet
- 10:19 filippo@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host tools-k8s-worker1008.eqiad.wmnet
- 10:18 arnaudb@cumin1003: START - Cookbook sre.hosts.reboot-single for host zuul2002.codfw.wmnet
- 10:18 arnaudb@cumin1003: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host zuul2002.codfw.wmnet
- 10:18 arnaudb@cumin1003: START - Cookbook sre.hosts.reboot-single for host zuul2002.codfw.wmnet
- 10:18 jelto@cumin1003: START - Cookbook sre.hosts.reboot-single for host tcp-proxy6002.drmrs.wmnet
- 10:16 jelto@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host tcp-proxy7002.magru.wmnet
- 10:16 jayme@cumin1003: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:wikikube-staging-worker-eqiad
- 10:15 jayme@cumin1003: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:wikikube-staging-worker-codfw
- 10:13 filippo@cumin1003: START - Cookbook sre.hosts.reboot-single for host tools-k8s-worker1008.eqiad.wmnet
- 10:13 filippo@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host tools-k8s-worker1007.eqiad.wmnet
- 10:12 jelto@cumin1003: START - Cookbook sre.hosts.reboot-single for host tcp-proxy7002.magru.wmnet
- 10:09 jelto@cumin1003: conftool action : set/pooled=yes; selector: name=tcp-proxy7001.magru.wmnet
- 10:08 filippo@cumin1003: START - Cookbook sre.hosts.reboot-single for host tools-k8s-worker1007.eqiad.wmnet
- 10:08 filippo@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host tools-k8s-worker1006.eqiad.wmnet
- 10:07 jelto@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host tcp-proxy7001.magru.wmnet
- 10:03 jelto@cumin1003: START - Cookbook sre.hosts.reboot-single for host tcp-proxy7001.magru.wmnet
- 10:02 filippo@cumin1003: START - Cookbook sre.hosts.reboot-single for host tools-k8s-worker1006.eqiad.wmnet
- 10:02 filippo@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host tools-k8s-worker1005.eqiad.wmnet
- 10:01 jelto@cumin1003: conftool action : set/pooled=no; selector: name=tcp-proxy7001.magru.wmnet
- 09:58 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1019.eqiad.wmnet with OS bookworm
- 09:57 filippo@cumin1003: START - Cookbook sre.hosts.reboot-single for host tools-k8s-worker1005.eqiad.wmnet
- 09:57 filippo@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host tools-k8s-worker1004.eqiad.wmnet
- 09:51 filippo@cumin1003: START - Cookbook sre.hosts.reboot-single for host tools-k8s-worker1004.eqiad.wmnet
- 09:51 filippo@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host tools-k8s-worker1003.eqiad.wmnet
- 09:50 javiermonton@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
- 09:50 javiermonton@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
- 09:46 filippo@cumin1003: START - Cookbook sre.hosts.reboot-single for host tools-k8s-worker1003.eqiad.wmnet
- 09:46 filippo@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host tools-k8s-worker1002.eqiad.wmnet
- 09:41 filippo@cumin1003: START - Cookbook sre.hosts.reboot-single for host tools-k8s-worker1002.eqiad.wmnet
- 09:40 filippo@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host tools-k8s-worker1001.eqiad.wmnet
- 09:39 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
- 09:39 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
- 09:35 filippo@cumin1003: START - Cookbook sre.hosts.reboot-single for host tools-k8s-worker1001.eqiad.wmnet
- 09:35 filippo@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host tools-k8s-ctrl1002.eqiad.wmnet
- 09:34 filippo@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host tools-k8s-ctrl1001.eqiad.wmnet
- 09:34 elukey@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
- 09:33 elukey@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
- 09:32 moritzm: installing Linux 6.1.164 on Bookworm hosts
- 09:30 filippo@cumin1003: START - Cookbook sre.hosts.reboot-single for host tools-k8s-ctrl1002.eqiad.wmnet
- 09:28 filippo@cumin1003: START - Cookbook sre.hosts.reboot-single for host tools-k8s-ctrl1001.eqiad.wmnet
- 09:01 elukey@cumin1003: END (FAIL) - Cookbook sre.kafka.change-confluent-distro-version (exit_code=99) Change Confluent distribution for Kafka A:kafka-test-eqiad cluster: Change Confluent distribution.
- 08:37 elukey@cumin1003: START - Cookbook sre.kafka.change-confluent-distro-version Change Confluent distribution for Kafka A:kafka-test-eqiad cluster: Change Confluent distribution.
- 07:56 moritzm: installing Linux 6.12.74 on Trixie hosts
- 07:55 moritzm: installing 6.12.74 on Trixie hosts
- 02:57 cdobbins@cumin2002: conftool action : set/pooled=yes; selector: name=cp4044.ulsfo.wmnet [reason: trixie reimaging]
- 02:09 mwpresync@deploy2002: Finished scap build-images: Publishing wmf/next image (duration: 08m 18s)
- 02:00 mwpresync@deploy2002: Started scap build-images: Publishing wmf/next image
- 01:41 cdobbins@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4044.ulsfo.wmnet with OS trixie
- 01:37 mutante: contint1003/contint2003 - every time(?) we set up machines with puppet using our httpd module and PHP, the first puppet run hits the same old issue: "Exec[ensure_present_mod_php" fails with "Considering conflict mpm_worker for mpm_prefork". The fix is: 'sudo a2dismod mpm_event' and run puppet again. T418521
- 01:26 dzahn@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on contint1003.wikimedia.org with reason: T418521
- 01:26 dzahn@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on contint2003.wikimedia.org with reason: T418521
- 01:23 dzahn@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on contint2003.wikimedia.org with reason: setup
- 01:22 dzahn@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on contint1003.wikimedia.org with reason: setup
- 01:22 brett@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp4047.*
- 01:09 cdobbins@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4044.ulsfo.wmnet with reason: host reimage
- 01:08 cdobbins@cumin2002: conftool action : set/pooled=yes; selector: name=cp4043.ulsfo.wmnet [reason: trixie reimaging]
- 01:06 cdobbins@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4044.ulsfo.wmnet with reason: host reimage
- 01:05 cdobbins@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4043.ulsfo.wmnet with OS trixie
- 00:55 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4047.ulsfo.wmnet with OS trixie
- 00:45 cdobbins@cumin2002: START - Cookbook sre.hosts.reimage for host cp4044.ulsfo.wmnet with OS trixie
- 00:45 cdobbins@cumin2002: conftool action : set/pooled=no; selector: name=cp4044.ulsfo.wmnet [reason: trixie reimaging]
- 00:42 cdobbins@cumin2002: conftool action : set/pooled=yes; selector: name=cp4042.ulsfo.wmnet [reason: trixie reimaging]
- 00:41 cdobbins@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4042.ulsfo.wmnet with OS trixie
- 00:39 cdobbins@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4043.ulsfo.wmnet with reason: host reimage
- 00:31 cdobbins@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4043.ulsfo.wmnet with reason: host reimage
- 00:30 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4047.ulsfo.wmnet with reason: host reimage
- 00:27 rzl@deploy2002: Finished scap sync-world: https://gerrit.wikimedia.org/r/1251187 T419637 (duration: 07m 12s)
- 00:23 rzl@deploy2002: rzl: Continuing with sync
- 00:23 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4047.ulsfo.wmnet with reason: host reimage
- 00:22 rzl@deploy2002: rzl: https://gerrit.wikimedia.org/r/1251187 T419637 synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 00:21 rzl@deploy2002: Started scap sync-world: https://gerrit.wikimedia.org/r/1251187 T419637
- 00:15 cdobbins@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4042.ulsfo.wmnet with reason: host reimage
- 00:14 cdobbins@cumin2002: conftool action : set/pooled=yes; selector: name=cp4040.ulsfo.wmnet [reason: trixie reimaging]
- 00:11 cdobbins@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4042.ulsfo.wmnet with reason: host reimage
- 00:11 cdobbins@cumin2002: START - Cookbook sre.hosts.reimage for host cp4043.ulsfo.wmnet with OS trixie
- 00:10 cdobbins@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4040.ulsfo.wmnet with OS trixie
- 00:04 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp4047.ulsfo.wmnet with OS trixie
- 00:03 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4047.ulsfo.wmnet with OS trixie
- 00:03 cdobbins@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp4043.ulsfo.wmnet with OS trixie
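
The workaround described in the 01:37 mutante entry above (T418521) can be sketched as a small script. This is only an illustration, not the procedure actually used on contint1003/contint2003: the log says "run puppet again" without naming a command, so `puppet agent --test` is an assumption, and the `run` helper, the `fix_mod_php_conflict` function name, and the `DRY_RUN` switch are hypothetical. By default the script only prints what it would do.

```shell
#!/bin/sh
# Sketch of the T418521 workaround: disable mpm_event, then re-run puppet.
# DRY_RUN=1 (the default here) prints the commands instead of executing them.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: echo the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

fix_mod_php_conflict() {
    # Command quoted in the log entry: disable the conflicting Apache MPM.
    run sudo a2dismod mpm_event
    # Assumption: "run puppet again" is a plain agent run; the wrapper
    # actually used on these hosts is not stated in the log.
    run sudo puppet agent --test
}

fix_mod_php_conflict
```

Running it with `DRY_RUN=0` would execute both commands for real, which requires root on the affected host.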
2026-03-12
- 23:57 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host o11ytest1001.eqiad.wmnet with OS trixie
- 23:53 rzl@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
- 23:53 rzl@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
- 23:50 cdobbins@cumin2002: START - Cookbook sre.hosts.reimage for host cp4042.ulsfo.wmnet with OS trixie
- 23:49 rzl@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
- 23:49 rzl@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
- 23:45 rzl@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
- 23:45 rzl@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
- 23:45 cdobbins@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4040.ulsfo.wmnet with reason: host reimage
- 23:44 cdobbins@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp4042.ulsfo.wmnet with OS trixie
- 23:41 cdobbins@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4040.ulsfo.wmnet with reason: host reimage
- 23:41 rzl@deploy2002: helmfile [codfw] DONE helmfile.d/services/api-gateway: apply
- 23:41 rzl@deploy2002: helmfile [codfw] START helmfile.d/services/api-gateway: apply
- 23:40 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on o11ytest1001.eqiad.wmnet with reason: host reimage
- 23:36 rzl@deploy2002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: apply
- 23:36 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on o11ytest1001.eqiad.wmnet with reason: host reimage
- 23:36 rzl@deploy2002: helmfile [eqiad] START helmfile.d/services/api-gateway: apply
- 23:35 rzl@deploy2002: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
- 23:35 rzl@deploy2002: helmfile [staging] START helmfile.d/services/api-gateway: apply
- 23:22 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host o11ytest1001
- 23:22 herron@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host o11ytest1001
- 23:21 cdobbins@cumin2002: START - Cookbook sre.hosts.reimage for host cp4040.ulsfo.wmnet with OS trixie
- 23:19 cdobbins@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp4040.ulsfo.wmnet with OS trixie
- 23:18 herron@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host o11ytest1001
- 23:18 herron@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) o11ytest1001.eqiad.wmnet 141.32.64.10.in-addr.arpa 1.4.1.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
- 23:18 herron@cumin1003: START - Cookbook sre.dns.wipe-cache o11ytest1001.eqiad.wmnet 141.32.64.10.in-addr.arpa 1.4.1.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
- 23:18 herron@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 23:18 herron@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host o11ytest1001 - herron@cumin1003"
- 23:18 herron@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host o11ytest1001 - herron@cumin1003"
- 23:04 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp4047.ulsfo.wmnet with OS trixie
- 23:00 herron@cumin1003: START - Cookbook sre.dns.netbox
- 23:00 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host o11ytest1001
- 22:59 herron@cumin1003: START - Cookbook sre.hosts.reimage for host o11ytest1001.eqiad.wmnet with OS trixie
- 22:58 herron@cumin1003: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mwlog1002 to o11ytest1001
- 22:57 herron@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host o11ytest1001
- 22:55 herron@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host o11ytest1001
- 22:55 herron@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) o11ytest1001 on all recursors
- 22:55 herron@cumin1003: START - Cookbook sre.dns.wipe-cache o11ytest1001 on all recursors
- 22:55 herron@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 22:55 herron@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mwlog1002 to o11ytest1001 - herron@cumin1003"
- 22:54 herron@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mwlog1002 to o11ytest1001 - herron@cumin1003"
- 22:51 herron@cumin1003: START - Cookbook sre.dns.netbox
- 22:50 herron@cumin1003: START - Cookbook sre.hosts.rename from mwlog1002 to o11ytest1001
- 22:42 cdobbins@cumin2002: START - Cookbook sre.hosts.reimage for host cp4043.ulsfo.wmnet with OS trixie
- 22:42 cdobbins@cumin2002: conftool action : set/pooled=no; selector: name=cp4043.ulsfo.wmnet [reason: trixie reimaging]
- 22:41 cdobbins@cumin2002: conftool action : set/pooled=yes; selector: name=cp4041.ulsfo.wmnet [reason: trixie reimaging]
- 22:39 bvibber@deploy2002: Finished scap sync-world: Backport for Enable ReaderExperiments Share Highlight subfeature for metrics (T416945), Metrics module for share highlight experiment baseline (T416945) (duration: 06m 49s)
- 22:38 cdobbins@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4041.ulsfo.wmnet with OS trixie
- 22:35 bvibber@deploy2002: bvibber: Continuing with sync
- 22:34 bvibber@deploy2002: bvibber: Backport for Enable ReaderExperiments Share Highlight subfeature for metrics (T416945), Metrics module for share highlight experiment baseline (T416945) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 22:32 bvibber@deploy2002: Started scap sync-world: Backport for Enable ReaderExperiments Share Highlight subfeature for metrics (T416945), Metrics module for share highlight experiment baseline (T416945)
- 22:28 rzl@deploy2002: Finished scap sync-world: https://gerrit.wikimedia.org/r/1251182 T419637 (duration: 11m 18s)
- 22:27 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host o11ytest2001.codfw.wmnet with OS trixie
- 22:26 rzl@deploy2002: rzl: Continuing with sync
- 22:24 rzl@deploy2002: rzl: https://gerrit.wikimedia.org/r/1251182 T419637 synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 22:23 cdobbins@cumin2002: START - Cookbook sre.hosts.reimage for host cp4042.ulsfo.wmnet with OS trixie
- 22:23 cdobbins@cumin2002: conftool action : set/pooled=no; selector: name=cp4042.ulsfo.wmnet [reason: trixie reimaging]
- 22:20 brett@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp4046.*
- 22:17 rzl@deploy2002: Started scap sync-world: https://gerrit.wikimedia.org/r/1251182 T419637
- 22:13 cdobbins@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4041.ulsfo.wmnet with reason: host reimage
- 22:09 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on o11ytest2001.codfw.wmnet with reason: host reimage
- 22:08 cdobbins@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4041.ulsfo.wmnet with reason: host reimage
- 22:03 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on o11ytest2001.codfw.wmnet with reason: host reimage
- 22:01 jasmine@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-ctrl2006.codfw.wmnet with OS trixie
- 21:58 cdobbins@cumin2002: START - Cookbook sre.hosts.reimage for host cp4040.ulsfo.wmnet with OS trixie
- 21:58 cdobbins@cumin2002: conftool action : set/pooled=no; selector: name=cp4040.ulsfo.wmnet [reason: trixie reimaging]
- 21:45 herron@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host o11ytest2001
- 21:45 herron@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host o11ytest2001
- 21:45 herron@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host o11ytest2001
- 21:45 herron@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) o11ytest2001.codfw.wmnet 9.32.192.10.in-addr.arpa 9.0.0.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
- 21:45 herron@cumin1003: START - Cookbook sre.dns.wipe-cache o11ytest2001.codfw.wmnet 9.32.192.10.in-addr.arpa 9.0.0.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
- 21:45 herron@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 21:45 herron@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host o11ytest2001 - herron@cumin1003"
- 21:45 herron@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host o11ytest2001 - herron@cumin1003"
- 21:43 cdobbins@cumin2002: START - Cookbook sre.hosts.reimage for host cp4041.ulsfo.wmnet with OS trixie
- 21:41 cdobbins@cumin2002: conftool action : set/pooled=yes; selector: name=cp4038.ulsfo.wmnet [reason: trixie reimaging]
- 21:40 cdobbins@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4038.ulsfo.wmnet with OS trixie
- 21:39 herron@cumin1003: START - Cookbook sre.dns.netbox
- 21:39 herron@cumin1003: START - Cookbook sre.hosts.move-vlan for host o11ytest2001
- 21:39 jasmine@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl2006.codfw.wmnet with OS trixie
- 21:39 herron@cumin1003: START - Cookbook sre.hosts.reimage for host o11ytest2001.codfw.wmnet with OS trixie
- 21:36 jforrester@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
- 21:35 herron@cumin1003: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mwlog2002 to o11ytest2001
- 21:35 jforrester@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
- 21:35 jforrester@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
- 21:35 herron@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host o11ytest2001
- 21:34 jforrester@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
- 21:34 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
- 21:33 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
- 21:32 herron@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host o11ytest2001
- 21:32 herron@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) o11ytest2001 on all recursors
- 21:32 herron@cumin1003: START - Cookbook sre.dns.wipe-cache o11ytest2001 on all recursors
- 21:32 herron@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 21:32 herron@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mwlog2002 to o11ytest2001 - herron@cumin1003"
- 21:31 herron@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mwlog2002 to o11ytest2001 - herron@cumin1003"
- 21:28 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4046.ulsfo.wmnet with OS trixie
- 21:27 herron@cumin1003: START - Cookbook sre.dns.netbox
- 21:26 herron@cumin1003: START - Cookbook sre.hosts.rename from mwlog2002 to o11ytest2001
- 21:20 rzl: rzl@apt1002:~$ sudo -i reprepro copy trixie-wikimedia bullseye-wikimedia envoyproxy
- 21:20 rzl: rzl@apt1002:~$ sudo -i reprepro copy bookworm-wikimedia bullseye-wikimedia envoyproxy
- 21:20 rzl: rzl@apt1002:~$ sudo -i reprepro -C main includedeb bullseye-wikimedia /srv/wikimedia/pool/component/envoy-future/e/envoyproxy/envoyproxy_1.35.9-1_amd64.deb
- 21:13 cdobbins@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4038.ulsfo.wmnet with reason: host reimage
- 21:13 cscott@deploy2002: Finished scap sync-world: Backport for Revert "Move post-processing of flaggedrevs views inside FlaggablePageView" (duration: 07m 28s)
- 21:09 cdobbins@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4038.ulsfo.wmnet with reason: host reimage
- 21:09 cscott@deploy2002: cscott: Continuing with sync
- 21:07 cscott@deploy2002: cscott: Backport for Revert "Move post-processing of flaggedrevs views inside FlaggablePageView" synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 21:05 cscott@deploy2002: Started scap sync-world: Backport for Revert "Move post-processing of flaggedrevs views inside FlaggablePageView"
- 21:04 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4046.ulsfo.wmnet with reason: host reimage
- 21:02 tgr@deploy2002: Finished scap sync-world: Backport for Use 'alwaysShowLogin' query parameter during login (T419723), login: Add 'alwaysShowLogin' login URL parameter (T419723), PersonalDashboard: enable CTA for pilot wikis (T418613), Enable parser survey for opted-out users on ru/pt/ja/id wikis (T414852) (duration: 10m 41s)
- 20:58 tgr@deploy2002: tgr, jsn, cscott: Continuing with sync
- 20:58 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4046.ulsfo.wmnet with reason: host reimage
- 20:54 tgr@deploy2002: tgr, jsn, cscott: Backport for Use 'alwaysShowLogin' query parameter during login (T419723), login: Add 'alwaysShowLogin' login URL parameter (T419723), PersonalDashboard: enable CTA for pilot wikis (T418613), Enable parser survey for opted-out users on ru/pt/ja/id wikis (T414852) synced to the testservers (see https://wikitech
- 20:52 tgr@deploy2002: Started scap sync-world: Backport for Use 'alwaysShowLogin' query parameter during login (T419723), login: Add 'alwaysShowLogin' login URL parameter (T419723), PersonalDashboard: enable CTA for pilot wikis (T418613), Enable parser survey for opted-out users on ru/pt/ja/id wikis (T414852)
- 20:49 cdobbins@cumin2002: START - Cookbook sre.hosts.reimage for host cp4038.ulsfo.wmnet with OS trixie
- 20:43 tgr@deploy2002: Finished scap sync-world: Backport for Set 'sub' JWT field in client credentials access tokens (T417278), Set 'sub' JWT field in client credentials access tokens (T417278), phpunit: Avoid unnecessary writes in generatePHPUnitConfig.php (T419107) (duration: 07m 37s)
- 20:39 tgr@deploy2002: tgr, daimona: Continuing with sync
- 20:37 tgr@deploy2002: tgr, daimona: Backport for Set 'sub' JWT field in client credentials access tokens (T417278), Set 'sub' JWT field in client credentials access tokens (T417278), phpunit: Avoid unnecessary writes in generatePHPUnitConfig.php (T419107) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 20:37 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp4046.ulsfo.wmnet with OS trixie
- 20:35 tgr@deploy2002: Started scap sync-world: Backport for Set 'sub' JWT field in client credentials access tokens (T417278), Set 'sub' JWT field in client credentials access tokens (T417278), phpunit: Avoid unnecessary writes in generatePHPUnitConfig.php (T419107)
- 20:35 jsn@deploy2002: Synchronized wmf-config/throttle.php: (no justification provided) (duration: 01m 57s)
- 20:32 brett@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp4045.*
- 20:28 cdobbins@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp4041.ulsfo.wmnet with OS trixie
- 20:20 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-worker1019.eqiad.wmnet with OS bookworm
- 20:18 jsn@deploy2002: Finished scap sync-world: Backport for PersonalDashboard: enable CTA for pilot wikis (T418613), [arwikiquote] add namespace alias for NS_PROJECT (T419828), Deploy participant recruitment survey on frwiki (T419778), Increase IP cap limit for azwiki (T419899) (duration: 11m 11s)
- 20:14 jsn@deploy2002: jsn, dani, nmw03, gergesshamon: Continuing with sync
- 20:09 jsn@deploy2002: jsn, dani, nmw03, gergesshamon: Backport for PersonalDashboard: enable CTA for pilot wikis (T418613), [arwikiquote] add namespace alias for NS_PROJECT (T419828), Deploy participant recruitment survey on frwiki (T419778), Increase IP cap limit for azwiki (T419899) synced to the testservers (see https://wikitech.wikimedia.org/wik
- 20:07 jsn@deploy2002: Started scap sync-world: Backport for PersonalDashboard: enable CTA for pilot wikis (T418613), [arwikiquote] add namespace alias for NS_PROJECT (T419828), Deploy participant recruitment survey on frwiki (T419778), Increase IP cap limit for azwiki (T419899)
- 19:21 rzl@deploy2002: helmfile [codfw] DONE helmfile.d/services/mathoid: apply
- 19:21 rzl@deploy2002: helmfile [codfw] START helmfile.d/services/mathoid: apply
- 19:20 rzl@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mathoid: apply
- 19:19 rzl@deploy2002: helmfile [eqiad] START helmfile.d/services/mathoid: apply
- 19:16 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
- 19:16 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
- 19:15 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
- 19:14 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
- 19:13 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
- 19:12 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-web: apply
- 19:12 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
- 19:11 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
- 19:07 cdobbins@cumin2002: START - Cookbook sre.hosts.reimage for host cp4041.ulsfo.wmnet with OS trixie
- 19:06 cdobbins@cumin2002: conftool action : set/pooled=no; selector: name=cp4041.ulsfo.wmnet [reason: trixie reimaging]
- 19:06 cdobbins@cumin2002: conftool action : set/pooled=yes; selector: name=cp4039.ulsfo.wmnet [reason: trixie reimaging]
- 19:06 swfrench-wmf: reprepro include xdebug_3.4.4-1+wmf11u1+icu72u1 into component/php83-icu72 - T419058
- 19:05 swfrench-wmf: reprepro include wikidiff2_1.14.1-2+wmf11u1+icu72u1 into component/php83-icu72 - T419058
- 19:05 swfrench-wmf: reprepro include php-yaml_2.2.4-1+wmf11u1+icu72u1 into component/php83-icu72 - T419058
- 19:05 swfrench-wmf: reprepro include php-xhprof_2.3.10-1+wmf11u1+icu72u1 into component/php83-icu72 - T419058
- 19:05 swfrench-wmf: reprepro include php-wmerrors_2.0.0-1+wmf11u1+icu72u1 into component/php83-icu72 - T419058
- 19:05 swfrench-wmf: reprepro include php-uuid_1.3.0-1+wmf11u1+icu72u1 into component/php83-icu72 - T419058
- 19:05 brennen@deploy2002: Finished scap sync-world: Backport for EditPage: Re-add catch block for MWException (T419883) (duration: 09m 46s)
- 19:04 swfrench-wmf: reprepro include php-redis_6.2.0-1+wmf11u1+icu72u1 into component/php83-icu72 - T419058
- 19:04 swfrench-wmf: reprepro include php-pcov_1.0.12-1+wmf11u1+icu72u1 into component/php83-icu72 - T419058
- 19:04 swfrench-wmf: reprepro include php-memcached_3.3.0-1+wmf11u1+icu72u1 into component/php83-icu72 - T419058
- 19:04 swfrench-wmf: reprepro include php-luasandbox_4.1.2-1+wmf11u1+icu72u1 into component/php83-icu72 - T419058
- 19:03 swfrench-wmf: reprepro include php-imagick_3.7.0-13+wmf11u1+icu72u1 into component/php83-icu72 - T419058
- 19:03 swfrench-wmf: reprepro include php-excimer_1.2.5-1+wmf11u1+icu72u1 into component/php83-icu72 - T419058
- 19:01 brennen@deploy2002: somerandomdeveloper, brennen: Continuing with sync
- 18:59 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1019.eqiad.wmnet with OS bookworm
- 18:57 brennen@deploy2002: somerandomdeveloper, brennen: Backport for EditPage: Re-add catch block for MWException (T419883) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 18:55 cdobbins@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4039.ulsfo.wmnet with OS trixie
- 18:55 brennen@deploy2002: Started scap sync-world: Backport for EditPage: Re-add catch block for MWException (T419883)
- 18:52 rzl@deploy2002: helmfile [staging] DONE helmfile.d/services/mathoid: apply
- 18:52 rzl@deploy2002: helmfile [staging] START helmfile.d/services/mathoid: apply
- 18:42 brett@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp20(2[789]|3[0-9]|40).*,service=ats-be
- 18:34 brennen@deploy2002: rebuilt and synchronized wikiversions files: group2 to 1.46.0-wmf.19 refs T413810
- 18:29 cdobbins@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4039.ulsfo.wmnet with reason: host reimage
- 18:29 btullis@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 18:28 btullis@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Updating dse-k8s-worker1019 - btullis@cumin1003"
- 18:26 swfrench@cumin2002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker2332.codfw.wmnet
- 18:26 swfrench@cumin2002: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker2332.codfw.wmnet
- 18:25 btullis@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Updating dse-k8s-worker1019 - btullis@cumin1003"
- 18:24 cdobbins@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4039.ulsfo.wmnet with reason: host reimage
- 18:23 brennen@deploy2002: Finished scap sync-world: Backport for Ensure that we always run ParserHooks::transformHtml() when using Parsoid (T419830) (duration: 14m 46s)
- 18:21 btullis@cumin1003: START - Cookbook sre.dns.netbox
- 18:20 cdobbins@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp4038.ulsfo.wmnet with OS trixie
- 18:19 brennen@deploy2002: cscott, brennen: Continuing with sync
- 18:18 btullis@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dse-k8s-worker1018.eqiad.wmnet with OS bookworm
- 18:16 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4045.ulsfo.wmnet with OS trixie
- 18:10 brennen@deploy2002: cscott, brennen: Backport for Ensure that we always run ParserHooks::transformHtml() when using Parsoid (T419830) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 18:10 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dse-k8s-worker1019.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 18:08 brennen@deploy2002: Started scap sync-world: Backport for Ensure that we always run ParserHooks::transformHtml() when using Parsoid (T419830)
- 18:02 btullis@cumin1003: START - Cookbook sre.hosts.provision for host dse-k8s-worker1019.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 18:02 swfrench-wmf: reprepro include php-igbinary_3.2.16-4+wmf11u1+icu72u1 into component/php83-icu72 - T419058
- 17:59 cdobbins@cumin2002: START - Cookbook sre.hosts.reimage for host cp4039.ulsfo.wmnet with OS trixie
- 17:58 btullis@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dse-k8s-worker1019
- 17:58 btullis@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host dse-k8s-worker1019
- 17:56 cdobbins@cumin2002: conftool action : set/pooled=no; selector: name=cp4039.ulsfo.wmnet [reason: trixie reimaging]
- 17:55 brett@puppetserver1001: conftool action : set/pooled=no; selector: name=cp20(3[6-9]|4[012]).*
- 17:54 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1018.eqiad.wmnet with OS bookworm
- 17:53 cdobbins@cumin2002: conftool action : set/pooled=yes; selector: name=cp4037.ulsfo.wmnet [reason: trixie reimaging]
- 17:52 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4045.ulsfo.wmnet with reason: host reimage
- 17:49 cdobbins@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4037.ulsfo.wmnet with OS trixie
- 17:49 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4045.ulsfo.wmnet with reason: host reimage
- 17:40 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dse-k8s-worker1018.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 17:33 btullis@cumin1003: START - Cookbook sre.hosts.provision for host dse-k8s-worker1018.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 17:31 btullis@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dse-k8s-worker1018
- 17:31 btullis@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host dse-k8s-worker1018
- 17:30 btullis@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dse-k8s-worker1018.eqiad.wmnet with OS bookworm
- 17:28 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp4045.ulsfo.wmnet with OS trixie
- 17:28 brett@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp4045.ulsfo.wmnet with OS trixie
- 17:27 brett@puppetserver1001: conftool action : set/pooled=no; selector: name=cp203[0-5].*
- 17:24 cdobbins@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4037.ulsfo.wmnet with reason: host reimage
- 17:20 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1018.eqiad.wmnet with OS bookworm
- 17:20 btullis@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dse-k8s-worker1018.eqiad.wmnet with OS bookworm
- 17:18 cdobbins@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4037.ulsfo.wmnet with reason: host reimage
- 17:17 jclark@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-backup1004.eqiad.wmnet with OS trixie
- 17:17 jclark@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1003"
- 17:16 jclark@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1003"
- 17:06 brett@puppetserver1001: conftool action : set/pooled=no; selector: name=cp202[89].*
- 17:03 brett@puppetserver1001: conftool action : set/pooled=no; selector: name=cp2027.*
- 16:59 cdobbins@cumin2002: START - Cookbook sre.hosts.reimage for host cp4038.ulsfo.wmnet with OS trixie
- 16:58 cdobbins@cumin2002: conftool action : set/pooled=no; selector: name=cp4038.ulsfo.wmnet [reason: trixie reimaging]
- 16:58 jclark@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-backup1004.eqiad.wmnet with reason: host reimage
- 16:58 swfrench-wmf: reprepro include php-msgpack_3.0.0-1+wmf11u1+icu72u1 into component/php83-icu72 - T419058
- 16:57 cdobbins@cumin2002: START - Cookbook sre.hosts.reimage for host cp4037.ulsfo.wmnet with OS trixie
- 16:57 cdobbins@cumin2002: conftool action : set/pooled=no; selector: name=cp4037.ulsfo.wmnet [reason: trixie reimaging]
- 16:55 jclark@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-backup1004.eqiad.wmnet with reason: host reimage
- 16:50 swfrench-wmf: reprepro include php-apcu_5.1.24-1+wmf11u1+icu72u1 into component/php83-icu72 - T419058
- 16:45 akhatun@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-edit-type-enrich-next: apply
- 16:45 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 16:44 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
- 16:43 akhatun@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-edit-type-enrich-next: apply
- 16:43 swfrench-wmf: reprepro include dh-php_5.5+wmf11u1+icu72u1 into component/php83-icu72 - T419058
- 16:42 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1018.eqiad.wmnet with OS bookworm
- 16:41 swfrench-wmf: reprepro include php-defaults_94+wmf11u1+icu72u1 into component/php83-icu72 - T419058
- 16:37 jclark@cumin1003: START - Cookbook sre.hosts.reimage for host ms-backup1004.eqiad.wmnet with OS trixie
- 16:36 swfrench-wmf: reprepro include php8.3_8.3.30-1+wmf11u2+icu72u1 into component/php83-icu72 - T419058
- 16:27 dzahn@dns1004: END - running authdns-update
- 16:26 dzahn@dns1004: START - running authdns-update
- 16:25 mutante: switching old status.wikimedia.org page away from rackspace T414098
- 16:25 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp4045.ulsfo.wmnet with OS trixie
- 16:20 dzahn@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
- 16:20 dzahn@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
- 16:19 dzahn@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
- 16:19 dzahn@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply
- 16:12 joal@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-wmde: apply
- 16:11 joal@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-wmde: apply
- 16:10 joal@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-wikidata: apply
- 16:09 joal@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-wikidata: apply
- 16:09 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-sre: apply
- 16:09 joal@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 16:08 joal@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 16:08 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-sre: apply
- 16:07 joal@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-search: apply
- 16:06 joal@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-search: apply
- 16:05 joal@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-research: apply
- 16:04 joal@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-research: apply
- 16:04 joal@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-platform-eng: apply
- 16:04 joal@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-platform-eng: apply
- 16:03 joal@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-ml: apply
- 16:02 joal@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-ml: apply
- 16:02 joal@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-fr-tech: apply
- 16:02 kevinbazira@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'llm' for release 'main' .
- 16:01 joal@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-fr-tech: apply
- 15:58 joal@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
- 15:57 joal@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
- 15:57 joal@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-product: apply
- 15:56 joal@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-product: apply
- 15:52 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudgw2002-dev.codfw.wmnet
- 15:52 andrew@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 15:52 andrew@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudgw2002-dev.codfw.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin2002"
- 15:47 andrew@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudgw2002-dev.codfw.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin2002"
- 15:43 andrew@cumin2002: START - Cookbook sre.dns.netbox
- 15:41 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 15:40 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
- 15:36 andrew@cumin2002: START - Cookbook sre.hosts.decommission for hosts cloudgw2002-dev.codfw.wmnet
- 15:35 joal@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-main: apply
- 15:33 joal@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-main: apply
- 15:27 ebernhardson@deploy2002: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
- 15:26 ebernhardson@deploy2002: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
- 15:19 moritzm: reuploaded libxml2 2.9.10+dfsg-6.7+deb11u9+wmf11u1 and 72.1-3+deb12u1~wmf11u1 to component/php83-icu72 for bullseye-wikimedia T419058
- 15:14 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 15:13 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
- 15:13 jmm@puppetserver1001: conftool action : set/pooled=yes; selector: name=tcp-proxy4004.ulsfo.wmnet
- 15:13 jmm@puppetserver1001: conftool action : set/weight=1; selector: name=tcp-proxy4004.ulsfo.wmnet
- 15:12 jmm@puppetserver1001: conftool action : set/pooled=yes; selector: name=tcp-proxy4003.ulsfo.wmnet
- 15:12 jmm@puppetserver1001: conftool action : set/weight=1; selector: name=tcp-proxy4003.ulsfo.wmnet
- 15:00 javiermonton@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-edit-type-enrich-next: apply
- 15:00 javiermonton@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-edit-type-enrich-next: apply
- 14:57 javiermonton@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-edit-type-enrich-next: apply
- 14:56 javiermonton@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-edit-type-enrich-next: apply
- 14:45 akhatun@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-edit-type-enrich-next: apply
- 14:44 akhatun@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-edit-type-enrich-next: apply
- 14:41 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dse-k8s-worker1018.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 14:34 andrew@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
- 14:33 btullis@cumin1003: START - Cookbook sre.hosts.provision for host dse-k8s-worker1018.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 14:31 andrew@cumin2002: START - Cookbook sre.dns.netbox
- 14:31 btullis@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dse-k8s-worker1018
- 14:31 btullis@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host dse-k8s-worker1018
- 14:25 ebernhardson@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
- 14:24 ebernhardson@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
- 14:20 ayounsi@cumin1003: conftool action : set/pooled=yes; selector: name=clouddb1020.eqiad.wmnet
- 14:15 ayounsi@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on 24 hosts with reason: Switch BGP bounce
- 14:12 ayounsi@cumin1003: conftool action : set/pooled=no; selector: name=clouddb1020.eqiad.wmnet
- 14:09 mlitn@deploy2002: Finished scap sync-world: Backport for Update CSS selector for Mobile TOC button (T419587), Update CSS selector for Mobile TOC button (T419587), Remove queueing logic (T419587), Remove queueing logic (T419587) (duration: 07m 15s)
- 14:08 andrew@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd2008-dev.codfw.wmnet with OS trixie
- 14:07 akhatun@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-edit-type-enrich-next: apply
- 14:07 akhatun@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-edit-type-enrich-next: apply
- 14:05 mlitn@deploy2002: mlitn: Continuing with sync
- 14:04 mlitn@deploy2002: mlitn: Backport for Update CSS selector for Mobile TOC button (T419587), Update CSS selector for Mobile TOC button (T419587), Remove queueing logic (T419587), Remove queueing logic (T419587) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 14:03 XioNoX: start eqiad rack D2 depools
- 14:02 mlitn@deploy2002: Started scap sync-world: Backport for Update CSS selector for Mobile TOC button (T419587), Update CSS selector for Mobile TOC button (T419587), Remove queueing logic (T419587), Remove queueing logic (T419587)
- 13:59 bking@deploy2002: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'.
- 13:59 bking@deploy2002: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'.
- 13:57 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 13:54 moritzm: installing libssh security updates
- 13:54 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
- 13:45 phuedx@deploy2002: Finished scap sync-world: Backport for ext.testKitchen: Depend on mediawiki.user module, Add title to the request context in FlaggedRevsCacheTest (T419539), ext.testKitchen: Depend on mediawiki.user module (duration: 08m 01s)
- 13:42 phuedx@deploy2002: phuedx: Continuing with sync
- 13:39 phuedx@deploy2002: phuedx: Backport for ext.testKitchen: Depend on mediawiki.user module, Add title to the request context in FlaggedRevsCacheTest (T419539), ext.testKitchen: Depend on mediawiki.user module synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 13:37 phuedx@deploy2002: Started scap sync-world: Backport for ext.testKitchen: Depend on mediawiki.user module, Add title to the request context in FlaggedRevsCacheTest (T419539), ext.testKitchen: Depend on mediawiki.user module
- 13:26 esanders@deploy2002: Finished scap sync-world: Backport for Deploy EditCheck suggestion mode at all Wikipedias (T415320) (duration: 06m 42s)
- 13:24 jclark@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 13:22 esanders@deploy2002: esanders: Continuing with sync
- 13:22 esanders@deploy2002: esanders: Backport for Deploy EditCheck suggestion mode at all Wikipedias (T415320) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 13:22 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1178.eqiad.wmnet
- 13:21 jclark@cumin1003: START - Cookbook sre.dns.netbox
- 13:20 esanders@deploy2002: Started scap sync-world: Backport for Deploy EditCheck suggestion mode at all Wikipedias (T415320)
- 13:18 kgraessle@deploy2002: Finished scap sync-world: Backport for Add multilingual revert risk host header for LiftWing requests (T419718) (duration: 10m 52s)
- 13:14 fnegri@cumin1003: END (FAIL) - Cookbook sre.wikireplicas.add-wiki (exit_code=99) for database kaiwiki (T414240)
- 13:14 fnegri@cumin1003: START - Cookbook sre.wikireplicas.add-wiki for database kaiwiki (T414240)
- 13:14 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host an-worker1178.eqiad.wmnet
- 13:14 kgraessle@deploy2002: kgraessle: Continuing with sync
- 13:12 daniel@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
- 13:11 kgraessle@deploy2002: kgraessle: Backport for Add multilingual revert risk host header for LiftWing requests (T419718) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 13:10 daniel@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
- 13:07 kgraessle@deploy2002: Started scap sync-world: Backport for Add multilingual revert risk host header for LiftWing requests (T419718)
- 13:05 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1159.eqiad.wmnet
- 13:03 daniel@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
- 13:02 daniel@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
- 13:02 daniel@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
- 13:02 daniel@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
- 12:57 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host an-worker1159.eqiad.wmnet
- 12:55 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dse-k8s-worker1013.eqiad.wmnet
- 12:49 dpogorzelski@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop: sync
- 12:49 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host dse-k8s-worker1013.eqiad.wmnet
- 12:49 dpogorzelski@deploy2002: helmfile [staging] START helmfile.d/services/changeprop: sync
- 12:33 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dse-k8s-worker1013.eqiad.wmnet with OS bookworm
- 12:33 btullis@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - btullis@cumin1003"
- 12:31 btullis@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - btullis@cumin1003"
- 12:30 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host durum4004.ulsfo.wmnet
- 12:29 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host tcp-proxy4004.ulsfo.wmnet
- 12:28 moritzm: installing postgresql-17 security updates
- 12:25 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host tcp-proxy4004.ulsfo.wmnet
- 12:14 moritzm: installing wireshark security updates
- 12:14 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dse-k8s-worker1013.eqiad.wmnet with reason: host reimage
- 12:07 btullis@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on dse-k8s-worker1013.eqiad.wmnet with reason: host reimage
- 11:52 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1013.eqiad.wmnet with OS bookworm
- 11:51 btullis@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dse-k8s-worker1013.eqiad.wmnet with OS bookworm
- 11:51 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1013.eqiad.wmnet with OS bookworm
- 11:50 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-worker1013.eqiad.wmnet with OS bookworm
- 11:50 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host tcp-proxy4004.ulsfo.wmnet
- 11:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host tcp-proxy4004.ulsfo.wmnet with OS trixie
- 11:49 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1013.eqiad.wmnet with OS bookworm
- 11:34 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on tcp-proxy4004.ulsfo.wmnet with reason: host reimage
- 11:28 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on tcp-proxy4004.ulsfo.wmnet with reason: host reimage
- 11:19 jayme: disabled puppet on all wikikube worker nodes to rollout/test new apparmor profiles in staging - T419781
- 11:07 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host tcp-proxy4004.ulsfo.wmnet with OS trixie
- 11:06 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM tcp-proxy4004.ulsfo.wmnet - jmm@cumin2002"
- 11:06 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM tcp-proxy4004.ulsfo.wmnet - jmm@cumin2002"
- 11:06 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) tcp-proxy4004.ulsfo.wmnet on all recursors
- 11:06 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache tcp-proxy4004.ulsfo.wmnet on all recursors
- 11:06 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 11:06 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM tcp-proxy4004.ulsfo.wmnet - jmm@cumin2002"
- 11:03 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 11:00 vriley@cumin1003: START - Cookbook sre.dns.netbox
- 10:56 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM tcp-proxy4004.ulsfo.wmnet - jmm@cumin2002"
- 10:42 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device asw1-23-ulsfo
- 10:41 ayounsi@cumin1003: START - Cookbook sre.network.tls for network device asw1-23-ulsfo
- 10:35 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dse-k8s-worker1013.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 10:33 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host tcp-proxy4003.ulsfo.wmnet
- 10:31 vgutierrez@puppetserver1001: conftool action : set/pooled=no; selector: name=ncredir4001.ulsfo.wmnet
- 10:31 vgutierrez@puppetserver1001: conftool action : set/pooled=no; selector: name=ncredir4002.ulsfo.wmnet
- 10:31 vgutierrez@puppetserver1001: conftool action : set/pooled=yes; selector: name=ncredir4004.ulsfo.wmnet
- 10:30 vgutierrez@puppetserver1001: conftool action : set/pooled=yes; selector: name=ncredir4003.ulsfo.wmnet
- 10:30 vgutierrez: repooling ncredir4003 & ncredir4004
- 10:29 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host tcp-proxy4003.ulsfo.wmnet
- 10:27 jmm@cumin2002: START - Cookbook sre.dns.netbox
- 10:27 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host tcp-proxy4004.ulsfo.wmnet
- 10:26 btullis@cumin1003: START - Cookbook sre.hosts.provision for host dse-k8s-worker1013.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 10:26 daniel@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
- 10:25 daniel@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
- 10:23 btullis@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dse-k8s-worker1013
- 10:22 btullis@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host dse-k8s-worker1013
- 10:20 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host tcp-proxy4003.ulsfo.wmnet
- 10:20 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host tcp-proxy4003.ulsfo.wmnet with OS trixie
- 10:12 daniel@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
- 10:12 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dse-k8s-worker1011.eqiad.wmnet
- 10:12 daniel@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
- 10:11 daniel@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
- 10:11 daniel@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
- 10:10 daniel@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
- 10:09 daniel@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
- 10:06 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host dse-k8s-worker1011.eqiad.wmnet
- 10:06 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dse-k8s-worker1010.eqiad.wmnet
- 10:03 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on tcp-proxy4003.ulsfo.wmnet with reason: host reimage
- 10:00 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host dse-k8s-worker1010.eqiad.wmnet
- 09:58 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on tcp-proxy4003.ulsfo.wmnet with reason: host reimage
- 09:48 trueg@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/SERVICE_NAME: apply
- 09:48 trueg@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/services/SERVICE_NAME: apply
- 09:40 mvernon@cumin2002: conftool action : set/pooled=yes; selector: name=ms-fe2024.codfw.wmnet
- 09:40 mvernon@cumin2002: conftool action : set/pooled=yes; selector: name=ms-fe2023.codfw.wmnet
- 09:40 mvernon@cumin2002: conftool action : set/pooled=yes; selector: name=ms-fe2022.codfw.wmnet
- 09:40 mvernon@cumin2002: conftool action : set/pooled=yes; selector: name=ms-fe2021.codfw.wmnet
- 09:39 mvernon@cumin2002: conftool action : set/weight=40; selector: name=ms-fe2024.codfw.wmnet
- 09:39 mvernon@cumin2002: conftool action : set/weight=40; selector: name=ms-fe2023.codfw.wmnet
- 09:39 mvernon@cumin2002: conftool action : set/weight=40; selector: name=ms-fe2022.codfw.wmnet
- 09:39 mvernon@cumin2002: conftool action : set/weight=40; selector: name=ms-fe2021.codfw.wmnet
- 09:39 btullis@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Post reimage - btullis@cumin1003"
- 09:39 btullis@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Post reimage - btullis@cumin1003"
- 09:39 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dse-k8s-worker1011.eqiad.wmnet with OS bookworm
- 09:39 btullis@cumin1003: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - btullis@cumin1003"
- 09:38 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dse-k8s-worker1010.eqiad.wmnet with OS bookworm
- 09:38 btullis@cumin1003: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - btullis@cumin1003"
- 09:35 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling restart_daemons on P{ms-fe[2009-2020].codfw.wmnet} and (A:swift-fe or A:swift-fe-canary or A:swift-fe-codfw or A:swift-fe-eqiad)
- 09:32 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host durum4004.ulsfo.wmnet
- 09:32 aokoth@cumin1003: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1004.wikimedia.org with reason: Security Release - T419712
- 09:32 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host tcp-proxy4003.ulsfo.wmnet with OS trixie
- 09:30 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM tcp-proxy4003.ulsfo.wmnet - jmm@cumin2002"
- 09:30 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM tcp-proxy4003.ulsfo.wmnet - jmm@cumin2002"
- 09:30 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) tcp-proxy4003.ulsfo.wmnet on all recursors
- 09:30 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache tcp-proxy4003.ulsfo.wmnet on all recursors
- 09:30 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 09:30 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM tcp-proxy4003.ulsfo.wmnet - jmm@cumin2002"
- 09:30 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM tcp-proxy4003.ulsfo.wmnet - jmm@cumin2002"
- 09:28 mvernon@cumin2002: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on P{ms-fe[2009-2020].codfw.wmnet} and (A:swift-fe or A:swift-fe-canary or A:swift-fe-codfw or A:swift-fe-eqiad)
- 09:28 Emperor: roll-restart codfw ms frontends prior to pooling new ones T416243
- 09:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host durum4003.ulsfo.wmnet
- 09:23 jmm@cumin2002: START - Cookbook sre.dns.netbox
- 09:23 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host tcp-proxy4003.ulsfo.wmnet
- 09:22 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host durum4003.ulsfo.wmnet
- 09:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts netflow4002.ulsfo.wmnet
- 09:01 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 09:01 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: netflow4002.ulsfo.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
- 08:56 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: netflow4002.ulsfo.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
- 08:51 slyngshede@dns1004: END - running authdns-update
- 08:50 slyngshede@dns1004: START - running authdns-update
- 08:37 jmm@cumin2002: START - Cookbook sre.dns.netbox
- 08:32 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts netflow4002.ulsfo.wmnet
- 08:25 aokoth@cumin1003: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: Security Release - T419712
- 08:23 arnaudb@dns1004: END - running authdns-update
- 08:21 arnaudb@dns1004: START - running authdns-update
- 07:57 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ncredir4004.ulsfo.wmnet
- 07:53 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ncredir4004.ulsfo.wmnet
- 07:51 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ncredir4003.ulsfo.wmnet
- 07:45 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ncredir4003.ulsfo.wmnet
- 05:24 kart_: staging: machinetranslation: Optimize model loading and memory footprints (T411058)
- 05:19 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
- 05:16 kartik@deploy2002: helmfile [staging] START helmfile.d/services/machinetranslation: apply
- 02:16 jasmine@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-ctrl2005.codfw.wmnet with OS trixie
- 02:09 mwpresync@deploy2002: Finished scap build-images: Publishing wmf/next image (duration: 08m 14s)
- 02:03 swfrench-wmf: reprepro include php-igbinary_3.2.16-4+icu72+wmf11u1 into component/php83-icu72 - T419058
- 02:00 mwpresync@deploy2002: Started scap build-images: Publishing wmf/next image
- 01:59 jasmine@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-ctrl2005.codfw.wmnet with reason: host reimage
- 01:52 jasmine@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-ctrl2005.codfw.wmnet with reason: host reimage
- 01:49 swfrench-wmf: reprepro include php-msgpack_3.0.0-1+icu72+wmf11u1 into component/php83-icu72 - T419058
- 01:47 swfrench-wmf: reprepro include php-apcu_5.1.24-1+icu72+wmf11u1 into component/php83-icu72 - T419058
- 01:37 jasmine@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl2005.codfw.wmnet with OS trixie
- 01:36 jasmine@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-ctrl2004.codfw.wmnet with OS trixie
- 01:24 brett@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp7012.*
- 01:20 rzl@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
- 01:18 jasmine@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-ctrl2004.codfw.wmnet with reason: host reimage
- 01:18 rzl@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
- 01:15 jasmine@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-ctrl2004.codfw.wmnet with reason: host reimage
- 01:13 swfrench-wmf: reprepro include dh-php_5.5+icu72+wmf11u1 into component/php83-icu72 - T419058
- 01:08 swfrench-wmf: reprepro include php-defaults_94+icu72+wmf11u1 into component/php83-icu72 - T419058
- 01:05 rzl@deploy2002: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
- 01:05 rzl@deploy2002: helmfile [staging] START helmfile.d/services/api-gateway: apply
- 01:03 swfrench-wmf: reprepro include php8.3_8.3.30-1+icu72+wmf11u1 into component/php83-icu72 - T419058
- 01:00 jasmine@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl2004.codfw.wmnet with OS trixie
- 00:59 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp7012.magru.wmnet with OS trixie
- 00:59 brett@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - brett@cumin2002"
- 00:58 brett@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - brett@cumin2002"
- 00:38 jforrester@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: sync
- 00:38 jforrester@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: sync
- 00:37 jforrester@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: sync
- 00:37 jforrester@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: sync
- 00:36 jforrester@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: sync
- 00:36 jforrester@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: sync
- 00:33 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: sync
- 00:29 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp7012.magru.wmnet with reason: host reimage
- 00:27 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: sync
- 00:24 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp7012.magru.wmnet with reason: host reimage
- 00:03 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp7012.magru.wmnet with OS trixie
2026-03-11
- 23:56 brett@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp7009.*
- 22:52 cjming@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/test-kitchen: apply
- 22:52 cjming@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/test-kitchen: apply
- 22:45 rzl@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
- 22:44 rzl@deploy2002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
- 22:29 rzl@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
- 22:29 rzl@deploy2002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
- 22:27 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp7009.magru.wmnet with OS trixie
- 21:56 jforrester@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: sync
- 21:55 jforrester@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: sync
- 21:54 jforrester@deploy2002: Finished scap sync-world: Backport for OrchestratorRequest: Switch evaluations to v2 endpoint (T413727) (duration: 18m 19s)
- 21:47 jforrester@deploy2002: jforrester: Continuing with sync
- 21:43 cjming@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/test-kitchen-next: apply
- 21:43 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp7009.magru.wmnet with reason: host reimage
- 21:42 cjming@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/test-kitchen-next: apply
- 21:40 jforrester@deploy2002: jforrester: Backport for OrchestratorRequest: Switch evaluations to v2 endpoint (T413727) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 21:39 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp7009.magru.wmnet with reason: host reimage
- 21:35 jforrester@deploy2002: Started scap sync-world: Backport for OrchestratorRequest: Switch evaluations to v2 endpoint (T413727)
- 21:30 rzl: rzl@apt1002:~$ sudo -i reprepro -C component/envoy-future include bullseye-wikimedia /home/rzl/envoyproxy_1.35.9-1_amd64.changes
- 21:29 arlolra@deploy2002: Finished scap sync-world: Backport for Show category index when no category selected on Special:LintTemplateErrors (T417363), Show category index when no category selected on Special:LintTemplateErrors (T417363) (duration: 35m 16s)
- 21:16 arlolra@deploy2002: arlolra: Continuing with sync
- 21:15 arlolra@deploy2002: arlolra: Backport for Show category index when no category selected on Special:LintTemplateErrors (T417363), Show category index when no category selected on Special:LintTemplateErrors (T417363) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 21:08 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp7009.magru.wmnet with OS trixie
- 21:08 brett@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp7010.*
- 21:04 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp7010.magru.wmnet with OS trixie
- 20:54 arlolra@deploy2002: Started scap sync-world: Backport for Show category index when no category selected on Special:LintTemplateErrors (T417363), Show category index when no category selected on Special:LintTemplateErrors (T417363)
- 20:47 jsn@deploy2002: Finished scap sync-world: Backport for urwikisource: add logo, sitename and projectnamespace (T415974) (duration: 06m 55s)
- 20:43 jsn@deploy2002: anzx, jsn: Continuing with sync
- 20:42 jsn@deploy2002: anzx, jsn: Backport for urwikisource: add logo, sitename and projectnamespace (T415974) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 20:40 jsn@deploy2002: Started scap sync-world: Backport for urwikisource: add logo, sitename and projectnamespace (T415974)
- 20:38 jsn@deploy2002: Finished scap sync-world: Backport for riskyArticleEdits: show page descriptions (T419442), Fix Instrumentation on mobile view (T419517), ext.wikimediaEvents: Updated Test Kitchen impact test experiment (T407570) (duration: 10m 37s)
- 20:38 jhathaway@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on ml-serve1014.eqiad.wmnet with reason: T400626
- 20:37 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp7010.magru.wmnet with reason: host reimage
- 20:34 jsn@deploy2002: jsn, sfaci: Continuing with sync
- 20:34 ebernhardson@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search-test: apply
- 20:33 ebernhardson@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-semantic-search-test: apply
- 20:32 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp7010.magru.wmnet with reason: host reimage
- 20:30 jsn@deploy2002: jsn, sfaci: Backport for riskyArticleEdits: show page descriptions (T419442), Fix Instrumentation on mobile view (T419517), ext.wikimediaEvents: Updated Test Kitchen impact test experiment (T407570) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 20:28 aokoth@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on gitlab1003.wikimedia.org with reason: Upgrade
- 20:28 aokoth@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on gitlab2002.wikimedia.org with reason: Upgrade
- 20:27 jsn@deploy2002: Started scap sync-world: Backport for riskyArticleEdits: show page descriptions (T419442), Fix Instrumentation on mobile view (T419517), ext.wikimediaEvents: Updated Test Kitchen impact test experiment (T407570)
- 20:21 andrew@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 20:18 andrew@cumin2002: START - Cookbook sre.dns.netbox
- 20:17 bvibber@deploy2002: Finished scap sync-world: Backport for Revert "Fix for temp section open during slow loads on Parsoid" (T416063 T419170 T419721), Revert "Fix for temp section open during slow loads on Parsoid" (T416063 T419170 T419721) (duration: 06m 47s)
- 20:13 bvibber@deploy2002: bvibber: Continuing with sync
- 20:12 bvibber@deploy2002: bvibber: Backport for Revert "Fix for temp section open during slow loads on Parsoid" (T416063 T419170 T419721), Revert "Fix for temp section open during slow loads on Parsoid" (T416063 T419170 T419721) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 20:10 bvibber@deploy2002: Started scap sync-world: Backport for Revert "Fix for temp section open during slow loads on Parsoid" (T416063 T419170 T419721), Revert "Fix for temp section open during slow loads on Parsoid" (T416063 T419170 T419721)
- 19:59 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp7010.magru.wmnet with OS trixie
- 19:54 andrew@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
- 19:51 andrew@cumin2002: START - Cookbook sre.dns.netbox
- 19:37 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-backup1004.eqiad.wmnet with OS trixie
- 19:01 robh@cumin2002: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts cp7011.magru.wmnet
- 19:01 robh@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cp7011.magru.wmnet
- 18:56 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.roll-restart-reboot-wikimedia-dns (exit_code=0) rolling restart_daemons on A:wikidough
- 18:49 brennen@deploy2002: rebuilt and synchronized wikiversions files: group1 to 1.46.0-wmf.19 refs T413810
- 18:49 bking@deploy2002: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-ipoid-test: apply
- 18:49 bking@deploy2002: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-ipoid-test: apply
- 18:45 bking@deploy2002: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-ipoid-test: apply
- 18:45 bking@deploy2002: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-ipoid-test: apply
- 18:44 bking@deploy2002: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-ipoid-test: apply
- 18:44 bking@deploy2002: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-ipoid-test: apply
- 18:43 sukhe@cumin1003: START - Cookbook sre.dns.roll-restart-reboot-wikimedia-dns rolling restart_daemons on A:wikidough
- 18:42 brennen: 1.46.0-wmf.19 train status: no current blockers, going ahead to group1.
- 18:39 swfrench@cumin2002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker2332.codfw.wmnet
- 18:37 swfrench@cumin2002: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker2332.codfw.wmnet
- 18:20 brett@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp7011.*
- 18:18 btullis@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - btullis@cumin1003"
- 18:16 jclark@cumin1003: START - Cookbook sre.hosts.reimage for host ms-backup1004.eqiad.wmnet with OS trixie
- 18:13 btullis@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - btullis@cumin1003"
- 17:59 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dse-k8s-worker1011.eqiad.wmnet with reason: host reimage
- 17:55 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dse-k8s-worker1010.eqiad.wmnet with reason: host reimage
- 17:52 javiermonton@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
- 17:52 javiermonton@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
- 17:48 btullis@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on dse-k8s-worker1011.eqiad.wmnet with reason: host reimage
- 17:47 btullis@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on dse-k8s-worker1010.eqiad.wmnet with reason: host reimage
- 17:46 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1011.eqiad.wmnet with OS bookworm
- 17:45 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1010.eqiad.wmnet with OS bookworm
- 17:42 javiermonton@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
- 17:42 javiermonton@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
- 17:38 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
- 17:38 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
- 17:38 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
- 17:37 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
- 17:37 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
- 17:36 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-web: apply
- 17:36 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
- 17:36 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
- 17:35 btullis@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dse-k8s-worker1010.eqiad.wmnet with OS bookworm
- 17:34 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-worker1011.eqiad.wmnet with OS bookworm
- 17:32 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: sync
- 17:31 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-int: sync
- 17:29 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
- 17:28 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
- 17:28 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
- 17:27 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-web: apply
- 17:20 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
- 17:19 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
- 17:19 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dse-k8s-worker1011.eqiad.wmnet with reason: host reimage
- 17:19 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
- 17:18 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
- 17:15 btullis@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on dse-k8s-worker1011.eqiad.wmnet with reason: host reimage
- 17:13 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
- 17:12 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
- 17:09 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
- 17:09 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
- 17:02 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp7011.magru.wmnet with OS trixie
- 17:01 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1011.eqiad.wmnet with OS bookworm
- 17:00 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-worker1011.eqiad.wmnet with OS bookworm
- 16:58 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on durum4004.ulsfo.wmnet with reason: in setup
- 16:58 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on durum4003.ulsfo.wmnet with reason: in setup
- 16:44 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dse-k8s-worker1011.eqiad.wmnet with reason: host reimage
- 16:40 root@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 16:40 root@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: moving many things from cloudgw2002-dev to cloudgw2004-dev - root@cumin2002"
- 16:40 root@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: moving many things from cloudgw2002-dev to cloudgw2004-dev - root@cumin2002"
- 16:39 btullis@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on dse-k8s-worker1011.eqiad.wmnet with reason: host reimage
- 16:37 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1010.eqiad.wmnet with OS bookworm
- 16:36 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dse-k8s-worker1010.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 16:35 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp7011.magru.wmnet with reason: host reimage
- 16:35 root@cumin2002: START - Cookbook sre.dns.netbox
- 16:32 tappof@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts prometheus4002.ulsfo.wmnet
- 16:32 tappof@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 16:32 tappof@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: prometheus4002.ulsfo.wmnet decommissioned, removing all IPs except the asset tag one - tappof@cumin1003"
- 16:30 tappof@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: prometheus4002.ulsfo.wmnet decommissioned, removing all IPs except the asset tag one - tappof@cumin1003"
- 16:30 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp7011.magru.wmnet with reason: host reimage
- 16:25 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1011.eqiad.wmnet with OS bookworm
- 16:23 tappof@cumin1003: START - Cookbook sre.dns.netbox
- 16:18 tappof@cumin1003: START - Cookbook sre.hosts.decommission for hosts prometheus4002.ulsfo.wmnet
- 15:57 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dse-k8s-worker1011.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 15:52 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp7011.magru.wmnet with OS trixie
- 15:51 aokoth@cumin1003: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab2002.wikimedia.org with reason: Security Release - T419712
- 15:51 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-main: apply
- 15:50 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-main: apply
- 15:50 btullis@cumin1003: START - Cookbook sre.hosts.provision for host dse-k8s-worker1011.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 15:49 urbanecm@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-experimental: apply
- 15:48 sukhe: sudo cumin -b1 -s10 "C:dnsrecursor" "run-puppet-agent --enable 'merging CR 1250576'"
- 15:48 urbanecm@deploy2002: helmfile [codfw] START helmfile.d/services/mw-experimental: apply
- 15:46 btullis@cumin1003: START - Cookbook sre.hosts.provision for host dse-k8s-worker1010.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 15:45 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-worker1010.eqiad.wmnet with OS bookworm
- 15:43 aokoth@cumin1003: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: Security Release - T419712
- 15:39 sukhe: sudo cumin "C:dnsrecursor" "disable-puppet 'merging CR 1250576'"
- 15:35 aokoth@cumin1003: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1003.wikimedia.org with reason: Security Release - T419712
- 15:26 aokoth@cumin1003: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: Security Release - T419712
- 15:08 brouberol@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/aux-k8s-services/kafka-mirrormaker: apply
- 15:08 brouberol@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/aux-k8s-services/kafka-mirrormaker: apply
- 15:00 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dse-k8s-worker1011.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 14:53 swfrench-wmf: updated component/php83-icu72 with libpcre2 10.42-1~wmf11+1 from apt-staging - T419058
- 14:46 btullis@cumin1003: START - Cookbook sre.hosts.provision for host dse-k8s-worker1011.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 14:45 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1010.eqiad.wmnet with OS bookworm
- 14:45 btullis@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dse-k8s-worker1010.eqiad.wmnet with OS bookworm
- 14:41 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host durum4004.ulsfo.wmnet
- 14:41 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host durum4004.ulsfo.wmnet with OS trixie
- 14:39 vgutierrez: depool ncredir4003 && ncredir4004
- 14:38 vgutierrez: repool ncredir4001 && ncredir4002
- 14:31 jmm@puppetserver1001: conftool action : set/pooled=no; selector: name=ncredir4002.ulsfo.wmnet
- 14:31 jmm@puppetserver1001: conftool action : set/pooled=no; selector: name=ncredir4001.ulsfo.wmnet
- 14:30 jmm@puppetserver1001: conftool action : set/pooled=yes; selector: name=ncredir4004.ulsfo.wmnet
- 14:30 jmm@puppetserver1001: conftool action : set/weight=1; selector: name=ncredir4004.ulsfo.wmnet
- 14:27 jmm@puppetserver1001: conftool action : set/pooled=yes; selector: name=ncredir4003.ulsfo.wmnet
- 14:27 jmm@puppetserver1001: conftool action : set/weight=1; selector: name=ncredir4003.ulsfo.wmnet
- 14:24 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on durum4004.ulsfo.wmnet with reason: host reimage
- 14:23 bking@deploy2002: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-ipoid-test: apply
- 14:23 bking@deploy2002: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-ipoid-test: apply
- 14:19 moritzm: installing python-urllib3 security updates
- 14:18 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on durum4004.ulsfo.wmnet with reason: host reimage
- 14:14 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1010.eqiad.wmnet with OS bookworm
- 14:13 jforrester@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
- 14:12 bking@deploy2002: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-ipoid-test: apply
- 14:12 jforrester@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
- 14:12 bking@deploy2002: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-ipoid-test: apply
- 14:12 jforrester@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
- 14:11 jforrester@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
- 14:11 bking@deploy2002: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-ipoid-test: apply
- 14:11 bking@deploy2002: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-ipoid-test: apply
- 14:11 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
- 14:10 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
- 14:08 gkyziridis@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'edit-check' for release 'main' .
- 14:08 gkyziridis@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'edit-check' for release 'main' .
- 14:07 jdlrobson@deploy2002: Finished scap sync-world: Backport for Fix pinnableElement export (T419620) (duration: 06m 26s)
- 14:06 jforrester@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
- 14:05 jforrester@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
- 14:04 jforrester@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
- 14:04 jforrester@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
- 14:03 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
- 14:03 jdlrobson@deploy2002: jdlrobson: Continuing with sync
- 14:03 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
- 14:02 jdlrobson@deploy2002: jdlrobson: Backport for Fix pinnableElement export (T419620) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 14:00 jdlrobson@deploy2002: Started scap sync-world: Backport for Fix pinnableElement export (T419620)
- 13:58 moritzm: uploaded libxml2 2.9.10+dfsg-6.7+deb11u9+wmf11u1 to component/php83-icu72 for bullseye-wikimedia (special build of libxml with ICU disabled to ensure co-installability between icu 67 and icu 72) T419058
- 13:57 jdlrobson@deploy2002: Finished scap sync-world: Backport for Restore advanced main menu for AMC (T413912) (duration: 10m 44s)
- 13:55 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host durum4004.ulsfo.wmnet with OS trixie
- 13:54 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dse-k8s-worker1010.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 13:54 vgutierrez: repool cp7016
- 13:54 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM durum4004.ulsfo.wmnet - jmm@cumin2002"
- 13:54 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM durum4004.ulsfo.wmnet - jmm@cumin2002"
- 13:54 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) durum4004.ulsfo.wmnet on all recursors
- 13:54 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache durum4004.ulsfo.wmnet on all recursors
- 13:54 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 13:54 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM durum4004.ulsfo.wmnet - jmm@cumin2002"
- 13:51 jdlrobson@deploy2002: jdlrobson: Continuing with sync
- 13:50 jdlrobson@deploy2002: jdlrobson: Backport for Restore advanced main menu for AMC (T413912) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 13:49 vgutierrez: depool cp7016
- 13:49 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM durum4004.ulsfo.wmnet - jmm@cumin2002"
- 13:46 jdlrobson@deploy2002: Started scap sync-world: Backport for Restore advanced main menu for AMC (T413912)
- 13:45 btullis@cumin1003: START - Cookbook sre.hosts.provision for host dse-k8s-worker1010.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 13:44 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dse-k8s-worker1010.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 13:44 jdlrobson@deploy2002: Finished scap sync-world: Backport for Remove `MetricsPlatform` configuration from production (T416865) (duration: 35m 52s)
- 13:43 btullis@cumin1003: START - Cookbook sre.hosts.provision for host dse-k8s-worker1010.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 13:42 btullis@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dse-k8s-worker1010.eqiad.wmnet with OS bookworm
- 13:42 btullis@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dse-k8s-worker1011.eqiad.wmnet with OS bookworm
- 13:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ncredir4004.ulsfo.wmnet with OS bookworm
- 13:36 jmm@cumin2002: START - Cookbook sre.dns.netbox
- 13:36 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host durum4004.ulsfo.wmnet
- 13:36 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host durum4003.ulsfo.wmnet
- 13:36 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host durum4003.ulsfo.wmnet with OS trixie
- 13:36 bking@deploy2002: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-ipoid-test: apply
- 13:35 bking@deploy2002: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-ipoid-test: apply
- 13:30 jdlrobson@deploy2002: jdlrobson, sfaci: Continuing with sync
- 13:29 jdlrobson@deploy2002: jdlrobson, sfaci: Backport for Remove `MetricsPlatform` configuration from production (T416865) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 13:22 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ncredir4004.ulsfo.wmnet with reason: host reimage
- 13:18 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on durum4003.ulsfo.wmnet with reason: host reimage
- 13:18 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ncredir4004.ulsfo.wmnet with reason: host reimage
- 13:13 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on durum4003.ulsfo.wmnet with reason: host reimage
- 13:08 jdlrobson@deploy2002: Started scap sync-world: Backport for Remove `MetricsPlatform` configuration from production (T416865)
- 13:00 moritzm: installing libcommons-lang3-java security updates
- 12:57 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ncredir4004.ulsfo.wmnet with OS bookworm
- 12:51 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ncredir4003.ulsfo.wmnet with OS bookworm
- 12:46 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host durum4003.ulsfo.wmnet with OS trixie
- 12:46 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM durum4003.ulsfo.wmnet - jmm@cumin2002"
- 12:46 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM durum4003.ulsfo.wmnet - jmm@cumin2002"
- 12:46 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) durum4003.ulsfo.wmnet on all recursors
- 12:45 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache durum4003.ulsfo.wmnet on all recursors
- 12:45 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 12:45 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM durum4003.ulsfo.wmnet - jmm@cumin2002"
- 12:41 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM durum4003.ulsfo.wmnet - jmm@cumin2002"
- 12:37 moritzm: installing inetutils security updates
- 12:36 jmm@cumin2002: START - Cookbook sre.dns.netbox
- 12:36 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host durum4003.ulsfo.wmnet
- 12:35 tappof: completed migration from prometheus4002 to prometheus4003 (ulsfo) (T419430)
- 12:34 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ncredir4003.ulsfo.wmnet with reason: host reimage
- 12:28 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ncredir4003.ulsfo.wmnet with reason: host reimage
- 12:28 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1010.eqiad.wmnet with OS bookworm
- 12:24 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2073.codfw.wmnet with OS bullseye
- 12:23 btullis@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dse-k8s-worker1010.eqiad.wmnet with OS bookworm
- 12:18 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow4003.ulsfo.wmnet
- 12:18 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1011.eqiad.wmnet with OS bookworm
- 12:17 btullis@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dse-k8s-worker1011
- 12:17 btullis@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host dse-k8s-worker1011
- 12:14 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2072.codfw.wmnet with OS bullseye
- 12:11 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow4003.ulsfo.wmnet
- 12:05 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2073.codfw.wmnet with reason: host reimage
- 12:04 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ncredir4003.ulsfo.wmnet with OS bookworm
- 12:01 btullis@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dse-k8s-worker1010
- 11:59 btullis@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host dse-k8s-worker1010
- 11:58 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2073.codfw.wmnet with reason: host reimage
- 11:54 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2072.codfw.wmnet with reason: host reimage
- 11:48 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2072.codfw.wmnet with reason: host reimage
- 11:41 urbanecm@deploy2002: Finished scap sync-world: Backport for [Growth] Enable on every new Wikipedia by default (T304052) (duration: 06m 39s)
- 11:38 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host ms-be2073
- 11:38 mvernon@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ms-be2073
- 11:37 vgutierrez: upgrading to acme-chief 0.39 on acme-chief production instances - T419352
- 11:37 urbanecm@deploy2002: urbanecm: Continuing with sync
- 11:36 urbanecm@deploy2002: urbanecm: Backport for [Growth] Enable on every new Wikipedia by default (T304052) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 11:36 mvernon@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ms-be2073
- 11:36 mvernon@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ms-be2073.codfw.wmnet 212.48.192.10.in-addr.arpa 2.1.2.0.8.4.0.0.2.9.1.0.0.1.0.0.4.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
- 11:36 mvernon@cumin2002: START - Cookbook sre.dns.wipe-cache ms-be2073.codfw.wmnet 212.48.192.10.in-addr.arpa 2.1.2.0.8.4.0.0.2.9.1.0.0.1.0.0.4.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
- 11:36 mvernon@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 11:36 mvernon@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host ms-be2073 - mvernon@cumin2002"
- 11:36 mvernon@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host ms-be2073 - mvernon@cumin2002"
- 11:35 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/api-gateway: apply
- 11:34 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/api-gateway: apply
- 11:34 urbanecm@deploy2002: Started scap sync-world: Backport for [Growth] Enable on every new Wikipedia by default (T304052)
- 11:34 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: apply
- 11:34 urbanecm@deploy2002: Finished scap sync-world: Backport for [Growth] kaiwiki: Enable GrowthExperiments (T304052) (duration: 14m 11s)
- 11:33 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/api-gateway: apply
- 11:33 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
- 11:32 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/api-gateway: apply
- 11:32 mvernon@cumin2002: START - Cookbook sre.dns.netbox
- 11:31 mvernon@cumin2002: START - Cookbook sre.hosts.move-vlan for host ms-be2073
- 11:30 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2073.codfw.wmnet with OS bullseye
- 11:30 urbanecm@deploy2002: urbanecm: Continuing with sync
- 11:29 cgoubert@dns1004: END - running authdns-update
- 11:29 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host ms-be2072
- 11:29 mvernon@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ms-be2072
- 11:28 mvernon@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ms-be2072
- 11:28 mvernon@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ms-be2072.codfw.wmnet 158.32.192.10.in-addr.arpa 8.5.1.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
- 11:28 mvernon@cumin2002: START - Cookbook sre.dns.wipe-cache ms-be2072.codfw.wmnet 158.32.192.10.in-addr.arpa 8.5.1.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
- 11:28 mvernon@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 11:28 mvernon@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host ms-be2072 - mvernon@cumin2002"
- 11:28 mvernon@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host ms-be2072 - mvernon@cumin2002"
- 11:28 cgoubert@dns1004: START - running authdns-update
- 11:26 urbanecm@deploy2002: mwscript-k8s job started: WikimediaMaintenance:createExtensionTables.php --wiki=kaiwiki growthexperiments # T304052
- 11:24 mvernon@cumin2002: START - Cookbook sre.dns.netbox
- 11:24 mvernon@cumin2002: START - Cookbook sre.hosts.move-vlan for host ms-be2072
- 11:23 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2072.codfw.wmnet with OS bullseye
- 11:22 tappof@dns1004: END - running authdns-update
- 11:22 mvolz@deploy2002: helmfile [staging] DONE helmfile.d/services/zotero: apply
- 11:21 urbanecm@deploy2002: urbanecm: Backport for [Growth] kaiwiki: Enable GrowthExperiments (T304052) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 11:21 mvolz@deploy2002: helmfile [staging] START helmfile.d/services/zotero: apply
- 11:21 mvolz@deploy2002: helmfile [codfw] DONE helmfile.d/services/zotero: apply
- 11:21 tappof@dns1004: START - running authdns-update
- 11:21 mvolz@deploy2002: helmfile [codfw] START helmfile.d/services/zotero: apply
- 11:19 urbanecm@deploy2002: Started scap sync-world: Backport for [Growth] kaiwiki: Enable GrowthExperiments (T304052)
- 11:19 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1010.eqiad.wmnet with OS bookworm
- 11:19 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2071.codfw.wmnet with OS bullseye
- 11:18 urbanecm@deploy2002: mwscript-k8s job started: WikimediaMaintenance:createExtensionTables.php --wiki=kaiwiki growthexperiments # T304052
- 11:10 mvolz@deploy2002: helmfile [codfw] DONE helmfile.d/services/zotero: apply
- 11:10 mvolz@deploy2002: helmfile [codfw] START helmfile.d/services/zotero: apply
- 11:08 mvolz@deploy2002: helmfile [staging] DONE helmfile.d/services/zotero: apply
- 11:08 mvolz@deploy2002: helmfile [staging] START helmfile.d/services/zotero: apply
- 11:05 mvolz@deploy2002: helmfile [staging] DONE helmfile.d/services/zotero: apply
- 11:05 mvolz@deploy2002: helmfile [staging] START helmfile.d/services/zotero: apply
- 10:58 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2071.codfw.wmnet with reason: host reimage
- 10:54 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2071.codfw.wmnet with reason: host reimage
- 10:35 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host ms-be2071
- 10:35 mvernon@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ms-be2071
- 10:34 mvernon@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ms-be2071
- 10:34 mvernon@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ms-be2071.codfw.wmnet 221.16.192.10.in-addr.arpa 1.2.2.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
- 10:34 mvernon@cumin2002: START - Cookbook sre.dns.wipe-cache ms-be2071.codfw.wmnet 221.16.192.10.in-addr.arpa 1.2.2.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
- 10:34 mvernon@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 10:34 mvernon@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host ms-be2071 - mvernon@cumin2002"
- 10:34 mvernon@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host ms-be2071 - mvernon@cumin2002"
- 10:26 mvernon@cumin2002: START - Cookbook sre.dns.netbox
- 10:24 mvernon@cumin2002: START - Cookbook sre.hosts.move-vlan for host ms-be2071
- 10:23 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2071.codfw.wmnet with OS bullseye
- 10:08 jhancock@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2095.codfw.wmnet with OS bullseye
- 10:03 elukey@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Failed step after ml-serve1015's reimage - elukey@cumin1003"
- 10:02 elukey@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Failed step after ml-serve1015's reimage - elukey@cumin1003"
- 10:01 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-serve1015.eqiad.wmnet with OS trixie
- 10:01 elukey@cumin1003: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1003"
- 09:59 jhancock@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2096.codfw.wmnet with OS bullseye
- 09:52 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2096.codfw.wmnet with OS bullseye
- 09:52 mvernon@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - mvernon@cumin2002"
- 09:51 mvernon@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - mvernon@cumin2002"
- 09:51 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2095.codfw.wmnet with OS bullseye
- 09:51 mvernon@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - mvernon@cumin2002"
- 09:46 mvernon@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - mvernon@cumin2002"
- 09:37 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-wmde: apply
- 09:37 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-wmde: apply
- 09:37 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-wikidata: apply
- 09:36 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-wikidata: apply
- 09:35 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-sre: apply
- 09:35 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-sre: apply
- 09:35 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-search: apply
- 09:34 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-search: apply
- 09:33 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-research: apply
- 09:32 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-research: apply
- 09:32 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-platform-eng: apply
- 09:32 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2096.codfw.wmnet with reason: host reimage
- 09:31 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-platform-eng: apply
- 09:30 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-ml: apply
- 09:30 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-ml: apply
- 09:29 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-main: apply
- 09:28 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-main: apply
- 09:28 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2096.codfw.wmnet with reason: host reimage
- 09:28 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
- 09:27 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2095.codfw.wmnet with reason: host reimage
- 09:27 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
- 09:26 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-product: apply
- 09:25 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-product: apply
- 09:24 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2095.codfw.wmnet with reason: host reimage
- 09:22 gkyziridis@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'edit-check' for release 'main' .
- 09:19 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host ncredir4004.ulsfo.wmnet
- 09:19 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ncredir4004.ulsfo.wmnet with OS bookworm
- 09:15 javiermonton@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply
- 09:15 javiermonton@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply
- 09:14 elukey@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1003"
- 09:10 javiermonton@deploy2002: Finished scap sync-world: Backport for stream: mediawiki.page_html_content_change (T419258) (duration: 08m 28s)
- 09:07 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2096.codfw.wmnet with OS bullseye
- 09:06 javiermonton@deploy2002: javiermonton: Continuing with sync
- 09:03 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2095.codfw.wmnet with OS bullseye
- 09:03 javiermonton@deploy2002: javiermonton: Backport for stream: mediawiki.page_html_content_change (T419258) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 09:03 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ncredir4004.ulsfo.wmnet with reason: host reimage
- 09:01 javiermonton@deploy2002: Started scap sync-world: Backport for stream: mediawiki.page_html_content_change (T419258)
- 08:59 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve1015.eqiad.wmnet with reason: host reimage
- 08:58 trueg@deploy2002: helmfile [staging] DONE helmfile.d/services/SERVICE_NAME: apply
- 08:58 trueg@deploy2002: helmfile [staging] START helmfile.d/services/SERVICE_NAME: apply
- 08:58 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ncredir4004.ulsfo.wmnet with reason: host reimage
- 08:55 jynus@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2239.codfw.wmnet with reason: mysql upgrade / restart
- 08:54 moritzm: installing imagemagick security updates
- 08:52 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve1015.eqiad.wmnet with reason: host reimage
- 08:41 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host ml-serve1015.eqiad.wmnet with OS trixie
- 08:40 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-serve1014.eqiad.wmnet with OS trixie
- 08:40 elukey@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1003"
- 08:39 elukey@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1003"
- 08:35 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ncredir4004.ulsfo.wmnet with OS bookworm
- 08:32 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM ncredir4004.ulsfo.wmnet - jmm@cumin2002"
- 08:32 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM ncredir4004.ulsfo.wmnet - jmm@cumin2002"
- 08:31 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ncredir4004.ulsfo.wmnet on all recursors
- 08:31 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache ncredir4004.ulsfo.wmnet on all recursors
- 08:31 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 08:31 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM ncredir4004.ulsfo.wmnet - jmm@cumin2002"
- 08:25 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM ncredir4004.ulsfo.wmnet - jmm@cumin2002"
- 08:23 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve1014.eqiad.wmnet with reason: host reimage
- 08:22 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 08:21 Msz2001: UTC morning backport window finished
- 08:21 jmm@cumin2002: START - Cookbook sre.dns.netbox
- 08:21 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 08:21 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host ncredir4004.ulsfo.wmnet
- 08:21 mszwarc@deploy2002: Finished scap sync-world: Backport for Drop underscore from titles in wgOATH2FARequiredGroupRemovalPages (duration: 10m 46s)
- 08:21 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host ncredir4003.ulsfo.wmnet
- 08:21 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ncredir4003.ulsfo.wmnet with OS bookworm
- 08:17 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve1014.eqiad.wmnet with reason: host reimage
- 08:15 mszwarc@deploy2002: mszwarc: Continuing with sync
- 08:14 mszwarc@deploy2002: mszwarc: Backport for Drop underscore from titles in wgOATH2FARequiredGroupRemovalPages synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 08:10 mszwarc@deploy2002: Started scap sync-world: Backport for Drop underscore from titles in wgOATH2FARequiredGroupRemovalPages
- 08:09 mszwarc@deploy2002: Finished scap sync-world: Backport for Display list of 2FA-req. groups on AccountSecurity for 2FA-less users (T419422), Send2FAWarningNotifications: Support reading users from file (T419111) (duration: 33m 07s)
- 08:05 moritzm: installing mariadb bugfix updates from Bookworm point release (tools and libraries as packaged in Debian, unrelated to the wmf-mariadb packages)
- 08:04 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host ml-serve1014.eqiad.wmnet with OS trixie
- 08:04 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ncredir4003.ulsfo.wmnet with reason: host reimage
- 07:58 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ncredir4003.ulsfo.wmnet with reason: host reimage
- 07:57 mszwarc@deploy2002: mszwarc: Continuing with sync
- 07:56 mszwarc@deploy2002: mszwarc: Backport for Display list of 2FA-req. groups on AccountSecurity for 2FA-less users (T419422), Send2FAWarningNotifications: Support reading users from file (T419111) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 07:44 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1049.eqiad.wmnet
- 07:44 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1049.eqiad.wmnet
- 07:43 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1033.eqiad.wmnet
- 07:42 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1033.eqiad.wmnet
- 07:38 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ncredir4003.ulsfo.wmnet with OS bookworm
- 07:36 mszwarc@deploy2002: Started scap sync-world: Backport for Display list of 2FA-req. groups on AccountSecurity for 2FA-less users (T419422), Send2FAWarningNotifications: Support reading users from file (T419111)
- 07:34 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM ncredir4003.ulsfo.wmnet - jmm@cumin2002"
- 07:34 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM ncredir4003.ulsfo.wmnet - jmm@cumin2002"
- 07:34 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ncredir4003.ulsfo.wmnet on all recursors
- 07:34 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache ncredir4003.ulsfo.wmnet on all recursors
- 07:34 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 07:34 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM ncredir4003.ulsfo.wmnet - jmm@cumin2002"
- 07:34 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM ncredir4003.ulsfo.wmnet - jmm@cumin2002"
- 07:27 jmm@cumin2002: START - Cookbook sre.dns.netbox
- 07:27 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host ncredir4003.ulsfo.wmnet
- 07:22 kgraessle@deploy2002: Finished scap sync-world: Backport for Enable rr-ml AutoModerator CC Set AutoModeratorMultiLingualRevertRisk with available wikis (T400727) (duration: 12m 24s)
- 07:18 kgraessle@deploy2002: kgraessle: Continuing with sync
- 07:12 kgraessle@deploy2002: kgraessle: Backport for Enable rr-ml AutoModerator CC Set AutoModeratorMultiLingualRevertRisk with available wikis (T400727) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 07:09 kgraessle@deploy2002: Started scap sync-world: Backport for Enable rr-ml AutoModerator CC Set AutoModeratorMultiLingualRevertRisk with available wikis (T400727)
- 02:08 mwpresync@deploy2002: Finished scap build-images: Publishing wmf/next image (duration: 07m 59s)
- 02:00 mwpresync@deploy2002: Started scap build-images: Publishing wmf/next image
- 00:33 zabe@deploy2002: Finished scap sync-world: Backport for Stop setting $wgImageLinksSchemaMigrationStage (T299953) (duration: 09m 38s)
- 00:29 zabe@deploy2002: zabe: Continuing with sync
- 00:26 zabe@deploy2002: zabe: Backport for Stop setting $wgImageLinksSchemaMigrationStage (T299953) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 00:24 zabe@deploy2002: Started scap sync-world: Backport for Stop setting $wgImageLinksSchemaMigrationStage (T299953)
- 00:03 jhancock@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2095.codfw.wmnet with reason: host reimage
- 00:03 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host contint1003.wikimedia.org with OS trixie
- 00:03 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
- 00:03 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
2026-03-10
- 23:58 jhancock@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2095.codfw.wmnet with reason: host reimage
- 23:53 jhancock@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2096.codfw.wmnet with reason: host reimage
- 23:49 jhancock@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2096.codfw.wmnet with reason: host reimage
- 23:44 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on contint1003.wikimedia.org with reason: host reimage
- 23:40 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on contint1003.wikimedia.org with reason: host reimage
- 23:31 jhancock@cumin1003: START - Cookbook sre.hosts.reimage for host ms-be2096.codfw.wmnet with OS bullseye
- 23:31 jhancock@cumin1003: START - Cookbook sre.hosts.reimage for host ms-be2095.codfw.wmnet with OS bullseye
- 23:26 jhancock@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2095.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 23:24 jhancock@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2096.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 23:22 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host contint1003.wikimedia.org with OS trixie
- 23:22 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host contint1003.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 23:12 vriley@cumin1003: START - Cookbook sre.hosts.provision for host contint1003.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 23:11 jhancock@cumin1003: START - Cookbook sre.hosts.provision for host ms-be2096.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 23:05 cjming@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/test-kitchen-next: apply
- 23:05 cjming@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/test-kitchen-next: apply
- 22:59 jhancock@cumin1003: START - Cookbook sre.hosts.provision for host ms-be2095.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 22:39 cjming@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/test-kitchen-next: apply
- 22:38 cjming@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/test-kitchen-next: apply
- 21:51 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp7012.magru.wmnet with OS trixie
- 21:48 Dreamy_Jazz: Evening UTC backport window done
- 21:42 cdobbins@cumin2002: conftool action : set/pooled=yes; selector: name=cp7006.magru.wmnet [reason: trixie reimaging]
- 21:42 cdobbins@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp7006.magru.wmnet with OS trixie
- 21:25 tgr@deploy2002: Finished scap sync-world: Backport for Migrate EmailAuth, step 2 (T404334) (duration: 25m 34s)
- 21:24 cdobbins@cumin2002: conftool action : set/pooled=yes; selector: name=cp7007.magru.wmnet [reason: trixie reimaging]
- 21:22 cdobbins@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp7007.magru.wmnet with OS trixie
- 21:21 tgr@deploy2002: tgr: Continuing with sync
- 21:13 cdobbins@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp7006.magru.wmnet with reason: host reimage
- 21:09 cdobbins@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp7006.magru.wmnet with reason: host reimage
- 21:02 tgr@deploy2002: tgr: Backport for Migrate EmailAuth, step 2 (T404334) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 21:00 tgr@deploy2002: Started scap sync-world: Backport for Migrate EmailAuth, step 2 (T404334)
- 20:59 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp7012.magru.wmnet with OS trixie
- 20:54 cdobbins@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp7007.magru.wmnet with reason: host reimage
- 20:50 jforrester@deploy2002: Finished scap sync-world: Backport for Deploy participant recruitment survey on ptwiki and trwiki (T419275), wikifunctions: Drop temporary WikifunctionsEnableHTMLOutput flag (T397402), wikifunctions: Drop temporary WikifunctionsEnableWikidataInputTypes flag (T397403), [[gerrit:1249393|build: Upgrade mediawiki-phan-config from 0.18.0 to 0.2
- 20:48 cdobbins@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp7007.magru.wmnet with reason: host reimage
- 20:46 jforrester@deploy2002: dani, jforrester: Continuing with sync
- 20:45 jforrester@deploy2002: dani, jforrester: Backport for Deploy participant recruitment survey on ptwiki and trwiki (T419275), wikifunctions: Drop temporary WikifunctionsEnableHTMLOutput flag (T397402), wikifunctions: Drop temporary WikifunctionsEnableWikidataInputTypes flag (T397403), [[gerrit:1249393|build: Upgrade mediawiki-phan-config from 0.18.0 to 0.20.0 (T41
- 20:43 cdobbins@cumin2002: START - Cookbook sre.hosts.reimage for host cp7006.magru.wmnet with OS trixie
- 20:43 jforrester@deploy2002: Started scap sync-world: Backport for Deploy participant recruitment survey on ptwiki and trwiki (T419275), wikifunctions: Drop temporary WikifunctionsEnableHTMLOutput flag (T397402), wikifunctions: Drop temporary WikifunctionsEnableWikidataInputTypes flag (T397403), [[gerrit:1249393|build: Upgrade mediawiki-phan-config from 0.18.0 to 0.20
- 20:42 cdobbins@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp7006.magru.wmnet with OS trixie
- 20:42 cdobbins@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - cdobbins@cumin2002"
- 20:38 jforrester@deploy2002: Finished scap sync-world: Backport for Enable personal main menu to all users in Minerva Neue skin (T413912), Enables legacy processing in ParserOutputPostCacheTransform when cached (T372592), Parser: Raise minimum TTL from 30 min to 'next midnight' in miser mode (T416616 T416540 T419439) (duration: 12m 58s)
- 20:36 cdobbins@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - cdobbins@cumin2002"
- 20:34 jforrester@deploy2002: jforrester, cscott, bwang: Continuing with sync
- 20:27 jforrester@deploy2002: jforrester, cscott, bwang: Backport for Enable personal main menu to all users in Minerva Neue skin (T413912), Enables legacy processing in ParserOutputPostCacheTransform when cached (T372592), Parser: Raise minimum TTL from 30 min to 'next midnight' in miser mode (T416616 T416540 T419439) synced to the testservers (see https://wikitech.wi
- 20:25 jforrester@deploy2002: Started scap sync-world: Backport for Enable personal main menu to all users in Minerva Neue skin (T413912), Enables legacy processing in ParserOutputPostCacheTransform when cached (T372592), Parser: Raise minimum TTL from 30 min to 'next midnight' in miser mode (T416616 T416540 T419439)
- 20:25 cdobbins@cumin2002: START - Cookbook sre.hosts.reimage for host cp7007.magru.wmnet with OS trixie
- 20:24 cdobbins@cumin2002: conftool action : set/pooled=no; selector: name=cp7007.magru.wmnet [reason: trixie reimaging]
- 20:24 cdobbins@cumin2002: conftool action : set/pooled=yes; selector: name=cp7005.magru.wmnet [reason: trixie reimaging]
- 20:16 cdobbins@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp7005.magru.wmnet with OS trixie
- 20:10 cdobbins@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp7006.magru.wmnet with reason: host reimage
- 20:03 brett@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp7013.*
- 20:03 cdobbins@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp7006.magru.wmnet with reason: host reimage
- 19:50 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp7013.magru.wmnet with OS trixie
- 19:49 cdobbins@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp7005.magru.wmnet with reason: host reimage
- 19:42 cdobbins@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp7005.magru.wmnet with reason: host reimage
- 19:40 cdobbins@cumin2002: START - Cookbook sre.hosts.reimage for host cp7006.magru.wmnet with OS trixie
- 19:40 cdobbins@cumin2002: conftool action : set/pooled=no; selector: name=cp7006.magru.wmnet [reason: trixie reimaging]
- 19:39 cdobbins@cumin2002: conftool action : set/pooled=yes; selector: name=cp7004.magru.wmnet [reason: trixie reimaging]
- 19:24 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp7013.magru.wmnet with reason: host reimage
- 19:19 cdobbins@cumin2002: START - Cookbook sre.hosts.reimage for host cp7005.magru.wmnet with OS trixie
- 19:19 cdobbins@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp7004.magru.wmnet with OS trixie
- 19:19 cdobbins@cumin2002: conftool action : set/pooled=no; selector: name=cp7005.magru.wmnet [reason: trixie reimaging]
- 19:18 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp7013.magru.wmnet with reason: host reimage
- 19:17 cdobbins@cumin2002: conftool action : set/pooled=yes; selector: name=cp7003.magru.wmnet [reason: trixie reimaging]
- 19:16 brennen@deploy2002: rebuilt and synchronized wikiversions files: group0 to 1.46.0-wmf.19 refs T413810
- 19:16 cdobbins@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp7003.magru.wmnet with OS trixie
- 19:09 brennen: 1.46.0-wmf.19 train status: blockers believed resolved, rolling to group0
- 19:07 brennen@deploy2002: Finished scap sync-world: Backport for Re-add correct namespace for translatable pages (T419294) (duration: 12m 30s)
- 19:06 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host contint1003.wikimedia.org with OS trixie
- 19:01 brennen@deploy2002: abi, brennen: Continuing with sync
- 18:58 brennen@deploy2002: abi, brennen: Backport for Re-add correct namespace for translatable pages (T419294) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 18:55 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp7013.magru.wmnet with OS trixie
- 18:54 brennen@deploy2002: Started scap sync-world: Backport for Re-add correct namespace for translatable pages (T419294)
- 18:52 cdobbins@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp7004.magru.wmnet with reason: host reimage
- 18:52 brennen@deploy2002: Finished scap sync-world: testwikis to 1.46.0-wmf.19 refs T413810 (duration: 38m 34s)
- 18:49 cdobbins@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp7004.magru.wmnet with reason: host reimage
- 18:47 cdobbins@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp7003.magru.wmnet with reason: host reimage
- 18:44 cdobbins@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp7003.magru.wmnet with reason: host reimage
- 18:41 brett@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp7015.*
- 18:27 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp7015.magru.wmnet with OS trixie
- 18:23 cdobbins@cumin2002: START - Cookbook sre.hosts.reimage for host cp7004.magru.wmnet with OS trixie
- 18:21 cdobbins@cumin2002: conftool action : set/pooled=no; selector: name=cp7004.magru.wmnet [reason: trixie reimaging]
- 18:16 cdobbins@cumin2002: START - Cookbook sre.hosts.reimage for host cp7003.magru.wmnet with OS trixie
- 18:13 brennen@deploy2002: Started scap sync-world: testwikis to 1.46.0-wmf.19 refs T413810
- 18:13 cdobbins@cumin2002: conftool action : set/pooled=no; selector: name=cp7003.magru.wmnet [reason: trixie reimaging]
- 18:13 cdobbins@cumin2002: conftool action : set/pooled=yes; selector: name=cp7003.magru.wmnet [reason: trixie reimaging]
- 18:00 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
- 17:59 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
- 17:59 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp7015.magru.wmnet with reason: host reimage
- 17:59 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
- 17:59 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
- 17:56 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp7015.magru.wmnet with reason: host reimage
- 17:54 hashar@deploy2002: Finished deploy [integration/docroot@f544f49]: Catch up with composer/npm dev dependencies. Noop for production (duration: 00m 11s)
- 17:54 hashar@deploy2002: Started deploy [integration/docroot@f544f49]: Catch up with composer/npm dev dependencies. Noop for production
- 17:43 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
- 17:43 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
- 17:32 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-debug: sync
- 17:32 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-debug: sync
- 17:31 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-debug: sync
- 17:30 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp7015.magru.wmnet with OS trixie
- 17:30 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-debug: sync
- 17:29 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-debug: sync
- 17:29 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-debug: sync
- 17:28 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-debug: sync
- 17:28 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host contint1003.wikimedia.org with OS trixie
- 17:28 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-debug: sync
- 17:26 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host contint1003.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 17:24 vriley@cumin1003: START - Cookbook sre.hosts.provision for host contint1003.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 17:23 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host contint1003.wikimedia.org with OS trixie
- 17:22 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
- 17:21 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
- 17:12 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-debug: sync
- 17:12 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-debug: sync
- 17:11 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-debug: sync
- 17:11 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-debug: sync
- 17:09 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-debug: sync
- 17:09 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-debug: sync
- 17:01 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host contint1003.wikimedia.org with OS trixie
- 16:40 andrew@dns1004: END - running authdns-update
- 16:38 andrew@dns1004: START - running authdns-update
- 16:25 reedy@deploy2002: Finished scap sync-world: Backport for Revert "CommonSettings: Remove orphaned $wgWebAuthnNewCredsDisabled" (duration: 07m 45s)
- 16:21 reedy@deploy2002: reedy: Continuing with sync
- 16:19 reedy@deploy2002: reedy: Backport for Revert "CommonSettings: Remove orphaned $wgWebAuthnNewCredsDisabled" synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 16:17 reedy@deploy2002: Started scap sync-world: Backport for Revert "CommonSettings: Remove orphaned $wgWebAuthnNewCredsDisabled"
- 15:59 jynus@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-cron: apply
- 15:59 taavi: update cr firewall policy for codfw1dev ldap tree https://gerrit.wikimedia.org/r/1249985
- 15:55 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/postgresql-airflow-fr-tech: apply
- 15:55 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/postgresql-airflow-fr-tech: apply
- 15:55 jynus@deploy2002: helmfile [codfw] START helmfile.d/services/mw-cron: apply
- 15:48 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 15:46 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
- 15:39 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ml-serve1015.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 15:34 elukey@cumin1003: START - Cookbook sre.hosts.provision for host ml-serve1015.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 15:28 brouberol@dns1004: END - running authdns-update
- 15:27 brouberol@dns1004: START - running authdns-update
- 15:10 swfrench@cumin2002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Deploy: Fix an edge case in validation of a new object - swfrench@cumin2002"
- 15:10 swfrench@cumin2002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Deploy: Fix an edge case in validation of a new object - swfrench@cumin2002
- 15:09 swfrench@cumin2002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Deploy: Fix an edge case in validation of a new object - swfrench@cumin2002
- 15:09 swfrench@cumin2002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Deploy: Fix an edge case in validation of a new object - swfrench@cumin2002"
- 15:05 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ml-serve1014.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 14:59 elukey@cumin1003: START - Cookbook sre.hosts.provision for host ml-serve1014.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 14:58 sukhe: sudo cumin -b1 -s15 "C:bird" "run-puppet-agent --enable 'merging CR 1238007; add function return type'"
- 14:58 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ml-serve1014.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 14:58 elukey@cumin1003: START - Cookbook sre.hosts.provision for host ml-serve1014.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 14:58 sukhe: sudo cumin -b1 -s15 "C:bird" "run-puppet-agent 'merging CR 1238007; add function return type'"
- 14:51 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host thanos-be1006.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
- 14:44 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ml-serve1014.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 14:44 elukey@cumin1003: START - Cookbook sre.hosts.provision for host ml-serve1014.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 14:42 sukhe: sudo cumin "C:bird" "disable-puppet 'merging CR 1238007; add function return type'"
- 14:41 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.powercycle (exit_code=0) for host ml-serve1014
- 14:39 elukey@cumin1003: START - Cookbook sre.hosts.provision for host thanos-be1006.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
- 14:36 elukey@cumin1003: START - Cookbook sre.hosts.powercycle for host ml-serve1014
- 14:36 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.powercycle (exit_code=99) for host ml-serve1014
- 14:36 elukey@cumin1003: START - Cookbook sre.hosts.powercycle for host ml-serve1014
- 14:12 otto@deploy2002: Finished scap sync-world: Backport for stream: mediawiki.page_edit_type_simple.dev0 (T351225) (duration: 11m 05s)
- 14:08 otto@deploy2002: akhatun, otto: Continuing with sync
- 14:02 otto@deploy2002: akhatun, otto: Backport for stream: mediawiki.page_edit_type_simple.dev0 (T351225) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 14:01 otto@deploy2002: Started scap sync-world: Backport for stream: mediawiki.page_edit_type_simple.dev0 (T351225)
- 13:49 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host contint1003.wikimedia.org with OS trixie
- 13:43 javiermonton@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
- 13:42 javiermonton@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich: apply
- 13:28 vgutierrez: testing acme-chief 0.39 in acmechief-test2001 - T419352
- 13:27 vgutierrez: upload acme-chief 0.39 to bookworm-wikimedia (apt.wm.o) - T419352
- 13:16 jiji@cumin1003: END (FAIL) - Cookbook sre.memcached.roll-reboot-restart (exit_code=1) rolling restart_daemons on A:memcached-canary
- 13:16 jiji@cumin1003: START - Cookbook sre.memcached.roll-reboot-restart rolling restart_daemons on A:memcached-canary
- 13:12 mszwarc@deploy2002: Finished scap sync-world: Backport for Require 2FA from 6 other user groups ($wgRestrictedGroups) (T418580), kaiwiki: add logo, stiename, projectnamespace and timezone (T414237) (duration: 08m 45s)
- 13:08 mszwarc@deploy2002: mszwarc, anzx: Continuing with sync
- 13:05 mszwarc@deploy2002: mszwarc, anzx: Backport for Require 2FA from 6 other user groups ($wgRestrictedGroups) (T418580), kaiwiki: add logo, stiename, projectnamespace and timezone (T414237) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 13:03 mszwarc@deploy2002: Started scap sync-world: Backport for Require 2FA from 6 other user groups ($wgRestrictedGroups) (T418580), kaiwiki: add logo, stiename, projectnamespace and timezone (T414237)
- 13:03 jiji@cumin1003: END (PASS) - Cookbook sre.memcached.roll-reboot-restart (exit_code=0) rolling reboot on A:memcached-gutter-eqiad
- 12:57 jclark@cumin1003: START - Cookbook sre.hosts.reimage for host ml-serve1015.eqiad.wmnet with OS bookworm
- 12:56 jiji@cumin1003: START - Cookbook sre.memcached.roll-reboot-restart rolling reboot on A:memcached-gutter-eqiad
- 12:51 jclark@cumin1003: START - Cookbook sre.hosts.reimage for host ml-serve1014.eqiad.wmnet with OS bookworm
- 12:50 jclark@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ml-serve1014
- 12:50 jclark@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host ml-serve1014
- 12:49 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ml-serve1015.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 12:49 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ml-serve1014.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 12:49 jclark@cumin1003: START - Cookbook sre.hosts.provision for host ml-serve1015.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 12:49 jclark@cumin1003: START - Cookbook sre.hosts.provision for host ml-serve1014.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 12:48 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ml-serve1014.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 12:48 jclark@cumin1003: START - Cookbook sre.hosts.provision for host ml-serve1014.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 12:48 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ml-serve1014.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 12:48 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ml-serve1015.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 12:47 jclark@cumin1003: START - Cookbook sre.hosts.provision for host ml-serve1014.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 12:45 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ml-serve1014.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 12:44 jclark@cumin1003: START - Cookbook sre.hosts.provision for host ml-serve1014.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 12:42 jclark@cumin1003: START - Cookbook sre.hosts.provision for host ml-serve1015.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 12:42 jiji@cumin1003: END (PASS) - Cookbook sre.memcached.roll-reboot-restart (exit_code=0) rolling restart_daemons on A:memcached-canary
- 12:42 jiji@cumin1003: START - Cookbook sre.memcached.roll-reboot-restart rolling restart_daemons on A:memcached-canary
- 12:31 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ml-serve1015.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 12:31 jclark@cumin1003: START - Cookbook sre.hosts.provision for host ml-serve1015.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 12:24 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be1095.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
- 12:10 elukey@cumin1003: START - Cookbook sre.hosts.provision for host ms-be1095.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
- 12:10 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be1095.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
- 11:59 elukey@cumin1003: START - Cookbook sre.hosts.provision for host ms-be1095.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
- 11:34 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-fe2024.codfw.wmnet with OS bullseye
- 11:34 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - ayounsi@cumin1003"
- 11:17 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - ayounsi@cumin1003"
- 11:15 Emperor: rebalance codfw swift rings T354872
- 10:53 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-fe2024.codfw.wmnet with reason: host reimage
- 10:47 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-fe2024.codfw.wmnet with reason: host reimage
- 10:31 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host ms-fe2024.codfw.wmnet with OS bullseye
- 10:30 ayounsi@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-fe2024.codfw.wmnet with OS bullseye
- 10:20 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host ms-fe2024.codfw.wmnet with OS bullseye
- 10:17 gkyziridis@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'edit-check' for release 'main' .
- 09:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device cr2-eqdfw
- 09:31 ayounsi@cumin1003: START - Cookbook sre.network.tls for network device cr2-eqdfw
- 09:22 derick@deploy2002: mwscript-k8s job started: extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=loginwiki --logwiki=metawiki TMPRI1975 FondueFanatic # T419499
- 09:00 arnaudb@dns1005: END - running authdns-update
- 09:00 godog: restore all host interfaces - T417393
- 08:58 arnaudb@dns1005: START - running authdns-update
- 08:30 godog: disabled interface for cloudcephmon1004 - T417393
- 08:22 godog: disabled interfaces for cloudcephosd1021 cloudcephosd1042 cloudcephosd1043 cloudcephosd1018 cloudcephosd1022 - T417393
- 08:18 godog: disabled interfaces for cloudcephosd1016 cloudcephosd1017 cloudcephosd1016 cloudcephosd1018 cloudcephosd1017 cloudcephosd1035 - T417393
- 08:05 godog: start disabling cloudcephosd interfaces - T417393
- 07:49 godog: prep cloudsw reboot tests 'ceph osd set noout' - T417393
- 07:41 filippo@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 19 hosts with reason: switch down tests
- 06:14 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs2009.codfw.wmnet with OS bookworm
- 04:09 pt1979@cumin2002: END (FAIL) - Cookbook sre.network.tls (exit_code=99) for network device asw1-23-ulsfo
- 04:08 pt1979@cumin2002: START - Cookbook sre.network.tls for network device asw1-23-ulsfo
- 04:01 mwpresync@deploy2002: Pruned MediaWiki: 1.46.0-wmf.16 (duration: 01m 48s)
- 02:09 mwpresync@deploy2002: Finished scap build-images: Publishing wmf/next image (duration: 08m 10s)
- 02:00 mwpresync@deploy2002: Started scap build-images: Publishing wmf/next image
- 01:37 ryankemper: [WDQS] T410573 repooled wdqs1011.eqiad.wmnet - erroneously depooled since `2025-11-19` by failed `sre.wdqs.reboot` cookbook
- 00:42 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host contint1003.wikimedia.org with OS trixie
- 00:39 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host contint1003.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 00:29 vriley@cumin1003: START - Cookbook sre.hosts.provision for host contint1003.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
2026-03-09
- 22:51 rzl: root@apt1002:~# reprepro --noskipold --restrict vopsbot update bookworm-wikimedia
- 22:34 bking@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM dse-k8s-ctrl1001.eqiad.wmnet
- 22:32 bking@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM dse-k8s-ctrl1001.eqiad.wmnet
- 22:30 bking@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM dse-k8s-ctrl1002.eqiad.wmnet
- 22:29 bking@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM dse-k8s-ctrl1002.eqiad.wmnet
- 22:28 bking@cumin2002: END (FAIL) - Cookbook sre.ganeti.reboot-vm (exit_code=99) for VM dse-k8s-ctrl1002.eqiad.wmnet
- 22:28 bking@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM dse-k8s-ctrl1002.eqiad.wmnet
- 22:28 bking@cumin2002: END (FAIL) - Cookbook sre.ganeti.reboot-vm (exit_code=99) for VM dse-k8s-ctrl1002.eqiad.wmnet
- 22:28 bking@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM dse-k8s-ctrl1002.eqiad.wmnet
- 22:03 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudgw2004-dev.codfw.wmnet with OS trixie
- 22:02 alexsanford: Redeployed security fix for T419186
- 21:44 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudgw2004-dev.codfw.wmnet with reason: host reimage
- 21:40 andrew@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudgw2004-dev.codfw.wmnet with reason: host reimage
- 21:37 cdobbins@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp7002.magru.wmnet
- 21:34 cdobbins@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp7002.magru.wmnet with OS trixie
- 21:29 alexsanford: Deployed security fix for T419186
- 21:22 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudgw2004-dev.codfw.wmnet with OS trixie
- 21:21 andrew@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudgw2004-dev.codfw.wmnet with OS trixie
- 21:17 dani@deploy2002: Finished scap sync-world: Backport for Pre-deploy participant recruitment survey on ptwiki and trwiki (T419275) (duration: 08m 15s)
- 21:13 dani@deploy2002: dani: Continuing with sync
- 21:11 dani@deploy2002: dani: Backport for Pre-deploy participant recruitment survey on ptwiki and trwiki (T419275) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 21:09 dani@deploy2002: Started scap sync-world: Backport for Pre-deploy participant recruitment survey on ptwiki and trwiki (T419275)
- 21:08 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudgw2004-dev.codfw.wmnet with reason: host reimage
- 21:05 cdobbins@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cp7002.magru.wmnet with reason: host reimage
- 21:02 cdobbins@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp7002.magru.wmnet with reason: host reimage
- 21:01 andrew@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudgw2004-dev.codfw.wmnet with reason: host reimage
- 21:01 tgr_: removed private code for T397244
- 21:01 ryankemper: [WDQS] Alright, these are re-entering a failed state soon enough that we will need to identify the offender if we want to restore proper service. We could put some temporary hack to restart every few minutes so we at least maintain some uptime, but root cause is the usual 'we need a requestctl rule to block whoever's killing us' scenario
- 21:00 cdobbins@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp7001.magru.wmnet [reason: Trixie reimaging]
- 20:57 ryankemper: [WDQS] Auto-remediation would have eventually restarted these, but some of them were staying below our current threshold of `threads > 1200`. May want to lower threshold, or examine an additional metric-type to look at in the future
- 20:56 ryankemper: [WDQS] `ryankemper@cumin2002:~$ sudo -E cumin 'A:wdqs-main AND P{wdqs1*}' 'systemctl restart wdqs-blazegraph'`
- 20:54 ryankemper: [WDQS] `ryankemper@cumin2002:~$ sudo -E cumin 'A:wdqs-main AND P{wdqs2*}' 'systemctl restart wdqs-blazegraph'`
- 20:44 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudgw2004-dev.codfw.wmnet with OS trixie
- 20:43 tgr@deploy2002: Unlocked for deployment [MediaWiki]: working on private change (duration: 10m 10s)
- 20:36 cdobbins@cumin2002: START - Cookbook sre.hosts.reimage for host cp7002.magru.wmnet with OS trixie
- 20:33 tgr@deploy2002: Locking from deployment [MediaWiki]: working on private change
- 20:31 tgr@deploy2002: Finished scap sync-world: Backport for Enable parser survey for opted-out users on German/French/Polish wikis (T414852), lift IP cap for womens month editathon (T419109) (duration: 13m 36s)
- 20:27 tgr@deploy2002: cscott, tgr, anzx: Continuing with sync
- 20:19 tgr@deploy2002: cscott, tgr, anzx: Backport for Enable parser survey for opted-out users on German/French/Polish wikis (T414852), lift IP cap for womens month editathon (T419109) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 20:17 tgr@deploy2002: Started scap sync-world: Backport for Enable parser survey for opted-out users on German/French/Polish wikis (T414852), lift IP cap for womens month editathon (T419109)
- 20:13 aaron@deploy2002: Finished scap sync-world: Backport for Remove redundant math spec file from wwwportal (T418188) (duration: 06m 56s)
- 20:09 aaron@deploy2002: aaron: Continuing with sync
- 20:08 aaron@deploy2002: aaron: Backport for Remove redundant math spec file from wwwportal (T418188) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 20:06 aaron@deploy2002: Started scap sync-world: Backport for Remove redundant math spec file from wwwportal (T418188)
- 20:01 brett@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp7016.*
- 19:54 cdobbins@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp7001.magru.wmnet with OS trixie
- 19:51 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp7016.magru.wmnet with OS trixie
- 19:49 zabe@deploy2002: Finished scap sync-world: Backport for Stop writing to il_to on commonswiki (T415787) (duration: 06m 04s)
- 19:45 zabe@deploy2002: zabe: Continuing with sync
- 19:44 zabe@deploy2002: zabe: Backport for Stop writing to il_to on commonswiki (T415787) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 19:43 zabe@deploy2002: Started scap sync-world: Backport for Stop writing to il_to on commonswiki (T415787)
- 19:29 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-main: apply
- 19:28 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-main: apply
- 19:28 cdobbins@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp7001.magru.wmnet with reason: host reimage
- 19:24 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp7016.magru.wmnet with reason: host reimage
- 19:23 cdobbins@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp7001.magru.wmnet with reason: host reimage
- 19:19 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp7016.magru.wmnet with reason: host reimage
- 19:15 cwhite@deploy2002: Finished deploy [performance/arc-lamp@aa8da8b]: Ie7e035 (duration: 00m 08s)
- 19:15 cwhite@deploy2002: Started deploy [performance/arc-lamp@aa8da8b]: Ie7e035
- 19:14 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-debug: sync
- 19:14 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-debug: sync
- 19:05 herron@deploy2002: Finished scap sync-world: Backport for udp2log: switch to new hosts (duration: 09m 38s)
- 19:03 ebernhardson@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
- 19:03 ebernhardson@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
- 19:01 herron@deploy2002: herron: Continuing with sync
- 19:00 ebernhardson@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
- 19:00 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-cron: apply
- 18:59 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-cron: apply
- 18:59 ebernhardson@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
- 18:59 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
- 18:59 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
- 18:57 herron@deploy2002: herron: Backport for udp2log: switch to new hosts synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 18:57 cdobbins@cumin2002: START - Cookbook sre.hosts.reimage for host cp7001.magru.wmnet with OS trixie
- 18:55 herron@deploy2002: Started scap sync-world: Backport for udp2log: switch to new hosts
- 18:55 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp7016.magru.wmnet with OS trixie
- 18:50 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-main: apply
- 18:49 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-main: apply
- 18:44 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
- 18:44 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
- 18:34 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-debug: sync
- 18:34 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-debug: sync
- 18:33 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-debug: sync
- 18:33 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-debug: sync
- 18:32 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
- 18:32 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
- 18:29 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-debug: sync
- 18:29 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-debug: sync
- 18:27 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-debug: sync
- 18:27 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-debug: sync
- 18:23 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
- 18:23 dzahn@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host contint1003.wikimedia.org with OS trixie
- 18:23 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
- 18:16 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
- 18:16 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
- 18:06 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dse-k8s-ctrl1001.eqiad.wmnet
- 18:05 herron@deploy2002: Sync cancelled.
- 18:04 herron@deploy2002: herron: Backport for Revert "udp2log: switch to new hosts" synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 18:02 herron@deploy2002: Started scap sync-world: Backport for Revert "udp2log: switch to new hosts"
- 18:01 bking@cumin2002: START - Cookbook sre.hosts.reboot-single for host dse-k8s-ctrl1001.eqiad.wmnet
- 17:54 fceratto@deploy2002: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 17:47 fceratto@deploy2002: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 17:42 fceratto@deploy2002: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 17:42 herron@deploy2002: Sync cancelled.
- 17:40 fceratto@deploy2002: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 17:39 fceratto@deploy2002: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 17:38 mutante: contint1003 - unable to get uptime Caused by: Cumin execution failed (exit_code=2) [101/240] - attempted manual powercycle - Initializing Firmware Interfaces... blank screen T418544
- 17:34 mutante: contint1003.mgmt - racadm serveraction powercycle T418544 - not reacting
- 17:25 fceratto@deploy2002: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 17:25 herron@deploy2002: herron: Backport for udp2log: switch to new hosts (T417002) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 17:23 herron@deploy2002: Started scap sync-world: Backport for udp2log: switch to new hosts (T417002)
- 17:19 fceratto@deploy2002: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 17:15 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host netflow4003.ulsfo.wmnet
- 17:15 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host netflow4003.ulsfo.wmnet with OS bookworm
- 17:13 fceratto@deploy2002: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 17:08 fceratto@deploy2002: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 17:07 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-test2001.codfw.wmnet
- 17:03 dzahn@cumin2002: START - Cookbook sre.hosts.reimage for host contint1003.wikimedia.org with OS trixie
- 17:02 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-test2001.codfw.wmnet
- 17:00 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.sanitize-wiki (exit_code=0) Managing sanitization for wikis kaiwiki in section s5
- 16:59 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on netflow4003.ulsfo.wmnet with reason: host reimage
- 16:53 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on netflow4003.ulsfo.wmnet with reason: host reimage
- 16:37 moritzm: installing gnupg security updates
- 16:31 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host netflow4003.ulsfo.wmnet with OS bookworm
- 16:31 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM netflow4003.ulsfo.wmnet - jmm@cumin2002"
- 16:30 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM netflow4003.ulsfo.wmnet - jmm@cumin2002"
- 16:30 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) netflow4003.ulsfo.wmnet on all recursors
- 16:30 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache netflow4003.ulsfo.wmnet on all recursors
- 16:30 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 16:30 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM netflow4003.ulsfo.wmnet - jmm@cumin2002"
- 16:30 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM netflow4003.ulsfo.wmnet - jmm@cumin2002"
- 16:26 jmm@cumin2002: START - Cookbook sre.dns.netbox
- 16:26 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host netflow4003.ulsfo.wmnet
- 16:26 kevinbazira@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'llm' for release 'main' .
- 16:16 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host prometheus4003.ulsfo.wmnet with OS bookworm
- 15:58 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on prometheus4003.ulsfo.wmnet with reason: host reimage
- 15:53 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on prometheus4003.ulsfo.wmnet with reason: host reimage
- 15:44 vgutierrez: vgutierrez@acmechief-test2001:~$ sudo -i systemctl disable reload-acme-chief-backend.timer - T419352
- 15:40 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host idp-test1005.wikimedia.org
- 15:37 gkyziridis@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
- 15:36 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host idp-test1005.wikimedia.org
- 15:30 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host prometheus4003.ulsfo.wmnet with OS bookworm
- 15:26 elukey@cumin1003: END (PASS) - Cookbook sre.kafka.change-confluent-distro-version (exit_code=0) Change Confluent distribution for Kafka A:kafka-test-eqiad cluster: Change Confluent distribution.
- 15:24 kevinbazira@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'llm' for release 'main' .
- 15:18 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host people2004.codfw.wmnet
- 15:14 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host people2004.codfw.wmnet
- 15:12 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2009.codfw.wmnet with reason: host reimage
- 15:09 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-jumbo1001.eqiad.wmnet
- 15:08 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2009.codfw.wmnet with reason: host reimage
- 15:03 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-jumbo1001.eqiad.wmnet
- 14:50 bking@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2009.codfw.wmnet with OS bookworm
- 14:49 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wdqs2009.codfw.wmnet with OS bullseye
- 14:45 elukey@cumin1003: START - Cookbook sre.kafka.change-confluent-distro-version Change Confluent distribution for Kafka A:kafka-test-eqiad cluster: Change Confluent distribution.
- 14:35 mszwarc@deploy2002: Finished scap sync-world: Backport for Hide 2fa-warning Echo category from preferences (T419111) (duration: 06m 07s)
- 14:35 fceratto@cumin1003: START - Cookbook sre.mysql.sanitize-wiki Managing sanitization for wikis kaiwiki in section s5
- 14:34 fceratto@cumin1003: END (FAIL) - Cookbook sre.mysql.sanitize-wiki (exit_code=99) Managing sanitization for wikis urwikisource in section s5
- 14:31 mszwarc@deploy2002: mszwarc: Continuing with sync
- 14:31 mszwarc@deploy2002: mszwarc: Backport for Hide 2fa-warning Echo category from preferences (T419111) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 14:30 fceratto@cumin1003: START - Cookbook sre.mysql.sanitize-wiki Managing sanitization for wikis urwikisource in section s5
- 14:29 mszwarc@deploy2002: Started scap sync-world: Backport for Hide 2fa-warning Echo category from preferences (T419111)
- 14:25 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.sanitize-wiki (exit_code=0) Checking sanitization for wikis urwikisource in section s5
- 14:22 fceratto@cumin1003: START - Cookbook sre.mysql.sanitize-wiki Checking sanitization for wikis urwikisource in section s5
- 14:20 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2009.codfw.wmnet with reason: host reimage
- 14:15 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2009.codfw.wmnet with reason: host reimage
- 14:15 phuedx@deploy2002: Finished scap sync-world: Backport for JS SDK: Add getExperimentByPrefix() (T419191), ext.wikimediaEvents: pageVisit -> loggedOutReaderRetention (T419191) (duration: 09m 39s)
- 14:11 phuedx@deploy2002: phuedx: Continuing with sync
- 14:07 phuedx@deploy2002: phuedx: Backport for JS SDK: Add getExperimentByPrefix() (T419191), ext.wikimediaEvents: pageVisit -> loggedOutReaderRetention (T419191) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 14:05 phuedx@deploy2002: Started scap sync-world: Backport for JS SDK: Add getExperimentByPrefix() (T419191), ext.wikimediaEvents: pageVisit -> loggedOutReaderRetention (T419191)
- 14:03 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host prometheus4003.ulsfo.wmnet with OS bookworm
- 13:55 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cumin2003.codfw.wmnet
- 13:54 bking@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2009.codfw.wmnet with OS bullseye
- 13:50 phuedx@deploy2002: Finished scap sync-world: Backport for Disable MetricsPlatform extension (T416865) (duration: 08m 02s)
- 13:49 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host cumin2003.codfw.wmnet
- 13:46 phuedx@deploy2002: phuedx, sfaci: Continuing with sync
- 13:44 kevinbazira@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'llm' for release 'main' .
- 13:44 phuedx@deploy2002: phuedx, sfaci: Backport for Disable MetricsPlatform extension (T416865) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 13:42 phuedx@deploy2002: Started scap sync-world: Backport for Disable MetricsPlatform extension (T416865)
- 13:39 phuedx@deploy2002: Finished scap sync-world: Backport for Confirmemail: Log delay between email sent and confirmation (T415902), Enable confirmemail logstash channel (T415902) (duration: 11m 16s)
- 13:35 phuedx@deploy2002: mmartorana, phuedx: Continuing with sync
- 13:30 phuedx@deploy2002: mmartorana, phuedx: Backport for Confirmemail: Log delay between email sent and confirmation (T415902), Enable confirmemail logstash channel (T415902) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 13:28 phuedx@deploy2002: Started scap sync-world: Backport for Confirmemail: Log delay between email sent and confirmation (T415902), Enable confirmemail logstash channel (T415902)
- 13:10 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host prometheus4003.ulsfo.wmnet with OS bookworm
- 13:04 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host prometheus4003.ulsfo.wmnet with OS bookworm
- 12:55 moritzm: installing Kerberos security updates
- 12:29 moritzm: installing python3.9 security updates
- 12:11 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host prometheus4003.ulsfo.wmnet with OS bookworm
- 12:00 reedy@deploy2002: Finished scap sync-world: Backport for Revert "CommonSettings: Temporarily set $wgOATHUserHandlesTable = true" (T416544), CommonSettings: Remove orphaned $wgWebAuthnNewCredsDisabled (duration: 06m 13s)
- 11:56 reedy@deploy2002: reedy: Continuing with sync
- 11:56 reedy@deploy2002: reedy: Backport for Revert "CommonSettings: Temporarily set $wgOATHUserHandlesTable = true" (T416544), CommonSettings: Remove orphaned $wgWebAuthnNewCredsDisabled synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 11:54 reedy@deploy2002: Started scap sync-world: Backport for Revert "CommonSettings: Temporarily set $wgOATHUserHandlesTable = true" (T416544), CommonSettings: Remove orphaned $wgWebAuthnNewCredsDisabled
- 11:44 phuedx@deploy2002: Finished scap sync-world: Backport for Hooks: Really only add global logging context for pageviews (duration: 12m 02s)
- 11:38 phuedx@deploy2002: phuedx: Continuing with sync
- 11:34 phuedx@deploy2002: phuedx: Backport for Hooks: Really only add global logging context for pageviews synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 11:32 phuedx@deploy2002: Started scap sync-world: Backport for Hooks: Really only add global logging context for pageviews
- 11:29 btullis@deploy2002: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-test: apply
- 11:29 btullis@deploy2002: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-test: apply
- 11:03 brouberol@deploy2002: helmfile [aux-k8s-codfw] DONE helmfile.d/aux-k8s-services/kafka-mirrormaker: apply
- 11:03 brouberol@deploy2002: helmfile [aux-k8s-codfw] START helmfile.d/aux-k8s-services/kafka-mirrormaker: apply
- 10:57 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host prometheus4003.ulsfo.wmnet with OS bookworm
- 10:56 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host prometheus4003.ulsfo.wmnet with OS bookworm
- 10:50 brouberol@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/aux-k8s-services/kafka-mirrormaker: apply
- 10:49 brouberol@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/aux-k8s-services/kafka-mirrormaker: apply
- 10:45 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host prometheus4003.ulsfo.wmnet
- 10:45 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host prometheus4003.ulsfo.wmnet with OS bookworm
- 10:45 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host prometheus4003.ulsfo.wmnet with OS bookworm
- 10:44 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM prometheus4003.ulsfo.wmnet - jmm@cumin2002"
- 10:44 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM prometheus4003.ulsfo.wmnet - jmm@cumin2002"
- 10:43 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) prometheus4003.ulsfo.wmnet on all recursors
- 10:43 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache prometheus4003.ulsfo.wmnet on all recursors
- 10:43 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 10:43 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM prometheus4003.ulsfo.wmnet - jmm@cumin2002"
- 10:43 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM prometheus4003.ulsfo.wmnet - jmm@cumin2002"
- 10:40 brouberol@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/aux-k8s-services/kafka-mirrormaker: apply
- 10:39 brouberol@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/aux-k8s-services/kafka-mirrormaker: apply
- 10:39 jmm@cumin2002: START - Cookbook sre.dns.netbox
- 10:39 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host prometheus4003.ulsfo.wmnet
- 10:35 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 10:33 jmm@cumin2002: START - Cookbook sre.dns.netbox
- 10:17 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
- 10:12 jmm@cumin2002: START - Cookbook sre.dns.netbox
- 09:51 kevinbazira@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'llm' for release 'main' .
- 09:46 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
- 09:43 jmm@cumin2002: START - Cookbook sre.dns.netbox
- 09:40 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host prometheus4003.ulsfo.wmnet
- 09:40 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
- 09:35 jmm@cumin2002: START - Cookbook sre.dns.netbox
- 09:35 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host prometheus4003.ulsfo.wmnet
- 09:31 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host frdb1008
- 09:31 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host frdb1008
- 09:29 kevinbazira@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'llm' for release 'main' .
- 09:05 kevinbazira@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'llm' for release 'main' .
- 08:37 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-wmde: apply
- 08:36 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-wmde: apply
- 08:35 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-wikidata: apply
- 08:35 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-wikidata: apply
- 08:32 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 08:32 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 08:31 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-sre: apply
- 08:30 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-sre: apply
- 08:30 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-search: apply
- 08:29 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-search: apply
- 08:29 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-research: apply
- 08:28 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-research: apply
- 08:28 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-platform-eng: apply
- 08:27 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-platform-eng: apply
- 08:27 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 08:27 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti4006.ulsfo.wmnet to cluster ulsfo02 and group 1
- 08:27 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
- 08:25 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti4006.ulsfo.wmnet to cluster ulsfo02 and group 1
- 08:25 brouberol@deploy2002: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'.
- 08:25 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-ml: apply
- 08:25 brouberol@deploy2002: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'.
- 08:25 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-ml: apply
- 08:24 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-main: apply
- 08:23 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-main: apply
- 08:23 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-product: apply
- 08:22 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-product: apply
- 08:21 kevinbazira@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'llm' for release 'main' .
- 08:16 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti4006.ulsfo.wmnet to cluster ulsfo02 and group 1
- 08:16 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti4006.ulsfo.wmnet to cluster ulsfo02 and group 1
- 08:07 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti4006.ulsfo.wmnet to cluster ulsfo and group 1
- 08:07 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti4006.ulsfo.wmnet to cluster ulsfo and group 1
- 07:37 mszwarc@deploy2002: Finished scap sync-world: Backport for Add a script to send mandatory 2FA Echo notification (T419111), Set $wgOATH2FARequiredGroupRemovalPages for interface-admins (T417880) (duration: 34m 41s)
- 07:23 mszwarc@deploy2002: mszwarc: Continuing with sync
- 07:22 mszwarc@deploy2002: mszwarc: Backport for Add a script to send mandatory 2FA Echo notification (T419111), Set $wgOATH2FARequiredGroupRemovalPages for interface-admins (T417880) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 07:02 mszwarc@deploy2002: Started scap sync-world: Backport for Add a script to send mandatory 2FA Echo notification (T419111), Set $wgOATH2FARequiredGroupRemovalPages for interface-admins (T417880)
- 02:08 mwpresync@deploy2002: Finished scap build-images: Publishing wmf/next image (duration: 07m 58s)
- 02:00 mwpresync@deploy2002: Started scap build-images: Publishing wmf/next image
2026-03-08
- 20:28 vgutierrez@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on acmechief-test2001.codfw.wmnet with reason: GTS issues
- 02:01 mwpresync@deploy2002: Finished scap build-images: Publishing wmf/next image (duration: 00m 59s)
- 02:00 mwpresync@deploy2002: Started scap build-images: Publishing wmf/next image
2026-03-07
- 02:09 mwpresync@deploy2002: Finished scap build-images: Publishing wmf/next image (duration: 08m 23s)
- 02:00 mwpresync@deploy2002: Started scap build-images: Publishing wmf/next image
- 01:20 krinkle@deploy2002: Finished scap sync-world: Backport for CSP: restore toolforge/wmcs entry in false positive list (duration: 10m 46s)
- 01:16 krinkle@deploy2002: krinkle: Continuing with sync
- 01:11 krinkle@deploy2002: krinkle: Backport for CSP: restore toolforge/wmcs entry in false positive list synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 01:09 krinkle@deploy2002: Started scap sync-world: Backport for CSP: restore toolforge/wmcs entry in false positive list
- 00:22 brett@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp2043.*
- 00:09 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp2043.codfw.wmnet
- 00:05 brett@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp2043.codfw.wmnet
2026-03-06
- 23:29 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs2009.codfw.wmnet with OS bullseye
- 23:13 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2009.codfw.wmnet with reason: host reimage
- 23:07 ryankemper@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2009.codfw.wmnet with reason: host reimage
- 22:46 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wdqs2009
- 22:46 ryankemper@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wdqs2009
- 22:46 ryankemper@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wdqs2009
- 22:46 ryankemper@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wdqs2009.codfw.wmnet 141.0.192.10.in-addr.arpa 1.4.1.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
- 22:46 ryankemper@cumin2002: START - Cookbook sre.dns.wipe-cache wdqs2009.codfw.wmnet 141.0.192.10.in-addr.arpa 1.4.1.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
- 22:45 ryankemper@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 22:45 ryankemper@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wdqs2009 - ryankemper@cumin2002"
- 22:45 ryankemper@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wdqs2009 - ryankemper@cumin2002"
- 22:41 ryankemper@cumin2002: START - Cookbook sre.dns.netbox
- 22:40 ryankemper@cumin2002: START - Cookbook sre.hosts.move-vlan for host wdqs2009
- 22:39 ryankemper@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2009.codfw.wmnet with OS bullseye
- 19:48 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
- 19:47 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
- 19:47 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
- 19:46 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
- 19:36 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host wdqs2009.codfw.wmnet
- 19:23 bking@cumin2002: START - Cookbook sre.hosts.reboot-single for host wdqs2009.codfw.wmnet
- 19:17 bking@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on wdqs2009.codfw.wmnet with reason: NFS might be hung, about to reboot
- 18:56 brett@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on cp2043.codfw.wmnet with reason: troubleshooting for network drops
- 18:44 brett@puppetserver1001: conftool action : set/pooled=no; selector: name=cp2043.*
- 18:29 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts an-backup-datanode1033.eqiad.wmnet
- 18:29 btullis@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 18:29 btullis@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-backup-datanode1033.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - btullis@cumin1003"
- 18:28 btullis@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-backup-datanode1033.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - btullis@cumin1003"
- 17:59 ebernhardson@deploy2002: Finished scap sync-world: Backport for cirrus: Use https for semanticsearch-test cluster (duration: 11m 20s)
- 17:53 ebernhardson@deploy2002: ebernhardson: Continuing with sync
- 17:52 ebernhardson@deploy2002: ebernhardson: Backport for cirrus: Use https for semanticsearch-test cluster synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 17:51 btullis@deploy2002: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-test: apply
- 17:51 btullis@deploy2002: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-test: apply
- 17:47 ebernhardson@deploy2002: Started scap sync-world: Backport for cirrus: Use https for semanticsearch-test cluster
- 17:42 btullis@deploy2002: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-test: apply
- 17:42 btullis@deploy2002: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-test: apply
- 17:40 btullis@deploy2002: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-test: apply
- 17:40 btullis@deploy2002: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-test: apply
- 17:11 btullis@deploy2002: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-test: apply
- 17:11 btullis@deploy2002: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-test: apply
- 17:10 btullis@cumin1003: START - Cookbook sre.dns.netbox
- 17:05 hashar@deploy2002: Finished deploy [gerrit/gerrit@b8183ba]: wm-checks-api: add tooltip to the CheckRun Run action (duration: 00m 13s)
- 17:05 hashar@deploy2002: Started deploy [gerrit/gerrit@b8183ba]: wm-checks-api: add tooltip to the CheckRun Run action
- 17:04 btullis@cumin1003: START - Cookbook sre.hosts.decommission for hosts an-backup-datanode1033.eqiad.wmnet
- 16:48 javiermonton@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply
- 16:48 javiermonton@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply
- 16:23 btullis@deploy2002: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-test: apply
- 16:23 btullis@deploy2002: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-test: apply
- 15:57 brouberol@deploy2002: helmfile [aux-k8s-codfw] DONE helmfile.d/admin 'apply'.
- 15:57 brouberol@deploy2002: helmfile [aux-k8s-codfw] START helmfile.d/admin 'apply'.
- 15:56 brouberol@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 15:56 brouberol@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
- 15:52 blake@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker[2354-2356].codfw.wmnet
- 15:52 blake@cumin1003: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker[2354-2356].codfw.wmnet
- 15:51 blake@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2356.codfw.wmnet with OS trixie
- 15:46 blake@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2355.codfw.wmnet with OS trixie
- 15:42 blake@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2354.codfw.wmnet with OS trixie
- 15:31 blake@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2356.codfw.wmnet with reason: host reimage
- 15:31 tchin@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: apply
- 15:30 tchin@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-main: apply
- 15:28 kevinbazira@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'llm' for release 'main' .
- 15:28 tchin@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventgate-main: apply
- 15:28 tchin@deploy2002: helmfile [codfw] START helmfile.d/services/eventgate-main: apply
- 15:26 tchin@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-main: apply
- 15:26 tchin@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-main: apply
- 15:26 blake@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2355.codfw.wmnet with reason: host reimage
- 15:24 tchin@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: apply
- 15:23 tchin@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: apply
- 15:23 blake@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2354.codfw.wmnet with reason: host reimage
- 15:19 tchin@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: apply
- 15:19 tchin@deploy2002: helmfile [codfw] START helmfile.d/services/eventgate-analytics-external: apply
- 15:17 tchin@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics-external: apply
- 15:17 tchin@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-analytics-external: apply
- 15:17 blake@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2356.codfw.wmnet with reason: host reimage
- 15:16 blake@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2355.codfw.wmnet with reason: host reimage
- 15:16 blake@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2354.codfw.wmnet with reason: host reimage
- 15:15 kevinbazira@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'llm' for release 'main' .
- 15:10 tchin@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics: apply
- 15:09 tchin@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics: apply
- 15:08 tchin@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics: apply
- 15:08 tchin@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics: apply
- 15:06 tchin@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics: apply
- 15:05 tchin@deploy2002: helmfile [codfw] START helmfile.d/services/eventgate-analytics: apply
- 15:05 tchin@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics: apply
- 15:05 tchin@deploy2002: helmfile [codfw] START helmfile.d/services/eventgate-analytics: apply
- 15:03 blake@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker2356.codfw.wmnet with OS trixie
- 15:03 blake@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker2355.codfw.wmnet with OS trixie
- 15:03 blake@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker2354.codfw.wmnet with OS trixie
- 15:02 tchin@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics: apply
- 15:02 blake@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker[2348-2353].codfw.wmnet
- 15:02 tchin@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-analytics: apply
- 15:02 blake@cumin1003: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker[2348-2353].codfw.wmnet
- 14:59 blake@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2353.codfw.wmnet with OS trixie
- 14:57 blake@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2349.codfw.wmnet with OS trixie
- 14:57 tchin@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-logging-external: apply
- 14:56 tchin@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-logging-external: apply
- 14:53 tchin@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventgate-logging-external: apply
- 14:52 tchin@deploy2002: helmfile [codfw] START helmfile.d/services/eventgate-logging-external: apply
- 14:52 blake@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2351.codfw.wmnet with OS trixie
- 14:49 klausman@deploy2002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
- 14:49 klausman@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
- 14:48 blake@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2352.codfw.wmnet with OS trixie
- 14:48 klausman@deploy2002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
- 14:48 klausman@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
- 14:48 tchin@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply
- 14:47 tchin@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply
- 14:45 tchin@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply
- 14:45 blake@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2350.codfw.wmnet with OS trixie
- 14:44 tchin@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply
- 14:43 tchin@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply
- 14:43 tchin@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply
- 14:43 blake@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2348.codfw.wmnet with OS trixie
- 14:41 blake@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2353.codfw.wmnet with reason: host reimage
- 14:37 blake@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2349.codfw.wmnet with reason: host reimage
- 14:33 blake@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2351.codfw.wmnet with reason: host reimage
- 14:30 blake@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2352.codfw.wmnet with reason: host reimage
- 14:29 ebernhardson@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
- 14:28 ebernhardson@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
- 14:26 blake@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2350.codfw.wmnet with reason: host reimage
- 14:23 blake@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2348.codfw.wmnet with reason: host reimage
- 14:21 blake@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2351.codfw.wmnet with reason: host reimage
- 14:20 blake@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2352.codfw.wmnet with reason: host reimage
- 14:20 blake@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2353.codfw.wmnet with reason: host reimage
- 14:20 blake@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2350.codfw.wmnet with reason: host reimage
- 14:19 blake@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2349.codfw.wmnet with reason: host reimage
- 14:19 blake@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2348.codfw.wmnet with reason: host reimage
- 14:07 blake@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker2353.codfw.wmnet with OS trixie
- 14:07 blake@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker2352.codfw.wmnet with OS trixie
- 14:06 blake@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker2351.codfw.wmnet with OS trixie
- 14:06 blake@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker2350.codfw.wmnet with OS trixie
- 14:06 blake@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker2349.codfw.wmnet with OS trixie
- 14:06 blake@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker2348.codfw.wmnet with OS trixie
- 14:03 blake@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host wikikube-worker2349.codfw.wmnet with OS trixie
- 14:03 blake@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker2349.codfw.wmnet with OS trixie
- 14:03 blake@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host wikikube-worker2348.codfw.wmnet with OS trixie
- 14:03 blake@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker2348.codfw.wmnet with OS trixie
- 14:02 blake@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker[2342-2347].codfw.wmnet
- 14:02 blake@cumin1003: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker[2342-2347].codfw.wmnet
- 14:01 blake@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2347.codfw.wmnet with OS trixie
- 13:57 blake@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2346.codfw.wmnet with OS trixie
- 13:55 blake@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2343.codfw.wmnet with OS trixie
- 13:50 blake@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2345.codfw.wmnet with OS trixie
- 13:48 blake@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2344.codfw.wmnet with OS trixie
- 13:45 dreamyjazz@deploy2002: mwscript-k8s job started: foreachwikiindblist checkuser-suggested-investigations CheckUser:queueAutoCloseSICases.php # T418591
- 13:43 blake@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2342.codfw.wmnet with OS trixie
- 13:42 blake@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2347.codfw.wmnet with reason: host reimage
- 13:38 blake@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2346.codfw.wmnet with reason: host reimage
- 13:35 blake@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2343.codfw.wmnet with reason: host reimage
- 13:31 blake@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2345.codfw.wmnet with reason: host reimage
- 13:28 blake@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2344.codfw.wmnet with reason: host reimage
- 13:24 blake@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2342.codfw.wmnet with reason: host reimage
- 13:21 Dreamy_Jazz: Running foreachwikiindblist checkuser-suggested-investigations.dblist ~/PopulateSiuInfo.php --batch-size=1000 for T411118
- 13:21 blake@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2347.codfw.wmnet with reason: host reimage
- 13:20 blake@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2346.codfw.wmnet with reason: host reimage
- 13:20 blake@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2345.codfw.wmnet with reason: host reimage
- 13:20 blake@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2344.codfw.wmnet with reason: host reimage
- 13:20 blake@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2343.codfw.wmnet with reason: host reimage
- 13:19 blake@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2342.codfw.wmnet with reason: host reimage
- 13:07 blake@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker2347.codfw.wmnet with OS trixie
- 13:07 blake@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker2346.codfw.wmnet with OS trixie
- 13:07 blake@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker2345.codfw.wmnet with OS trixie
- 13:07 blake@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker2344.codfw.wmnet with OS trixie
- 13:06 blake@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker2343.codfw.wmnet with OS trixie
- 13:06 blake@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker2342.codfw.wmnet with OS trixie
- 13:05 blake@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker[2336-2341].codfw.wmnet
- 13:05 blake@cumin1003: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker[2336-2341].codfw.wmnet
- 13:01 blake@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2341.codfw.wmnet with OS trixie
- 12:54 blake@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2340.codfw.wmnet with OS trixie
- 12:49 blake@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2337.codfw.wmnet with OS trixie
- 12:45 blake@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2338.codfw.wmnet with OS trixie
- 12:43 blake@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2336.codfw.wmnet with OS trixie
- 12:40 blake@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2341.codfw.wmnet with reason: host reimage
- 12:35 blake@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2339.codfw.wmnet with OS trixie
- 12:34 blake@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2340.codfw.wmnet with reason: host reimage
- 12:31 blake@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2337.codfw.wmnet with reason: host reimage
- 12:26 blake@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2338.codfw.wmnet with reason: host reimage
- 12:22 blake@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2336.codfw.wmnet with reason: host reimage
- 12:18 blake@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2339.codfw.wmnet with reason: host reimage
- 12:13 blake@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2340.codfw.wmnet with reason: host reimage
- 12:13 blake@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2341.codfw.wmnet with reason: host reimage
- 12:12 blake@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2337.codfw.wmnet with reason: host reimage
- 12:12 blake@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2338.codfw.wmnet with reason: host reimage
- 12:12 blake@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2336.codfw.wmnet with reason: host reimage
- 12:12 blake@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2339.codfw.wmnet with reason: host reimage
- 12:00 blake@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker2341.codfw.wmnet with OS trixie
- 12:00 blake@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker2340.codfw.wmnet with OS trixie
- 11:59 blake@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker2339.codfw.wmnet with OS trixie
- 11:59 blake@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker2338.codfw.wmnet with OS trixie
- 11:59 blake@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker2337.codfw.wmnet with OS trixie
- 11:59 blake@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker2336.codfw.wmnet with OS trixie
- 11:56 blake@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker[2333-2335].codfw.wmnet
- 11:55 blake@cumin1003: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker[2333-2335].codfw.wmnet
- 11:55 elukey@cumin1003: END (FAIL) - Cookbook sre.kafka.change-confluent-distro-version (exit_code=99) Change Confluent distribution for Kafka A:kafka-test-eqiad cluster: Change Confluent distribution.
- 11:54 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1207.eqiad.wmnet
- 11:54 blake@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2335.codfw.wmnet with OS trixie
- 11:53 moritzm: uploaded icu 72.1-3+deb12u1~wmf11u1 to component/php83-icu72 T419058 (backport of ICU 72 from Bookworm to Bullseye, built to be co-installable with the native ICU from Bullseye)
- 11:50 blake@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2334.codfw.wmnet with OS trixie
- 11:49 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host an-worker1207.eqiad.wmnet
- 11:48 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1205.eqiad.wmnet
- 11:45 blake@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2333.codfw.wmnet with OS trixie
- 11:39 elukey@cumin1003: START - Cookbook sre.kafka.change-confluent-distro-version Change Confluent distribution for Kafka A:kafka-test-eqiad cluster: Change Confluent distribution.
- 11:39 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host an-worker1205.eqiad.wmnet
- 11:34 blake@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2335.codfw.wmnet with reason: host reimage
- 11:30 blake@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2334.codfw.wmnet with reason: host reimage
- 11:27 blake@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2333.codfw.wmnet with reason: host reimage
- 11:23 blake@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2335.codfw.wmnet with reason: host reimage
- 11:22 blake@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2334.codfw.wmnet with reason: host reimage
- 11:21 blake@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2333.codfw.wmnet with reason: host reimage
- 11:09 blake@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker2334.codfw.wmnet with OS trixie
- 11:09 blake@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker2335.codfw.wmnet with OS trixie
- 11:08 blake@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker2333.codfw.wmnet with OS trixie
- 11:06 blake@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker2332.codfw.wmnet
- 11:05 blake@cumin1003: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker2332.codfw.wmnet
- 11:02 blake@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2332.codfw.wmnet with OS trixie
- 10:43 blake@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2332.codfw.wmnet with reason: host reimage
- 10:36 blake@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2332.codfw.wmnet with reason: host reimage
- 10:23 blake@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker2332.codfw.wmnet with OS trixie
- 10:23 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1199.eqiad.wmnet
- 10:21 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1194.eqiad.wmnet
- 10:16 blake@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker[2332-2356].codfw.wmnet
- 10:13 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host an-worker1194.eqiad.wmnet
- 10:09 javiermonton@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply
- 10:09 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host an-worker1199.eqiad.wmnet
- 10:09 javiermonton@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply
- 10:03 blake@cumin1003: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker[2332-2356].codfw.wmnet
- 09:59 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
- 09:59 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
- 09:39 Emperor: repool ms-fe1013 after PXE work T401966
- 09:23 derick@deploy2002: mwscript-k8s job started: extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=pmswiki --logwiki=metawiki Wikilimes Limes.pink # T419184
- 09:10 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-fe1013.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
- 09:09 elukey@cumin1003: START - Cookbook sre.hosts.provision for host ms-fe1013.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
- 09:08 elukey@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ms-fe1013.eqiad.wmnet
- 09:08 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-fe1013.eqiad.wmnet
- 08:57 elukey@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-fe1013.eqiad.wmnet
- 08:56 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ms-fe1013.eqiad.wmnet
- 08:54 elukey@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ms-fe1013.eqiad.wmnet
- 08:42 elukey@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ms-fe1013.eqiad.wmnet
- 08:25 moritzm: uploaded openjdk-8 8u482-ga-1~deb12u1 to component/jdk8 of bookworm-wikimedia
- 08:11 moritzm: imported prometheus-ganeti-exporter 0.3+deb12u2 for bookworm-wikimedia T419166
- 06:23 ryankemper@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikidata-query-gui: apply
- 06:23 ryankemper@deploy2002: helmfile [codfw] START helmfile.d/services/wikidata-query-gui: apply
- 06:23 ryankemper@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikidata-query-gui: apply
- 06:23 ryankemper@deploy2002: helmfile [eqiad] START helmfile.d/services/wikidata-query-gui: apply
- 06:22 ryankemper@deploy2002: helmfile [staging] DONE helmfile.d/services/wikidata-query-gui: apply
- 06:22 ryankemper@deploy2002: helmfile [staging] START helmfile.d/services/wikidata-query-gui: apply
- 02:59 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 02:59 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Change frack mgmt vlan interface - pt1979@cumin2002"
- 02:59 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Change frack mgmt vlan interface - pt1979@cumin2002"
- 02:56 pt1979@cumin2002: START - Cookbook sre.dns.netbox
- 02:21 zabe: zabe@deploy2002:/srv/mediawiki-staging$ foreachwiki extensions/TimedMediaHandler/maintenance/migrateTranscodeStates.php --force # T415064
- 02:16 zabe@deploy2002: Finished scap sync-world: Backport for Update interwiki cache (duration: 06m 38s)
- 02:12 zabe@deploy2002: mwscript-k8s job started: foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https # T415978, T414241
- 02:12 zabe@deploy2002: zabe: Continuing with sync
- 02:11 zabe@deploy2002: zabe: Backport for Update interwiki cache synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 02:09 zabe@deploy2002: Started scap sync-world: Backport for Update interwiki cache
- 02:09 mwpresync@deploy2002: Finished scap build-images: Publishing wmf/next image (duration: 08m 23s)
- 02:00 mwpresync@deploy2002: Started scap build-images: Publishing wmf/next image
- 01:59 zabe@deploy2002: Finished scap sync-world: Backport for Set urwikisource to rtl (T415960) (duration: 06m 39s)
- 01:55 zabe@deploy2002: zabe: Continuing with sync
- 01:54 zabe@deploy2002: zabe: Backport for Set urwikisource to rtl (T415960) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 01:53 zabe@deploy2002: Started scap sync-world: Backport for Set urwikisource to rtl (T415960)
- 01:45 zabe@deploy2002: Sync cancelled.
- 01:43 zabe@deploy2002: zabe: Backport for Activate urwikisource (T415960) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 01:42 zabe@deploy2002: Started scap sync-world: Backport for Activate urwikisource (T415960)
- 01:38 zabe@deploy2002: Finished scap sync-world: Backport for Prepare urwikisource (T415960) (duration: 06m 18s)
- 01:34 zabe@deploy2002: zabe: Continuing with sync
- 01:34 zabe@deploy2002: zabe: Backport for Prepare urwikisource (T415960) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 01:32 zabe@deploy2002: Started scap sync-world: Backport for Prepare urwikisource (T415960)
- 01:29 zabe@deploy2002: Finished scap sync-world: Backport for Activate kaiwiki (T414234) (duration: 06m 57s)
- 01:25 zabe@deploy2002: zabe: Continuing with sync
- 01:24 zabe@deploy2002: zabe: Backport for Activate kaiwiki (T414234) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 01:22 zabe@deploy2002: Started scap sync-world: Backport for Activate kaiwiki (T414234)
- 01:17 zabe@deploy2002: Finished scap sync-world: Backport for Prepare kaiwiki (T414234) (duration: 07m 25s)
- 01:13 zabe@deploy2002: zabe: Continuing with sync
- 01:11 zabe@deploy2002: zabe: Backport for Prepare kaiwiki (T414234) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 01:09 zabe@deploy2002: Started scap sync-world: Backport for Prepare kaiwiki (T414234)
- 00:33 zabe@deploy2002: Finished scap sync-world: Backport for Stop writing to il_to on all wikis except commons (T415787) (duration: 06m 22s)
- 00:29 zabe@deploy2002: zabe: Continuing with sync
- 00:28 zabe@deploy2002: zabe: Backport for Stop writing to il_to on all wikis except commons (T415787) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 00:27 zabe@deploy2002: Started scap sync-world: Backport for Stop writing to il_to on all wikis except commons (T415787)
- 00:05 catrope@deploy2002: Finished scap sync-world: Backport for Re-enable AllowUserJs (T419137) (duration: 08m 08s)
- 00:01 catrope@deploy2002: catrope, kharlan: Continuing with sync
2026-03-05
- 23:58 catrope@deploy2002: catrope, kharlan: Backport for Re-enable AllowUserJs (T419137) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 23:56 catrope@deploy2002: Started scap sync-world: Backport for Re-enable AllowUserJs (T419137)
- 23:52 catrope@deploy2002: Finished scap sync-world: Backport for CSP: Update false positives list (duration: 06m 34s)
- 23:52 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host contint2003.wikimedia.org with OS trixie
- 23:47 catrope@deploy2002: catrope: Continuing with sync
- 23:47 catrope@deploy2002: catrope: Backport for CSP: Update false positives list synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 23:45 catrope@deploy2002: Started scap sync-world: Backport for CSP: Update false positives list
- 23:33 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on contint2003.wikimedia.org with reason: host reimage
- 23:29 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on contint2003.wikimedia.org with reason: host reimage
- 23:15 zabe@deploy2002: Finished scap sync-world: Backport for Using Hadoop for MostTranscludedPages on commonswiki (T416927) (duration: 06m 27s)
- 23:11 zabe@deploy2002: zabe: Continuing with sync
- 23:10 zabe@deploy2002: zabe: Backport for Using Hadoop for MostTranscludedPages on commonswiki (T416927) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 23:09 dzahn@cumin2002: START - Cookbook sre.hosts.reimage for host contint2003.wikimedia.org with OS trixie
- 23:08 zabe@deploy2002: Started scap sync-world: Backport for Using Hadoop for MostTranscludedPages on commonswiki (T416927)
- 22:45 maryum: Deployed security fix for T418254
- 22:35 zabe@deploy2002: Finished scap sync-world: Backport for SpecialWantedFiles: Use lt_title instead of lt_to (T299953) (duration: 06m 12s)
- 22:31 zabe@deploy2002: zabe: Continuing with sync
- 22:30 zabe@deploy2002: zabe: Backport for SpecialWantedFiles: Use lt_title instead of lt_to (T299953) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 22:28 zabe@deploy2002: Started scap sync-world: Backport for SpecialWantedFiles: Use lt_title instead of lt_to (T299953)
- 21:43 ebernhardson@deploy2002: Finished scap sync-world: Backport for cirrus: Align semanticsearch cluster group name with routing (T413969) (duration: 07m 20s)
- 21:39 ebernhardson@deploy2002: ebernhardson: Continuing with sync
- 21:38 ebernhardson@deploy2002: ebernhardson: Backport for cirrus: Align semanticsearch cluster group name with routing (T413969) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 21:36 ebernhardson@deploy2002: Started scap sync-world: Backport for cirrus: Align semanticsearch cluster group name with routing (T413969)
- 21:04 jhathaway@dns1004: END - running authdns-update
- 21:02 jhathaway@dns1004: START - running authdns-update
- 20:53 jasmine@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 20:52 jasmine@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add new service IPs for sophroid - jasmine@cumin2002"
- 20:52 jasmine@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add new service IPs for sophroid - jasmine@cumin2002"
- 20:47 jasmine@cumin2002: START - Cookbook sre.dns.netbox
- 20:28 cdanis: apt built and imported jwt-authorizer 1.3.0-1
- 20:16 jhuneidi@deploy2002: rebuilt and synchronized wikiversions files: group2 to 1.46.0-wmf.18 refs T413809
- 20:04 krinkle@deploy2002: Finished scap sync-world: Backport for Allow toolforge APIs in enforced CSP mode (T135963 T419137 T220475) (duration: 07m 37s)
- 20:00 krinkle@deploy2002: krinkle: Continuing with sync
- 19:58 krinkle@deploy2002: krinkle: Backport for Allow toolforge APIs in enforced CSP mode (T135963 T419137 T220475) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 19:56 krinkle@deploy2002: Started scap sync-world: Backport for Allow toolforge APIs in enforced CSP mode (T135963 T419137 T220475)
- 19:21 sbassett@deploy2002: Finished scap sync-world: Backport for Re-enable Site JS (T419137 T419138) (duration: 06m 57s)
- 19:17 sbassett@deploy2002: sbassett: Continuing with sync
- 19:16 sbassett@deploy2002: sbassett: Backport for Re-enable Site JS (T419137 T419138) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 19:15 sbassett@deploy2002: Started scap sync-world: Backport for Re-enable Site JS (T419137 T419138)
- 19:04 dr0ptp4kt: Deploying change 1239200 for refinery (T416481) using scap, then deployed onto hdfs
- 19:03 dr0ptp4kt: Deployed refinery change 1240253 (T414478), 1240253 (no-op) for refinery (T414478) using scap, then deployed onto hdfs
- 18:58 dr0ptp4kt@deploy2002: Finished deploy [analytics/refinery@dd641b1] (thin): Regular analytics weekly train THIN [analytics/refinery@dd641b15] (duration: 02m 02s)
- 18:56 dr0ptp4kt@deploy2002: Started deploy [analytics/refinery@dd641b1] (thin): Regular analytics weekly train THIN [analytics/refinery@dd641b15]
- 18:55 dr0ptp4kt@deploy2002: Finished deploy [analytics/refinery@dd641b1]: Regular analytics weekly train [analytics/refinery@dd641b15] (duration: 04m 18s)
- 18:50 dr0ptp4kt@deploy2002: Started deploy [analytics/refinery@dd641b1]: Regular analytics weekly train [analytics/refinery@dd641b15]
- 18:49 dr0ptp4kt@deploy2002: Finished deploy [analytics/refinery@dd641b1] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@dd641b15] (duration: 01m 57s)
- 18:47 dr0ptp4kt: Deploying change 1239200 for refinery (T416481)
- 18:47 dr0ptp4kt@deploy2002: Started deploy [analytics/refinery@dd641b1] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@dd641b15]
- 18:31 eevans@dns1004: END - running authdns-update
- 18:30 eevans@dns1004: START - running authdns-update
- 18:30 sukhe: sudo cumin -b51 "A:cp" "run-puppet-agent --enable 'rolling out 1248544'"
- 18:16 sukhe: sudo cumin "A:cp" "disable-puppet 'rolling out 1248544'"
- 18:06 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 18:06 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Change frack mgmt vlan interface - pt1979@cumin2002"
- 18:06 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Change frack mgmt vlan interface - pt1979@cumin2002"
- 18:02 pt1979@cumin2002: START - Cookbook sre.dns.netbox
- 17:31 mszwarc@deploy2002: Finished scap sync-world: Backport for Enable wgUseSiteJs on donatewiki (T419138) (duration: 09m 57s)
- 17:27 mszwarc@deploy2002: mszwarc, krinkle: Continuing with sync
- 17:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host contint2003.wikimedia.org with OS bookworm
- 17:26 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 17:23 mszwarc@deploy2002: mszwarc, krinkle: Backport for Enable wgUseSiteJs on donatewiki (T419138) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 17:21 mszwarc@deploy2002: Started scap sync-world: Backport for Enable wgUseSiteJs on donatewiki (T419138)
- 17:16 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1178.eqiad.wmnet
- 17:16 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 17:12 cgoubert@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker1162.eqiad.wmnet
- 17:12 cgoubert@cumin1003: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker1162.eqiad.wmnet
- 17:10 cgoubert@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for wikikube-worker1162.eqiad.wmnet
- 17:10 cgoubert@cumin1003: START - Cookbook sre.hosts.remove-downtime for wikikube-worker1162.eqiad.wmnet
- 17:09 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host an-worker1178.eqiad.wmnet
- 17:05 taavi@cumin1003: dbctl commit (dc=all): 'enable writes', diff saved to https://phabricator.wikimedia.org/P89812 and previous config saved to /var/cache/conftool/dbconfig/20260305-170556-taavi.json
- 16:03 oblivian@cumin1003: dbctl commit (dc=all): 'read only s6', diff saved to https://phabricator.wikimedia.org/P89810 and previous config saved to /var/cache/conftool/dbconfig/20260305-160348-oblivian.json
- 15:32 taavi@cumin1003: dbctl commit (dc=all): 'set global ro', diff saved to https://phabricator.wikimedia.org/P89808 and previous config saved to /var/cache/conftool/dbconfig/20260305-153203-taavi.json
- 15:31 mszwarc@deploy2002: mszwarc: Continuing with sync
- 15:31 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host an-worker1178.eqiad.wmnet
- 15:31 mszwarc@deploy2002: mszwarc: Backport for Disable custom JS for a moment synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 15:29 mszwarc@deploy2002: Started scap sync-world: Backport for Disable custom JS for a moment
- 15:25 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['contint2003']
- 15:25 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['contint2003']
- 15:23 ebernhardson@deploy2002: Finished scap sync-world: Backport for cirrus: Correct semantic builder config (T413969) (duration: 07m 39s)
- 15:22 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host contint2003.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 15:19 ebernhardson@deploy2002: ebernhardson: Continuing with sync
- 15:18 ebernhardson@deploy2002: ebernhardson: Backport for cirrus: Correct semantic builder config (T413969) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 15:16 ebernhardson@deploy2002: Started scap sync-world: Backport for cirrus: Correct semantic builder config (T413969)
- 15:11 elukey@cumin1003: END (PASS) - Cookbook sre.kafka.change-confluent-distro-version (exit_code=0) Change Confluent distribution for Kafka A:kafka-test-eqiad cluster: Change Confluent distribution.
- 15:10 ebernhardson@deploy2002: Finished scap sync-world: Backport for cirrus: Add semantic search test cluster (T413969) (duration: 09m 18s)
- 15:06 ebernhardson@deploy2002: ebernhardson: Continuing with sync
- 15:04 sukhe@dns1004: END - running authdns-update
- 15:03 sukhe@dns1004: START - running authdns-update
- 15:03 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host contint2003.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 15:02 ebernhardson@deploy2002: ebernhardson: Backport for cirrus: Add semantic search test cluster (T413969) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 15:02 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host contint2003
- 15:02 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host contint2003
- 15:00 ebernhardson@deploy2002: Started scap sync-world: Backport for cirrus: Add semantic search test cluster (T413969)
- 14:57 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1010.eqiad.wmnet with OS bookworm
- 14:53 elukey@cumin1003: START - Cookbook sre.kafka.change-confluent-distro-version Change Confluent distribution for Kafka A:kafka-test-eqiad cluster: Change Confluent distribution.
- 14:50 kevinbazira@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'llm' for release 'main' .
- 14:38 btullis@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dse-k8s-worker1010
- 14:38 btullis@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host dse-k8s-worker1010
- 14:32 btullis@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dse-k8s-worker1010
- 14:32 btullis@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host dse-k8s-worker1010
- 14:32 sukhe@dns1004: END - running authdns-update
- 14:30 sukhe@dns1004: START - running authdns-update
- 14:28 elukey@cumin1003: END (FAIL) - Cookbook sre.kafka.change-confluent-distro-version (exit_code=99) Change Confluent distribution for Kafka A:kafka-test-eqiad cluster: Change Confluent distribution.
- 14:28 sukhe@dns1004: START - running authdns-update
- 14:27 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host an-worker1231.eqiad.wmnet
- 14:27 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1230.eqiad.wmnet
- 14:24 bking@dns1004: START - running authdns-update
- 14:15 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host an-worker1230.eqiad.wmnet
- 14:15 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1229.eqiad.wmnet
- 14:11 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host an-worker1178.eqiad.wmnet
- 14:09 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
- 14:09 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
- 14:09 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
- 14:09 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
- 14:09 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
- 14:09 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
- 14:09 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
- 14:08 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revision-models' for release 'main' .
- 14:08 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revise-tone-task-generator' for release 'main' .
- 14:08 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
- 14:08 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
- 14:08 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'readability' for release 'main' .
- 14:08 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
- 14:08 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'logo-detection' for release 'main' .
- 14:08 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'llm' for release 'main' .
- 14:08 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
- 14:08 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'edit-check' for release 'main' .
- 14:08 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
- 14:08 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'article-models' for release 'main' .
- 14:08 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'article-descriptions' for release 'main' .
- 14:05 moritzm: imported nodejs 24.14.0-1nodesource1 to thirdparty/node24 T418440
- 14:03 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host an-worker1229.eqiad.wmnet
- 14:03 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1228.eqiad.wmnet
- 14:01 moritzm: initialised ganeti02/ulsfo cluster T418993
- 13:57 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
- 13:56 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
- 13:56 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
- 13:55 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
- 13:53 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
- 13:52 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host an-worker1228.eqiad.wmnet
- 13:52 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1227.eqiad.wmnet
- 13:52 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
- 13:51 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
- 13:50 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
- 13:47 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
- 13:46 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
- 13:43 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
- 13:42 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
- 13:42 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1199.eqiad.wmnet
- 13:40 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host an-worker1227.eqiad.wmnet
- 13:40 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1226.eqiad.wmnet
- 13:38 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
- 13:38 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
- 13:38 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
- 13:38 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
- 13:38 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
- 13:38 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
- 13:38 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
- 13:38 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revision-models' for release 'main' .
- 13:37 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revise-tone-task-generator' for release 'main' .
- 13:37 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
- 13:37 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
- 13:37 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'readability' for release 'main' .
- 13:37 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
- 13:37 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'logo-detection' for release 'main' .
- 13:37 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'llm' for release 'main' .
- 13:37 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
- 13:37 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'edit-check' for release 'main' .
- 13:37 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
- 13:37 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'article-models' for release 'main' .
- 13:37 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'article-descriptions' for release 'main' .
- 13:35 moritzm: installing glib2.0 security updates
- 13:35 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host an-worker1199.eqiad.wmnet
- 13:33 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'edit-check' for release 'main' .
- 13:33 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
- 13:33 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'article-models' for release 'main' .
- 13:33 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'article-descriptions' for release 'main' .
- 13:26 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host an-worker1226.eqiad.wmnet
- 13:26 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1225.eqiad.wmnet
- 13:26 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'article-descriptions' for release 'main' .
- 13:15 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host an-worker1225.eqiad.wmnet
- 13:15 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1224.eqiad.wmnet
- 13:09 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti4006.ulsfo.wmnet with OS bookworm
- 13:08 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
- 13:07 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
- 13:07 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 13:07 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add new VIP for routed ganeti in ulsfo - jmm@cumin2002"
- 13:06 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add new VIP for routed ganeti in ulsfo - jmm@cumin2002"
- 13:06 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
- 13:05 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
- 13:02 jmm@cumin2002: START - Cookbook sre.dns.netbox
- 13:01 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host an-worker1224.eqiad.wmnet
- 13:01 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1223.eqiad.wmnet
- 13:00 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
- 12:59 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
- 12:58 cgoubert@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on wikikube-worker1162.eqiad.wmnet with reason: dcops intervention
- 12:57 cgoubert@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker1162.eqiad.wmnet
- 12:56 cgoubert@cumin1003: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker1162.eqiad.wmnet
- 12:55 kevinbazira@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'llm' for release 'main' .
- 12:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti4006.ulsfo.wmnet with reason: host reimage
- 12:49 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host an-worker1223.eqiad.wmnet
- 12:49 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1222.eqiad.wmnet
- 12:46 elukey@cumin1003: START - Cookbook sre.kafka.change-confluent-distro-version Change Confluent distribution for Kafka A:kafka-test-eqiad cluster: Change Confluent distribution.
- 12:43 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti4006.ulsfo.wmnet with reason: host reimage
- 12:37 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host an-worker1222.eqiad.wmnet
- 12:37 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1221.eqiad.wmnet
- 12:23 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host an-worker1221.eqiad.wmnet
- 12:23 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1220.eqiad.wmnet
- 12:23 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti4006.ulsfo.wmnet with OS bookworm
- 12:11 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host an-worker1220.eqiad.wmnet
- 11:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4005.ulsfo.wmnet
- 11:44 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti4005.ulsfo.wmnet
- 11:33 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1236.eqiad.wmnet
- 11:29 moritzm: remove ganeti4006 from ganeti/ulsfo cluster T418993
- 11:25 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host an-worker1236.eqiad.wmnet
- 11:25 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1235.eqiad.wmnet
- 11:16 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host an-worker1235.eqiad.wmnet
- 11:16 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1234.eqiad.wmnet
- 11:09 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host an-worker1234.eqiad.wmnet
- 11:09 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1233.eqiad.wmnet
- 11:02 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host an-worker1233.eqiad.wmnet
- 11:02 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1232.eqiad.wmnet
- 11:02 kevinbazira@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'llm' for release 'main' .
- 11:00 elukey@cumin1003: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-test-eqiad
- 10:59 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti4005.ulsfo.wmnet with OS bookworm
- 10:55 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
- 10:55 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host an-worker1232.eqiad.wmnet
- 10:55 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1231.eqiad.wmnet
- 10:54 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
- 10:47 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host an-worker1231.eqiad.wmnet
- 10:47 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1230.eqiad.wmnet
- 10:41 elukey@cumin1003: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-test-eqiad
- 10:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti4005.ulsfo.wmnet with reason: host reimage
- 10:37 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host an-worker1230.eqiad.wmnet
- 10:37 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1229.eqiad.wmnet
- 10:32 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti4005.ulsfo.wmnet with reason: host reimage
- 10:30 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host an-worker1229.eqiad.wmnet
- 10:30 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1228.eqiad.wmnet
- 10:24 moritzm: installing Java 8 security updates
- 10:22 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host an-worker1228.eqiad.wmnet
- 10:22 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1227.eqiad.wmnet
- 10:15 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host an-worker1227.eqiad.wmnet
- 10:14 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1226.eqiad.wmnet
- 10:11 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti4006.ulsfo.wmnet
- 10:11 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti4005.ulsfo.wmnet with OS bookworm
- 10:10 jmm@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ganeti4005.ulsfo.wmnet
- 10:08 jmm@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ganeti4005.ulsfo.wmnet
- 10:08 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 10:08 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add gw-virtual.ulsfo.wmnet - ayounsi@cumin1003"
- 10:07 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host an-worker1226.eqiad.wmnet
- 10:07 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1225.eqiad.wmnet
- 09:59 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host an-worker1225.eqiad.wmnet
- 09:59 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1224.eqiad.wmnet
- 09:52 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host an-worker1224.eqiad.wmnet
- 09:51 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1223.eqiad.wmnet
- 09:44 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host an-worker1223.eqiad.wmnet
- 09:44 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1222.eqiad.wmnet
- 09:43 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add gw-virtual.ulsfo.wmnet - ayounsi@cumin1003"
- 09:36 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host an-worker1222.eqiad.wmnet
- 09:36 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1221.eqiad.wmnet
- 09:32 javiermonton@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply
- 09:32 javiermonton@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply
- 09:28 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host an-worker1221.eqiad.wmnet
- 09:27 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1220.eqiad.wmnet
- 09:20 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host an-worker1220.eqiad.wmnet
- 09:02 ayounsi@cumin1003: START - Cookbook sre.dns.netbox
- 08:40 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti4006.ulsfo.wmnet
- 08:38 mszwarc@deploy2002: Finished scap sync-world: Backport for Drop 'centralnoticeadmin' from $wgOATHRequiredForGroups (T418580) (duration: 07m 07s)
- 08:35 brouberol@deploy2002: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/airflow-main: apply
- 08:35 brouberol@deploy2002: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/airflow-main: apply
- 08:34 mszwarc@deploy2002: mszwarc: Continuing with sync
- 08:33 mszwarc@deploy2002: mszwarc: Backport for Drop 'centralnoticeadmin' from $wgOATHRequiredForGroups (T418580) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 08:30 mszwarc@deploy2002: Started scap sync-world: Backport for Drop 'centralnoticeadmin' from $wgOATHRequiredForGroups (T418580)
- 08:29 gehel@dns1004: END - running authdns-update
- 08:28 gehel@dns1004: START - running authdns-update
- 08:27 moritzm: installing mbedtls security updates
- 08:23 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-main: apply
- 08:23 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-main: apply
- 08:15 hashar@deploy2002: Finished scap sync-world: Backport for Revert "zhwiki: Add 2026 CNY celebration logos" (duration: 09m 19s)
- 08:11 hashar@deploy2002: hashar, stang: Continuing with sync
- 08:08 hashar@deploy2002: hashar, stang: Backport for Revert "zhwiki: Add 2026 CNY celebration logos" synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 08:06 hashar@deploy2002: Started scap sync-world: Backport for Revert "zhwiki: Add 2026 CNY celebration logos"
- 08:02 moritzm: uploaded openjdk-8 8u482-ga-1~deb11u1 to component/jdk8 of bullseye-wikimedia
- 07:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts bast4005.wikimedia.org
- 07:56 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 07:56 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: bast4005.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
- 07:55 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: bast4005.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
- 07:48 moritzm: uploaded bird2 2.18-1~wmf13u2 to the main component of trixie-wikimedia T413740
- 07:47 jmm@cumin2002: START - Cookbook sre.dns.netbox
- 07:47 kevinbazira@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'llm' for release 'main' .
- 07:42 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts bast4005.wikimedia.org
- 06:35 marostegui@cumin1003: dbctl commit (dc=all): 'Remove es1033 T408772', diff saved to https://phabricator.wikimedia.org/P89804 and previous config saved to /var/cache/conftool/dbconfig/20260305-063548-marostegui.json
- 02:10 mwpresync@deploy2002: Finished scap build-images: Publishing wmf/next image (duration: 07m 55s)
- 02:02 mwpresync@deploy2002: Started scap build-images: Publishing wmf/next image
- 02:01 zabe@deploy2002: Finished scap sync-world: Backport for Stop writing to il_to on medium size wikis (T415787) (duration: 06m 14s)
- 01:58 zabe@deploy2002: zabe: Continuing with sync
- 01:57 zabe@deploy2002: zabe: Backport for Stop writing to il_to on medium size wikis (T415787) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 01:55 zabe@deploy2002: Started scap sync-world: Backport for Stop writing to il_to on medium size wikis (T415787)
- 01:40 zabe@deploy2002: Finished scap sync-world: Backport for Start reading from new file tables on medium wikis (T416548) (duration: 06m 15s)
- 01:36 zabe@deploy2002: zabe: Continuing with sync
- 01:36 zabe@deploy2002: zabe: Backport for Start reading from new file tables on medium wikis (T416548) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 01:34 zabe@deploy2002: Started scap sync-world: Backport for Start reading from new file tables on medium wikis (T416548)
- 01:29 zabe@deploy2002: Finished scap sync-world: Backport for ImageListPager: Use correct name field for batch lookups (T418327), Revert^2 "ImageListPager: Properly support file schema migration read new" (duration: 07m 21s)
- 01:25 zabe@deploy2002: zabe: Continuing with sync
- 01:23 zabe@deploy2002: zabe: Backport for ImageListPager: Use correct name field for batch lookups (T418327), Revert^2 "ImageListPager: Properly support file schema migration read new" synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 01:21 zabe@deploy2002: Started scap sync-world: Backport for ImageListPager: Use correct name field for batch lookups (T418327), Revert^2 "ImageListPager: Properly support file schema migration read new"
- 00:55 zabe@deploy2002: Finished scap sync-world: Backport for Stop writing to il_to on small wikis (T415787) (duration: 06m 49s)
- 00:51 zabe@deploy2002: zabe: Continuing with sync
- 00:50 zabe@deploy2002: zabe: Backport for Stop writing to il_to on small wikis (T415787) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 00:48 zabe@deploy2002: Started scap sync-world: Backport for Stop writing to il_to on small wikis (T415787)
- 00:19 zabe@deploy2002: Finished scap sync-world: Backport for NewFilesPager: Properly support file schema migration read new (T419062), NewFilesPager: Properly support file schema migration read new (T419062) (duration: 08m 52s)
- 00:13 zabe@deploy2002: zabe: Continuing with sync
- 00:12 zabe@deploy2002: zabe: Backport for NewFilesPager: Properly support file schema migration read new (T419062), NewFilesPager: Properly support file schema migration read new (T419062) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 00:10 zabe@deploy2002: Started scap sync-world: Backport for NewFilesPager: Properly support file schema migration read new (T419062), NewFilesPager: Properly support file schema migration read new (T419062)
2026-03-04
- 22:57 jforrester@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
- 22:56 jforrester@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
- 22:55 jforrester@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
- 22:55 jforrester@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
- 22:55 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
- 22:54 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
- 22:35 tgr_: UTC late deploys done
- 22:33 tgr@deploy2002: Finished scap sync-world: Backport for Introduce a Semantic Search query route and builder (T413969), Wire up semantic query building (T413969) (duration: 38m 28s)
- 22:16 tgr@deploy2002: tgr, ebernhardson: Continuing with sync
- 22:14 tgr@deploy2002: tgr, ebernhardson: Backport for Introduce a Semantic Search query route and builder (T413969), Wire up semantic query building (T413969) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 21:54 tgr@deploy2002: Started scap sync-world: Backport for Introduce a Semantic Search query route and builder (T413969), Wire up semantic query building (T413969)
- 21:48 tgr@deploy2002: Finished scap sync-world: Backport for Enable JWT session cookie for bot passwords (all wikis) (attempt #3) (T415007 T418999) (duration: 07m 05s)
- 21:47 bking@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on dse-k8s-worker1028.eqiad.wmnet with reason: broken networking
- 21:44 tgr@deploy2002: tgr: Continuing with sync
- 21:43 tgr@deploy2002: tgr: Backport for Enable JWT session cookie for bot passwords (all wikis) (attempt #3) (T415007 T418999) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 21:40 tgr@deploy2002: Started scap sync-world: Backport for Enable JWT session cookie for bot passwords (all wikis) (attempt #3) (T415007 T418999)
- 21:36 tgr@deploy2002: Finished scap sync-world: Backport for Add synthetic AAA experiment (T418614), Add synthetic AAA experiment (T418614) (duration: 09m 11s)
- 21:35 bking@cumin2002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host dse-k8s-worker1028.eqiad.wmnet
- 21:32 tgr@deploy2002: cjming, tgr: Continuing with sync
- 21:30 bking@cumin2002: START - Cookbook sre.k8s.pool-depool-node depool for host dse-k8s-worker1028.eqiad.wmnet
- 21:29 tgr@deploy2002: cjming, tgr: Backport for Add synthetic AAA experiment (T418614), Add synthetic AAA experiment (T418614) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 21:27 tgr@deploy2002: Started scap sync-world: Backport for Add synthetic AAA experiment (T418614), Add synthetic AAA experiment (T418614)
- 21:21 tgr@deploy2002: Finished scap sync-world: Backport for logging: set poolcounter channel log level to info (T418612) (duration: 09m 04s)
- 21:17 tgr@deploy2002: tgr, cwhite: Continuing with sync
- 21:14 tgr@deploy2002: tgr, cwhite: Backport for logging: set poolcounter channel log level to info (T418612) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 21:12 tgr@deploy2002: Started scap sync-world: Backport for logging: set poolcounter channel log level to info (T418612)
- 21:07 tgr@deploy2002: Finished scap sync-world: Backport for Fix $wgJwtSessionCookieIssuer (T415007 T418999) (duration: 09m 55s)
- 21:03 tgr@deploy2002: tgr: Continuing with sync
- 20:59 tgr@deploy2002: tgr: Backport for Fix $wgJwtSessionCookieIssuer (T415007 T418999) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 20:57 tgr@deploy2002: Started scap sync-world: Backport for Fix $wgJwtSessionCookieIssuer (T415007 T418999)
- 19:56 jhuneidi@deploy2002: rebuilt and synchronized wikiversions files: group1 to 1.46.0-wmf.18 refs T413809
- 19:44 jhuneidi@deploy2002: Finished scap sync-world: Backport for CategoryViewer: Fall back to empty string in case of missing nextpage (T418934) (duration: 10m 47s)
- 19:44 cdobbins@cumin2002: conftool action : set/pooled=yes:weight=1; selector: name=cp205[0-8].codfw.wmnet
- 19:43 cdobbins@cumin2002: conftool action : set/pooled=yes:weight=1; selector: name=cp2049.codfw.wmnet
- 19:40 jhuneidi@deploy2002: zabe, jhuneidi: Continuing with sync
- 19:35 jhuneidi@deploy2002: zabe, jhuneidi: Backport for CategoryViewer: Fall back to empty string in case of missing nextpage (T418934) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 19:34 brett@puppetserver1001: conftool action : set/weight=1; selector: name=cp2043.*
- 19:34 brett@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp2043.*
- 19:33 jhuneidi@deploy2002: Started scap sync-world: Backport for CategoryViewer: Fall back to empty string in case of missing nextpage (T418934)
- 19:30 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp2043.codfw.wmnet with OS trixie
- 19:23 rzl@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-experimental: apply
- 19:22 rzl@deploy2002: helmfile [codfw] START helmfile.d/services/mw-experimental: apply
- 19:22 rzl@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply
- 19:22 rzl@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply
- 19:09 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp2043.codfw.wmnet with reason: host reimage
- 19:06 brett@puppetserver1001: conftool action : set/weight=1; selector: name=cp204[45678].*
- 19:04 brett@puppetserver1001: conftool action : set/weight=100; selector: name=cp204[45678].*
- 19:02 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp2043.codfw.wmnet with reason: host reimage
- 18:58 brett@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp204[45678].*
- 18:52 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-video: apply
- 18:51 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-video: apply
- 18:50 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-timeline: apply
- 18:50 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-timeline: apply
- 18:49 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
- 18:49 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply
- 18:49 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply
- 18:48 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-media: apply
- 18:48 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: apply
- 18:47 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-constraints: apply
- 18:47 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox: apply
- 18:47 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp2043.codfw.wmnet with OS trixie
- 18:46 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox: apply
- 18:43 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply
- 18:42 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-video: apply
- 18:41 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-timeline: apply
- 18:41 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-timeline: apply
- 18:40 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
- 18:40 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-syntaxhighlight: apply
- 18:40 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply
- 18:40 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply
- 18:39 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: apply
- 18:39 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply
- 18:38 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox: apply
- 18:37 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox: apply
- 18:37 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
- 18:37 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-syntaxhighlight: apply
- 18:32 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
- 18:32 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-syntaxhighlight: apply
- 18:16 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-video: apply
- 18:16 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-video: apply
- 18:16 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-timeline: apply
- 18:15 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-timeline: apply
- 18:15 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
- 18:14 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply
- 18:14 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply
- 18:13 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-media: apply
- 18:13 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-constraints: apply
- 18:13 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-constraints: apply
- 18:12 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox: apply
- 18:12 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox: apply
- 17:54 cdobbins@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp2047.codfw.wmnet with OS trixie
- 17:33 cdobbins@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp2047.codfw.wmnet with reason: host reimage
- 17:27 cdobbins@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp2047.codfw.wmnet with reason: host reimage
- 17:23 daniel@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
- 17:23 daniel@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
- 17:18 daniel@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
- 17:18 daniel@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
- 17:15 daniel@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
- 17:13 daniel@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
- 17:09 cdobbins@cumin2002: START - Cookbook sre.hosts.reimage for host cp2047.codfw.wmnet with OS trixie
- 16:55 daniel@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
- 16:55 daniel@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
- 16:54 daniel@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
- 16:54 daniel@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
- 16:39 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aux-k8s-worker1007.eqiad.wmnet with OS bookworm
- 16:39 blake@cumin1003: END (PASS) - Cookbook sre.switchdc.mediawiki.09-unlock-scap (exit_code=0) for datacenter switchover from eqiad to codfw
- 16:39 root@deploy2002: Unlocked for deployment [ALL REPOSITORIES]: Datacenter switchover from eqiad to codfw - T418133 (duration: 25m 37s)
- 16:39 root@deploy2002: Forcefully removing global lock: Datacenter switchover from eqiad to codfw - T418133
- 16:39 blake@cumin1003: START - Cookbook sre.switchdc.mediawiki.09-unlock-scap for datacenter switchover from eqiad to codfw
- 16:39 blake@cumin1003: END (PASS) - Cookbook sre.switchdc.mediawiki.09-run-puppet-on-db-masters (exit_code=0) for datacenter switchover from eqiad to codfw
- 16:27 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti4006.ulsfo.wmnet
- 16:27 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti4006.ulsfo.wmnet
- 16:27 blake@cumin1003: START - Cookbook sre.switchdc.mediawiki.09-run-puppet-on-db-masters for datacenter switchover from eqiad to codfw
- 16:27 blake@cumin1003: END (PASS) - Cookbook sre.switchdc.mediawiki.09-restore-ttl (exit_code=0) for datacenter switchover from eqiad to codfw
- 16:26 blake@cumin1003: START - Cookbook sre.switchdc.mediawiki.09-restore-ttl for datacenter switchover from eqiad to codfw
- 16:26 blake@cumin1003: END (PASS) - Cookbook sre.switchdc.mediawiki.08-start-maintenance (exit_code=0) for datacenter switchover from eqiad to codfw
- 16:26 root@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-cron: apply
- 16:26 root@deploy2002: helmfile [codfw] START helmfile.d/services/mw-cron: apply
- 16:26 root@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
- 16:26 root@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
- 16:26 blake@cumin1003: START - Cookbook sre.switchdc.mediawiki.08-start-maintenance for datacenter switchover from eqiad to codfw
- 16:25 blake@cumin1003: END (PASS) - Cookbook sre.switchdc.mediawiki.08-restart-mw-jobrunner (exit_code=0) for datacenter switchover from eqiad to codfw
- 16:25 root@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-jobrunner: sync
- 16:25 root@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-jobrunner: sync
- 16:25 blake@cumin1003: START - Cookbook sre.switchdc.mediawiki.08-restart-mw-jobrunner for datacenter switchover from eqiad to codfw
- 16:24 blake@cumin1003: END (PASS) - Cookbook sre.switchdc.mediawiki.07-set-readwrite (exit_code=0) for datacenter switchover from eqiad to codfw
- 16:24 blake@cumin1003: [DRY-RUN] MediaWiki read-only period ends at: 2026-03-04 16:24:40.502004
- 16:24 blake@cumin1003: START - Cookbook sre.switchdc.mediawiki.07-set-readwrite for datacenter switchover from eqiad to codfw
- 16:24 blake@cumin1003: END (PASS) - Cookbook sre.switchdc.mediawiki.06-set-db-readwrite (exit_code=0) for datacenter switchover from eqiad to codfw
- 16:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aux-k8s-worker1007.eqiad.wmnet with reason: host reimage
- 16:24 blake@cumin1003: START - Cookbook sre.switchdc.mediawiki.06-set-db-readwrite for datacenter switchover from eqiad to codfw
- 16:24 blake@cumin1003: END (PASS) - Cookbook sre.switchdc.mediawiki.04-switch-mediawiki (exit_code=0) for datacenter switchover from eqiad to codfw
- 16:24 blake@cumin1003: START - Cookbook sre.switchdc.mediawiki.04-switch-mediawiki for datacenter switchover from eqiad to codfw
- 16:23 blake@cumin1003: END (PASS) - Cookbook sre.switchdc.mediawiki.03-set-db-readonly (exit_code=0) for datacenter switchover from eqiad to codfw
- 16:23 blake@cumin1003: START - Cookbook sre.switchdc.mediawiki.03-set-db-readonly for datacenter switchover from eqiad to codfw
- 16:23 blake@cumin1003: END (PASS) - Cookbook sre.switchdc.mediawiki.02-set-readonly (exit_code=0) for datacenter switchover from eqiad to codfw
- 16:22 blake@cumin1003: [DRY-RUN] MediaWiki read-only period starts at: 2026-03-04 16:22:41.755892
- 16:22 blake@cumin1003: START - Cookbook sre.switchdc.mediawiki.02-set-readonly for datacenter switchover from eqiad to codfw
- 16:20 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on aux-k8s-worker1007.eqiad.wmnet with reason: host reimage
- 16:20 blake@cumin1003: END (PASS) - Cookbook sre.switchdc.mediawiki.01-stop-maintenance (exit_code=0) for datacenter switchover from eqiad to codfw
- 16:20 blake@cumin1003: START - Cookbook sre.switchdc.mediawiki.01-stop-maintenance for datacenter switchover from eqiad to codfw
- 16:19 blake@cumin1003: END (PASS) - Cookbook sre.switchdc.mediawiki.00-reduce-ttl (exit_code=0) for datacenter switchover from eqiad to codfw
- 16:14 moritzm: upgrading cloudservices* to Bird 2.18 T413740
- 16:14 blake@cumin1003: START - Cookbook sre.switchdc.mediawiki.00-reduce-ttl for datacenter switchover from eqiad to codfw
- 16:13 blake@cumin1003: END (PASS) - Cookbook sre.switchdc.mediawiki.00-lock-scap (exit_code=0) for datacenter switchover from eqiad to codfw
- 16:13 root@deploy2002: Locking from deployment [ALL REPOSITORIES]: Datacenter switchover from eqiad to codfw - T418133
- 16:13 blake@cumin1003: START - Cookbook sre.switchdc.mediawiki.00-lock-scap for datacenter switchover from eqiad to codfw
- 16:13 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti4006.ulsfo.wmnet
- 16:10 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti4006.ulsfo.wmnet
- 16:10 moritzm: remove ganeti4005 from ganeti/ulsfo cluster T418993
- 16:10 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1007.eqiad.wmnet with OS bookworm
- 16:06 blake@cumin1003: END (PASS) - Cookbook sre.switchdc.mediawiki.00-downtime-db-readonly-checks (exit_code=0) for datacenter switchover from eqiad to codfw
- 16:06 blake@cumin1003: START - Cookbook sre.switchdc.mediawiki.00-downtime-db-readonly-checks for datacenter switchover from eqiad to codfw
- 15:59 XioNoX: push pfw policies - T418402
- 15:37 sukhe@dns1004: END - running authdns-update
- 15:36 sukhe@dns1004: START - running authdns-update
- 15:33 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1219.eqiad.wmnet
- 15:32 aqu@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/blunderbuss: apply
- 15:31 aqu@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply
- 15:29 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti4005.ulsfo.wmnet
- 15:29 cgoubert@cumin1003: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling restart_daemons on P{ms-fe10[14-24].*} and (A:swift-fe or A:swift-fe-canary or A:swift-fe-codfw or A:swift-fe-eqiad)
- 15:24 cgoubert@cumin1003: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on P{ms-fe10[14-24].*} and (A:swift-fe or A:swift-fe-canary or A:swift-fe-codfw or A:swift-fe-eqiad)
- 15:22 ebernhardson@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
- 15:22 ebernhardson@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
- 15:22 cgoubert@cumin1003: END (ERROR) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=97) rolling restart_daemons on A:swift-fe-eqiad
- 15:21 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host an-worker1219.eqiad.wmnet
- 15:21 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1218.eqiad.wmnet
- 15:19 cgoubert@cumin1003: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on A:swift-fe-eqiad
- 15:16 ayounsi@cumin1003: conftool action : set/pooled=yes; selector: name=cirrussearch1120.eqiad.wmnet
- 15:16 ayounsi@cumin1003: conftool action : set/pooled=yes; selector: name=cirrussearch1121.eqiad.wmnet
- 15:16 cdobbins@cumin2002: conftool action : set/pooled=yes; selector: name=cp1115.eqiad.wmnet [reason: T418772 - BGP maintenance]
- 15:16 jforrester@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
- 15:16 ayounsi@cumin1003: conftool action : set/pooled=yes; selector: name=cirrussearch1122.eqiad.wmnet
- 15:15 jforrester@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
- 15:15 jforrester@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
- 15:14 jforrester@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
- 15:13 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
- 15:13 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
- 15:10 XioNoX: lsw1-d7-eqiad# tools network-instance default protocols bgp neighbor 10.64.128.17 reset-peer - T418772
- 15:10 jmm@cumin2002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-test-eqiad
- 15:09 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host an-worker1218.eqiad.wmnet
- 15:09 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1217.eqiad.wmnet
- 15:09 jforrester@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
- 15:08 jforrester@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
- 15:08 jforrester@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
- 15:05 jforrester@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
- 15:05 moritzm: upgrading cloudlb* to Bird 2.18 T413740
- 15:05 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
- 15:04 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
- 14:58 Dreamy_Jazz: Afternoon UTC backport window done
- 14:58 dreamyjazz@deploy2002: Finished scap sync-world: Backport for zhwiki: Remove all rights from accountcreator (T418089) (duration: 08m 12s)
- 14:57 ebernhardson@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
- 14:57 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host an-worker1217.eqiad.wmnet
- 14:57 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1216.eqiad.wmnet
- 14:57 ebernhardson@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
- 14:56 btullis@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6 days, 0:00:00 on dse-k8s-worker[1010-1011,1013,1018-1019].eqiad.wmnet with reason: Adding 10 Gbps NIC
- 14:54 dreamyjazz@deploy2002: dreamyjazz, 1f616emo: Continuing with sync
- 14:52 jmm@cumin2002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-test-eqiad
- 14:52 dreamyjazz@deploy2002: dreamyjazz, 1f616emo: Backport for zhwiki: Remove all rights from accountcreator (T418089) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 14:50 dreamyjazz@deploy2002: Started scap sync-world: Backport for zhwiki: Remove all rights from accountcreator (T418089)
- 14:45 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host an-worker1216.eqiad.wmnet
- 14:45 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1215.eqiad.wmnet
- 14:44 dreamyjazz@deploy2002: Finished scap sync-world: Backport for Hooks: Fix liquidthreads log type definition bugs (T417425 T419006), Define $wgWikimediaMessagesHasLiquidThreadsLogs (T417425) (duration: 07m 11s)
- 14:44 cdobbins@cumin2002: conftool action : set/pooled=no; selector: name=cp1115.eqiad.wmnet [reason: T418772 - BGP maintenance]
- 14:44 taavi: updating CR firewall policy with https://gerrit.wikimedia.org/r/c/operations/homer/public/+/970275
- 14:42 ayounsi@cumin1003: conftool action : set/pooled=no; selector: name=cirrussearch1122.eqiad.wmnet
- 14:42 ayounsi@cumin1003: conftool action : set/pooled=no; selector: name=cirrussearch1121.eqiad.wmnet
- 14:42 ayounsi@cumin1003: conftool action : set/pooled=no; selector: name=cirrussearch1120.eqiad.wmnet
- 14:40 dreamyjazz@deploy2002: dreamyjazz: Continuing with sync
- 14:39 dreamyjazz@deploy2002: dreamyjazz: Backport for Hooks: Fix liquidthreads log type definition bugs (T417425 T419006), Define $wgWikimediaMessagesHasLiquidThreadsLogs (T417425) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 14:37 dreamyjazz@deploy2002: Started scap sync-world: Backport for Hooks: Fix liquidthreads log type definition bugs (T417425 T419006), Define $wgWikimediaMessagesHasLiquidThreadsLogs (T417425)
- 14:33 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host an-worker1215.eqiad.wmnet
- 14:33 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1214.eqiad.wmnet
- 14:32 btullis@puppetserver1001: conftool action : get/pooled=no; selector: service=kubesvc,cluster=dse-k8s,dc=eqiad,name=dse-k8s-worker1028.eqiad.wmnet
- 14:31 btullis@puppetserver1001: conftool action : set/pooled=yes; selector: service=kubesvc,cluster=dse-k8s,dc=eqiad,name=dse-k8s-worker1028.eqiad.wmnet
- 14:31 btullis@puppetserver1001: conftool action : set/pooled=yes; selector: service=kubesvc,cluster=dse-k8s,dc=eqiad,name=dse-k8s-worker1025.eqiad.wmnet
- 14:31 btullis@puppetserver1001: conftool action : set/pooled=yes; selector: service=kubesvc,cluster=dse-k8s,dc=eqiad,name=dse-k8s-worker1024.eqiad.wmnet
- 14:31 btullis@puppetserver1001: conftool action : set/weight=1; selector: service=kubesvc,cluster=dse-k8s,dc=eqiad,name=dse-k8s-worker1028.eqiad.wmnet
- 14:31 btullis@puppetserver1001: conftool action : set/weight=1; selector: service=kubesvc,cluster=dse-k8s,dc=eqiad,name=dse-k8s-worker1025.eqiad.wmnet
- 14:31 btullis@puppetserver1001: conftool action : set/weight=1; selector: service=kubesvc,cluster=dse-k8s,dc=eqiad,name=dse-k8s-worker1024.eqiad.wmnet
- 14:30 btullis@puppetserver1001: conftool action : get/pooled; selector: service=kubesvc,cluster=dse-k8s,dc=eqiad,name=dse-k8s-worker1024.eqiad.wmnet
- 14:29 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_esams and A:cp - 3.0 upgrade ()
- 14:27 arnaudb@dns1004: END - running authdns-update
- 14:26 arnaudb@dns1004: START - running authdns-update
- 14:26 tgr@deploy2002: Finished scap sync-world: Backport for Revert "Enable JWT session cookie for bot passwords (all wikis) (attempt #2)" (T415007 T418999) (duration: 07m 19s)
- 14:22 tgr@deploy2002: tgr: Continuing with sync
- 14:21 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host an-worker1214.eqiad.wmnet
- 14:21 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1213.eqiad.wmnet
- 14:21 tgr@deploy2002: tgr: Backport for Revert "Enable JWT session cookie for bot passwords (all wikis) (attempt #2)" (T415007 T418999) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 14:19 tgr@deploy2002: Started scap sync-world: Backport for Revert "Enable JWT session cookie for bot passwords (all wikis) (attempt #2)" (T415007 T418999)
- 14:14 sgimeno@deploy2002: Finished scap sync-world: Backport for Enable new HTML confirmation emails for all (T416748) (duration: 07m 46s)
- 14:13 elukey@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
- 14:13 elukey@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
- 14:10 sgimeno@deploy2002: migr, sgimeno: Continuing with sync
- 14:09 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host an-worker1213.eqiad.wmnet
- 14:09 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1212.eqiad.wmnet
- 14:09 sgimeno@deploy2002: migr, sgimeno: Backport for Enable new HTML confirmation emails for all (T416748) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 14:08 elukey@deploy2002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
- 14:08 elukey@deploy2002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
- 14:08 elukey@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
- 14:07 elukey@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
- 14:07 sgimeno@deploy2002: Started scap sync-world: Backport for Enable new HTML confirmation emails for all (T416748)
- 13:59 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti4005.ulsfo.wmnet
- 13:58 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti4005.ulsfo.wmnet
- 13:57 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti4005.ulsfo.wmnet
- 13:57 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host an-worker1212.eqiad.wmnet
- 13:57 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1211.eqiad.wmnet
- 13:49 fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_esams and A:cp - 3.0 upgrade ()
- 13:45 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host an-worker1211.eqiad.wmnet
- 13:45 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1210.eqiad.wmnet
- 13:43 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_esams and A:cp - 3.0 upgrade ()
- 13:40 arnaudb@dns1004: END - running authdns-update
- 13:39 arnaudb@dns1004: START - running authdns-update
- 13:37 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_eqiad and A:cp - 3.0 upgrade ()
- 13:33 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host an-worker1210.eqiad.wmnet
- 13:33 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1209.eqiad.wmnet
- 13:20 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host an-worker1209.eqiad.wmnet
- 13:20 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1208.eqiad.wmnet
- 13:17 aokoth@deploy2002: helmfile [aux-k8s-codfw] DONE helmfile.d/admin 'apply'.
- 13:17 aokoth@deploy2002: helmfile [aux-k8s-codfw] START helmfile.d/admin 'apply'.
- 13:16 aokoth@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 13:15 aokoth@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
- 13:06 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host an-worker1208.eqiad.wmnet
- 13:06 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host an-worker1207.eqiad.wmnet
- 13:03 arnaudb@dns1005: END - running authdns-update
- 13:02 arnaudb@dns1005: START - running authdns-update
- 13:00 fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_esams and A:cp - 3.0 upgrade ()
- 13:00 fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_eqiad and A:cp - 3.0 upgrade ()
- 12:46 mvolz@deploy2002: helmfile [eqiad] DONE helmfile.d/services/citoid: apply
- 12:45 mvolz@deploy2002: helmfile [eqiad] START helmfile.d/services/citoid: apply
- 12:44 mvolz@deploy2002: helmfile [codfw] DONE helmfile.d/services/citoid: apply
- 12:44 mvolz@deploy2002: helmfile [codfw] START helmfile.d/services/citoid: apply
- 12:43 mvolz@deploy2002: helmfile [staging] DONE helmfile.d/services/citoid: apply
- 12:43 mvolz@deploy2002: helmfile [staging] START helmfile.d/services/citoid: apply
- 12:33 mvolz@deploy2002: helmfile [staging] DONE helmfile.d/services/citoid: apply
- 12:29 mvolz@deploy2002: helmfile [staging] START helmfile.d/services/citoid: apply
- 12:10 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_eqiad and A:cp - 3.0 upgrade ()
- 12:08 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_eqiad and A:cp - 3.0 upgrade ()
- 12:03 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host an-worker1207.eqiad.wmnet
- 12:03 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1206.eqiad.wmnet
- 11:53 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host an-worker1206.eqiad.wmnet
- 11:53 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1205.eqiad.wmnet
- 11:36 jmm@cumin2002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-f8-eqiad
- 11:36 jmm@cumin2002: START - Cookbook sre.network.tls for network device lsw1-f8-eqiad
- 11:34 fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_eqiad and A:cp - 3.0 upgrade ()
- 11:34 fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_eqiad and A:cp - 3.0 upgrade ()
- 11:28 dreamyjazz@deploy2002: Finished scap sync-world: Backport for SI: Update instrumentation schema (T418293) (duration: 16m 22s)
- 11:22 fabfur: start upgrading haproxy to 3.0 on A:cp-eqiad (T417253)
- 11:22 dreamyjazz@deploy2002: dreamyjazz: Continuing with sync
- 11:17 dreamyjazz@deploy2002: dreamyjazz: Backport for SI: Update instrumentation schema (T418293) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 11:13 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_drmrs and A:cp - 3.0 upgrade ()
- 11:12 dreamyjazz@deploy2002: Started scap sync-world: Backport for SI: Update instrumentation schema (T418293)
- 11:08 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_drmrs and A:cp - 3.0 upgrade ()
- 11:07 blake@cumin1003: END (FAIL) - Cookbook sre.k8s.roll-reimage-nodes (exit_code=1) rolling reimage on P{wikikube-worker[2332-2356].codfw.wmnet} and (A:wikikube-master-codfw or A:wikikube-worker-codfw)
- 11:07 blake@cumin1003: START - Cookbook sre.k8s.roll-reimage-nodes rolling reimage on P{wikikube-worker[2332-2356].codfw.wmnet} and (A:wikikube-master-codfw or A:wikikube-worker-codfw)
- 11:06 blake@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker[2332-2356].codfw.wmnet
- 11:06 blake@cumin1003: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker[2332-2356].codfw.wmnet
- 11:03 blake@cumin1003: END (FAIL) - Cookbook sre.k8s.roll-reimage-nodes (exit_code=1) rolling reimage on P{wikikube-worker[2332-2356].codfw.wmnet} and (A:wikikube-master-codfw or A:wikikube-worker-codfw)
- 11:03 blake@cumin1003: START - Cookbook sre.k8s.roll-reimage-nodes rolling reimage on P{wikikube-worker[2332-2356].codfw.wmnet} and (A:wikikube-master-codfw or A:wikikube-worker-codfw)
- 10:55 blake@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker[2332-2356].codfw.wmnet
- 10:55 blake@cumin1003: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker[2332-2356].codfw.wmnet
- 10:54 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host an-worker1205.eqiad.wmnet
- 10:54 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1204.eqiad.wmnet
- 10:53 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/turnilo: apply
- 10:53 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/turnilo: apply
- 10:42 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host an-worker1204.eqiad.wmnet
- 10:42 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1203.eqiad.wmnet
- 10:32 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/turnilo: apply
- 10:32 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/turnilo: apply
- 10:28 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host an-worker1203.eqiad.wmnet
- 10:28 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1202.eqiad.wmnet
- 10:25 fabfur: start upgrading haproxy to 3.0 on A:cp-drmrs (T417253)
- 10:25 fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_drmrs and A:cp - 3.0 upgrade ()
- 10:25 fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_drmrs and A:cp - 3.0 upgrade ()
- 10:24 ladsgroup@deploy2002: Finished scap sync-world: Backport for WebPHandler: Allow the original being served on the web (T414805 T418745 T418346), WebPHandler: Allow the original being served on the web (T414805 T418745 T418346) (duration: 06m 42s)
- 10:22 arnaudb@dns1004: END - running authdns-update
- 10:20 arnaudb@dns1004: START - running authdns-update
- 10:20 ladsgroup@deploy2002: ladsgroup: Continuing with sync
- 10:20 ladsgroup@deploy2002: ladsgroup: Backport for WebPHandler: Allow the original being served on the web (T414805 T418745 T418346), WebPHandler: Allow the original being served on the web (T414805 T418745 T418346) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 10:18 ladsgroup@deploy2002: Started scap sync-world: Backport for WebPHandler: Allow the original being served on the web (T414805 T418745 T418346), WebPHandler: Allow the original being served on the web (T414805 T418745 T418346)
- 10:16 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host an-worker1202.eqiad.wmnet
- 10:16 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1201.eqiad.wmnet
- 10:09 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/turnilo: apply
- 10:08 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/turnilo: apply
- 10:06 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/turnilo: apply
- 10:05 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/turnilo: apply
- 10:04 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host an-worker1201.eqiad.wmnet
- 10:04 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1200.eqiad.wmnet
- 09:57 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/turnilo: apply
- 09:57 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/turnilo: apply
- 09:50 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host an-worker1200.eqiad.wmnet
- 09:39 mszwarc@deploy2002: Finished scap sync-world: Backport for Require 2FA from CentralNotice admins and WMF Trust & Safety (T418580 T417880) (duration: 08m 23s)
- 09:36 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_codfw and A:cp - 3.0 upgrade ()
- 09:35 mszwarc@deploy2002: mszwarc: Continuing with sync
- 09:33 mszwarc@deploy2002: mszwarc: Backport for Require 2FA from CentralNotice admins and WMF Trust & Safety (T418580 T417880) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 09:32 jmm@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts pki-root1002.eqiad.wmnet
- 09:32 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host pki-root1002.eqiad.wmnet
- 09:31 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_codfw and A:cp - 3.0 upgrade ()
- 09:31 mszwarc@deploy2002: Started scap sync-world: Backport for Require 2FA from CentralNotice admins and WMF Trust & Safety (T418580 T417880)
- 09:30 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-fe1013.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
- 09:21 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host pki-root1002.eqiad.wmnet
- 09:20 jmm@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts pki-root1002.eqiad.wmnet
- 09:20 jmm@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts pki-root1002.eqiad.wmnet
- 09:20 jmm@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts pki-root1002.eqiad.wmnet
- 09:20 jmm@cumin2002: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts pki-root1002.eqiad.wmnet
- 09:03 gehel: switching off Blazegraph on wdqs2009 (legacy full graph endpoint is end of life) - T411410 / T415073
- 09:03 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/turnilo: apply
- 09:02 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/turnilo: apply
- 09:02 mvernon@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1096.eqiad.wmnet
- 09:00 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/turnilo: apply
- 09:00 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1097.eqiad.wmnet
- 08:59 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/turnilo: apply
- 08:58 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/turnilo: apply
- 08:57 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/turnilo: apply
- 08:56 mvernon@cumin1003: START - Cookbook sre.hosts.reboot-single for host ms-be1096.eqiad.wmnet
- 08:54 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be1097.eqiad.wmnet
- 08:52 jmm@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts pki-root1002.eqiad.wmnet
- 08:49 topranks: disabling IBGP session between ssw1-d1-eqiad and ssw1-d8-eqiad to remove backup paths try #2 T411054
- 08:36 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on backup1007.eqiad.wmnet,dbprov1004.eqiad.wmnet with reason: network maintenance
- 08:32 elukey@cumin1003: START - Cookbook sre.hosts.provision for host ms-fe1013.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
- 08:31 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-fe1013.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
- 08:21 fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_codfw and A:cp - 3.0 upgrade ()
- 08:21 fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_codfw and A:cp - 3.0 upgrade ()
- 08:11 fabfur@cumin1003: conftool action : set/pooled=yes; selector: name=cp5032.*
- 07:54 topranks: disabling IBGP session between ssw1-d1-eqiad and ssw1-d8-eqiad to remove backup paths T411054
- 07:43 moritzm: installing libbpf updates from Bookworm point release
- 05:43 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2179.codfw.wmnet with reason: Maintenance
- 05:42 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1160.eqiad.wmnet with reason: Maintenance
- 02:08 mwpresync@deploy2002: Finished scap build-images: Publishing wmf/next image (duration: 08m 04s)
- 02:00 mwpresync@deploy2002: Started scap build-images: Publishing wmf/next image
- 01:57 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
- 01:57 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1263 (T418465)', diff saved to https://phabricator.wikimedia.org/P89793 and previous config saved to /var/cache/conftool/dbconfig/20260304-015657-marostegui.json
- 01:41 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1263', diff saved to https://phabricator.wikimedia.org/P89792 and previous config saved to /var/cache/conftool/dbconfig/20260304-014150-marostegui.json
- 01:26 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1263', diff saved to https://phabricator.wikimedia.org/P89791 and previous config saved to /var/cache/conftool/dbconfig/20260304-012642-marostegui.json
- 01:23 zabe@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply
- 01:22 zabe@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply
- 01:11 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1263 (T418465)', diff saved to https://phabricator.wikimedia.org/P89790 and previous config saved to /var/cache/conftool/dbconfig/20260304-011134-marostegui.json
- 00:46 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db1263 (T418465)', diff saved to https://phabricator.wikimedia.org/P89789 and previous config saved to /var/cache/conftool/dbconfig/20260304-004638-marostegui.json
- 00:46 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1263.eqiad.wmnet with reason: Maintenance
- 00:46 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1262 (T418465)', diff saved to https://phabricator.wikimedia.org/P89788 and previous config saved to /var/cache/conftool/dbconfig/20260304-004615-marostegui.json
- 00:31 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1262', diff saved to https://phabricator.wikimedia.org/P89787 and previous config saved to /var/cache/conftool/dbconfig/20260304-003107-marostegui.json
- 00:16 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1262', diff saved to https://phabricator.wikimedia.org/P89786 and previous config saved to /var/cache/conftool/dbconfig/20260304-001559-marostegui.json
- 00:00 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1262 (T418465)', diff saved to https://phabricator.wikimedia.org/P89785 and previous config saved to /var/cache/conftool/dbconfig/20260304-000052-marostegui.json
2026-03-03
- 23:35 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db1262 (T418465)', diff saved to https://phabricator.wikimedia.org/P89784 and previous config saved to /var/cache/conftool/dbconfig/20260303-233500-marostegui.json
- 23:34 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1262.eqiad.wmnet with reason: Maintenance
- 23:34 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1261 (T418465)', diff saved to https://phabricator.wikimedia.org/P89783 and previous config saved to /var/cache/conftool/dbconfig/20260303-233436-marostegui.json
- 23:19 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1261', diff saved to https://phabricator.wikimedia.org/P89782 and previous config saved to /var/cache/conftool/dbconfig/20260303-231929-marostegui.json
- 23:10 rzl@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-experimental: apply
- 23:08 rzl@deploy2002: helmfile [codfw] START helmfile.d/services/mw-experimental: apply
- 23:08 rzl@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply
- 23:07 rzl@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply
- 23:05 ebernhardson@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
- 23:05 ebernhardson@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
- 23:04 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1261', diff saved to https://phabricator.wikimedia.org/P89781 and previous config saved to /var/cache/conftool/dbconfig/20260303-230421-marostegui.json
- 23:04 bking@cumin2002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host dse-k8s-worker1028.eqiad.wmnet
- 23:02 tgr@deploy2002: Finished scap sync-world: Backport for Do not invalidate anon sessions with non-anon JWT cookies (T415007), Do not invalidate anon sessions with non-anon JWT cookies (T415007), Enable JWT session cookie for bot passwords (all wikis) (attempt #2) (T415007) (duration: 21m 47s)
- 23:00 cdobbins@cumin2002: conftool action : set/pooled=yes; selector: name=cp7008.magru.wmnet [reason: lldpd packet drop issues]
- 22:58 cdobbins@cumin2002: conftool action : set/pooled=yes; selector: name=cp7008 [reason: lldpd packet drop issues]
- 22:58 tgr@deploy2002: tgr: Continuing with sync
- 22:56 bking@cumin2002: START - Cookbook sre.k8s.pool-depool-node depool for host dse-k8s-worker1028.eqiad.wmnet
- 22:49 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1261 (T418465)', diff saved to https://phabricator.wikimedia.org/P89780 and previous config saved to /var/cache/conftool/dbconfig/20260303-224913-marostegui.json
- 22:45 ebernhardson@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
- 22:45 ebernhardson@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
- 22:44 ebernhardson@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
- 22:44 ebernhardson@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
- 22:42 tgr@deploy2002: tgr: Backport for Do not invalidate anon sessions with non-anon JWT cookies (T415007), Do not invalidate anon sessions with non-anon JWT cookies (T415007), Enable JWT session cookie for bot passwords (all wikis) (attempt #2) (T415007) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 22:40 tgr@deploy2002: Started scap sync-world: Backport for Do not invalidate anon sessions with non-anon JWT cookies (T415007), Do not invalidate anon sessions with non-anon JWT cookies (T415007), Enable JWT session cookie for bot passwords (all wikis) (attempt #2) (T415007)
- 22:26 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-test: apply
- 22:26 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-test: apply
- 22:23 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db1261 (T418465)', diff saved to https://phabricator.wikimedia.org/P89779 and previous config saved to /var/cache/conftool/dbconfig/20260303-222324-marostegui.json
- 22:23 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1261.eqiad.wmnet with reason: Maintenance
- 22:23 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1260 (T418465)', diff saved to https://phabricator.wikimedia.org/P89778 and previous config saved to /var/cache/conftool/dbconfig/20260303-222301-marostegui.json
- 22:07 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1260', diff saved to https://phabricator.wikimedia.org/P89777 and previous config saved to /var/cache/conftool/dbconfig/20260303-220754-marostegui.json
- 21:59 rzl@deploy2002: Finished scap sync-world: https://gerrit.wikimedia.org/r/1245162 T411807 (duration: 12m 15s)
- 21:58 rzl@deploy2002: rzl: Continuing with sync
- 21:56 ebernhardson@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
- 21:56 ebernhardson@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
- 21:55 rzl@deploy2002: rzl: https://gerrit.wikimedia.org/r/1245162 T411807 synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 21:54 rzl@deploy2002: Started scap sync-world: https://gerrit.wikimedia.org/r/1245162 T411807
- 21:52 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1260', diff saved to https://phabricator.wikimedia.org/P89776 and previous config saved to /var/cache/conftool/dbconfig/20260303-215247-marostegui.json
- 21:49 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2248 (T418465)', diff saved to https://phabricator.wikimedia.org/P89775 and previous config saved to /var/cache/conftool/dbconfig/20260303-214931-marostegui.json
- 21:48 brett@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cp2045.codfw.wmnet
- 21:48 brett@cumin2002: START - Cookbook sre.hosts.remove-downtime for cp2045.codfw.wmnet
- 21:40 ebernhardson@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
- 21:39 ebernhardson@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
- 21:37 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1260 (T418465)', diff saved to https://phabricator.wikimedia.org/P89774 and previous config saved to /var/cache/conftool/dbconfig/20260303-213739-marostegui.json
- 21:35 jhuneidi@deploy2002: Finished scap sync-world: Backport for REST: show the beta Attribution API in the REST Sandbox (T418522) (duration: 07m 41s)
- 21:34 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2248', diff saved to https://phabricator.wikimedia.org/P89773 and previous config saved to /var/cache/conftool/dbconfig/20260303-213423-marostegui.json
- 21:32 jhuneidi@deploy2002: jhuneidi, bpirkle: Continuing with sync
- 21:30 jhuneidi@deploy2002: jhuneidi, bpirkle: Backport for REST: show the beta Attribution API in the REST Sandbox (T418522) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 21:28 jhuneidi@deploy2002: Started scap sync-world: Backport for REST: show the beta Attribution API in the REST Sandbox (T418522)
- 21:19 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2248', diff saved to https://phabricator.wikimedia.org/P89772 and previous config saved to /var/cache/conftool/dbconfig/20260303-211915-marostegui.json
- 21:18 jhuneidi@deploy2002: Finished scap sync-world: Backport for Remove redundant mw-extra wgRestSandboxSpecs entry (duration: 06m 56s)
- 21:14 jhuneidi@deploy2002: jhuneidi, aaron: Continuing with sync
- 21:13 jhuneidi@deploy2002: jhuneidi, aaron: Backport for Remove redundant mw-extra wgRestSandboxSpecs entry synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 21:11 jhuneidi@deploy2002: Started scap sync-world: Backport for Remove redundant mw-extra wgRestSandboxSpecs entry
- 21:10 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db1260 (T418465)', diff saved to https://phabricator.wikimedia.org/P89771 and previous config saved to /var/cache/conftool/dbconfig/20260303-211033-marostegui.json
- 21:10 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1260.eqiad.wmnet with reason: Maintenance
- 21:10 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1252 (T418465)', diff saved to https://phabricator.wikimedia.org/P89770 and previous config saved to /var/cache/conftool/dbconfig/20260303-211009-marostegui.json
- 21:04 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2248 (T418465)', diff saved to https://phabricator.wikimedia.org/P89769 and previous config saved to /var/cache/conftool/dbconfig/20260303-210407-marostegui.json
- 20:58 brett@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on cp2045.codfw.wmnet with reason: troubleshooting for T418527
- 20:55 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1252', diff saved to https://phabricator.wikimedia.org/P89768 and previous config saved to /var/cache/conftool/dbconfig/20260303-205502-marostegui.json
- 20:51 cdobbins@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp7008.magru.wmnet with OS trixie
- 20:44 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db2248 (T418465)', diff saved to https://phabricator.wikimedia.org/P89767 and previous config saved to /var/cache/conftool/dbconfig/20260303-204452-marostegui.json
- 20:44 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2248.codfw.wmnet with reason: Maintenance
- 20:44 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2247 (T418465)', diff saved to https://phabricator.wikimedia.org/P89766 and previous config saved to /var/cache/conftool/dbconfig/20260303-204439-marostegui.json
- 20:39 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1252', diff saved to https://phabricator.wikimedia.org/P89765 and previous config saved to /var/cache/conftool/dbconfig/20260303-203954-marostegui.json
- 20:34 ebernhardson@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
- 20:34 ebernhardson@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
- 20:29 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2247', diff saved to https://phabricator.wikimedia.org/P89764 and previous config saved to /var/cache/conftool/dbconfig/20260303-202931-marostegui.json
- 20:24 cdobbins@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp7008.magru.wmnet with reason: host reimage
- 20:24 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1252 (T418465)', diff saved to https://phabricator.wikimedia.org/P89763 and previous config saved to /var/cache/conftool/dbconfig/20260303-202447-marostegui.json
- 20:17 cdobbins@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp7008.magru.wmnet with reason: host reimage
- 20:14 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2247', diff saved to https://phabricator.wikimedia.org/P89762 and previous config saved to /var/cache/conftool/dbconfig/20260303-201423-marostegui.json
- 20:10 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host an-worker1199.eqiad.wmnet
- 19:59 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2247 (T418465)', diff saved to https://phabricator.wikimedia.org/P89761 and previous config saved to /var/cache/conftool/dbconfig/20260303-195916-marostegui.json
- 19:59 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db1252 (T418465)', diff saved to https://phabricator.wikimedia.org/P89760 and previous config saved to /var/cache/conftool/dbconfig/20260303-195900-marostegui.json
- 19:58 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1252.eqiad.wmnet with reason: Maintenance
- 19:58 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1249 (T418465)', diff saved to https://phabricator.wikimedia.org/P89759 and previous config saved to /var/cache/conftool/dbconfig/20260303-195835-marostegui.json
- 19:51 cdobbins@cumin2002: START - Cookbook sre.hosts.reimage for host cp7008.magru.wmnet with OS trixie
- 19:43 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1249', diff saved to https://phabricator.wikimedia.org/P89758 and previous config saved to /var/cache/conftool/dbconfig/20260303-194327-marostegui.json
- 19:42 brett@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cp2043.codfw.wmnet
- 19:42 brett@cumin2002: START - Cookbook sre.hosts.remove-downtime for cp2043.codfw.wmnet
- 19:33 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db2247 (T418465)', diff saved to https://phabricator.wikimedia.org/P89757 and previous config saved to /var/cache/conftool/dbconfig/20260303-193351-marostegui.json
- 19:33 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2247.codfw.wmnet with reason: Maintenance
- 19:33 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2246 (T418465)', diff saved to https://phabricator.wikimedia.org/P89756 and previous config saved to /var/cache/conftool/dbconfig/20260303-193338-marostegui.json
- 19:28 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1249', diff saved to https://phabricator.wikimedia.org/P89755 and previous config saved to /var/cache/conftool/dbconfig/20260303-192820-marostegui.json
- 19:19 jhuneidi@deploy2002: rebuilt and synchronized wikiversions files: group0 to 1.46.0-wmf.18 refs T413809
- 19:18 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2246', diff saved to https://phabricator.wikimedia.org/P89754 and previous config saved to /var/cache/conftool/dbconfig/20260303-191830-marostegui.json
- 19:17 cdobbins@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp2047.codfw.wmnet with OS trixie
- 19:13 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1249 (T418465)', diff saved to https://phabricator.wikimedia.org/P89753 and previous config saved to /var/cache/conftool/dbconfig/20260303-191312-marostegui.json
- 19:03 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2246', diff saved to https://phabricator.wikimedia.org/P89752 and previous config saved to /var/cache/conftool/dbconfig/20260303-190323-marostegui.json
- 18:53 cdobbins@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp2047.codfw.wmnet with reason: host reimage
- 18:49 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host an-worker1199.eqiad.wmnet
- 18:49 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1198.eqiad.wmnet
- 18:49 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db1249 (T418465)', diff saved to https://phabricator.wikimedia.org/P89751 and previous config saved to /var/cache/conftool/dbconfig/20260303-184937-marostegui.json
- 18:49 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1249.eqiad.wmnet with reason: Maintenance
- 18:49 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1248 (T418465)', diff saved to https://phabricator.wikimedia.org/P89750 and previous config saved to /var/cache/conftool/dbconfig/20260303-184913-marostegui.json
- 18:48 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2246 (T418465)', diff saved to https://phabricator.wikimedia.org/P89749 and previous config saved to /var/cache/conftool/dbconfig/20260303-184815-marostegui.json
- 18:47 cdobbins@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp2047.codfw.wmnet with reason: host reimage
- 18:45 jclark@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be1096.eqiad.wmnet with OS bullseye
- 18:36 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host an-worker1198.eqiad.wmnet
- 18:36 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1197.eqiad.wmnet
- 18:34 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1248', diff saved to https://phabricator.wikimedia.org/P89747 and previous config saved to /var/cache/conftool/dbconfig/20260303-183406-marostegui.json
- 18:29 cdobbins@cumin2002: START - Cookbook sre.hosts.reimage for host cp2047.codfw.wmnet with OS trixie
- 18:24 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host an-worker1197.eqiad.wmnet
- 18:24 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1196.eqiad.wmnet
- 18:23 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db2246 (T418465)', diff saved to https://phabricator.wikimedia.org/P89746 and previous config saved to /var/cache/conftool/dbconfig/20260303-182346-marostegui.json
- 18:23 jclark@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be1096.eqiad.wmnet with reason: host reimage
- 18:23 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2246.codfw.wmnet with reason: Maintenance
- 18:23 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2245 (T418465)', diff saved to https://phabricator.wikimedia.org/P89745 and previous config saved to /var/cache/conftool/dbconfig/20260303-182321-marostegui.json
- 18:19 jclark@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be1096.eqiad.wmnet with reason: host reimage
- 18:19 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1248', diff saved to https://phabricator.wikimedia.org/P89744 and previous config saved to /var/cache/conftool/dbconfig/20260303-181859-marostegui.json
- 18:11 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host an-worker1196.eqiad.wmnet
- 18:11 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1195.eqiad.wmnet
- 18:08 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2245', diff saved to https://phabricator.wikimedia.org/P89743 and previous config saved to /var/cache/conftool/dbconfig/20260303-180814-marostegui.json
- 18:04 jforrester@deploy2002: Finished scap sync-world: Backport for Style fixes for copy-paste feature (T414072) (duration: 32m 54s)
- 18:04 daniel@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
- 18:03 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1248 (T418465)', diff saved to https://phabricator.wikimedia.org/P89742 and previous config saved to /var/cache/conftool/dbconfig/20260303-180352-marostegui.json
- 18:02 jclark@cumin1003: START - Cookbook sre.hosts.reimage for host ms-be1096.eqiad.wmnet with OS bullseye
- 18:02 daniel@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
- 17:59 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host an-worker1195.eqiad.wmnet
- 17:59 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host an-worker1194.eqiad.wmnet
- 17:55 ariel@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
- 17:53 ariel@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
- 17:53 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2245', diff saved to https://phabricator.wikimedia.org/P89741 and previous config saved to /var/cache/conftool/dbconfig/20260303-175304-marostegui.json
- 17:52 jforrester@deploy2002: jforrester: Continuing with sync
- 17:51 jforrester@deploy2002: jforrester: Backport for Style fixes for copy-paste feature (T414072) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 17:47 ariel@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
- 17:46 ariel@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
- 17:41 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host an-worker1194.eqiad.wmnet
- 17:41 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1193.eqiad.wmnet
- 17:39 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db1248 (T418465)', diff saved to https://phabricator.wikimedia.org/P89740 and previous config saved to /var/cache/conftool/dbconfig/20260303-173914-marostegui.json
- 17:39 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1248.eqiad.wmnet with reason: Maintenance
- 17:38 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1247 (T418465)', diff saved to https://phabricator.wikimedia.org/P89739 and previous config saved to /var/cache/conftool/dbconfig/20260303-173850-marostegui.json
- 17:37 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2245 (T418465)', diff saved to https://phabricator.wikimedia.org/P89738 and previous config saved to /var/cache/conftool/dbconfig/20260303-173756-marostegui.json
- 17:31 jforrester@deploy2002: Started scap sync-world: Backport for Style fixes for copy-paste feature (T414072)
- 17:30 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host an-worker1193.eqiad.wmnet
- 17:30 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1192.eqiad.wmnet
- 17:23 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1247', diff saved to https://phabricator.wikimedia.org/P89736 and previous config saved to /var/cache/conftool/dbconfig/20260303-172343-marostegui.json
- 17:18 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host an-worker1192.eqiad.wmnet
- 17:18 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1191.eqiad.wmnet
- 17:11 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db2245 (T418465)', diff saved to https://phabricator.wikimedia.org/P89735 and previous config saved to /var/cache/conftool/dbconfig/20260303-171149-marostegui.json
- 17:11 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2245.codfw.wmnet with reason: Maintenance
- 17:11 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2240 (T418465)', diff saved to https://phabricator.wikimedia.org/P89734 and previous config saved to /var/cache/conftool/dbconfig/20260303-171126-marostegui.json
- 17:08 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1247', diff saved to https://phabricator.wikimedia.org/P89733 and previous config saved to /var/cache/conftool/dbconfig/20260303-170835-marostegui.json
- 17:06 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host an-worker1191.eqiad.wmnet
- 17:06 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1190.eqiad.wmnet
- 16:56 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host an-worker1190.eqiad.wmnet
- 16:56 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2240', diff saved to https://phabricator.wikimedia.org/P89732 and previous config saved to /var/cache/conftool/dbconfig/20260303-165618-marostegui.json
- 16:53 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1247 (T418465)', diff saved to https://phabricator.wikimedia.org/P89731 and previous config saved to /var/cache/conftool/dbconfig/20260303-165327-marostegui.json
- 16:41 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1189.eqiad.wmnet
- 16:41 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2240', diff saved to https://phabricator.wikimedia.org/P89730 and previous config saved to /var/cache/conftool/dbconfig/20260303-164111-marostegui.json
- 16:34 fceratto@deploy2002: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 16:30 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host an-worker1189.eqiad.wmnet
- 16:30 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1188.eqiad.wmnet
- 16:28 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db1247 (T418465)', diff saved to https://phabricator.wikimedia.org/P89729 and previous config saved to /var/cache/conftool/dbconfig/20260303-162845-marostegui.json
- 16:28 fceratto@cumin1003: dbctl commit (dc=all): 'Setting x1 codfw weights to 300 T416705', diff saved to https://phabricator.wikimedia.org/P89728 and previous config saved to /var/cache/conftool/dbconfig/20260303-162836-fceratto.json
- 16:28 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1247.eqiad.wmnet with reason: Maintenance
- 16:26 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2240 (T418465)', diff saved to https://phabricator.wikimedia.org/P89727 and previous config saved to /var/cache/conftool/dbconfig/20260303-162603-marostegui.json
- 16:18 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/turnilo: apply
- 16:18 fceratto@cumin1003: dbctl commit (dc=all): 'Setting db1188 weight to 100 T416705', diff saved to https://phabricator.wikimedia.org/P89726 and previous config saved to /var/cache/conftool/dbconfig/20260303-161846-fceratto.json
- 16:18 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/turnilo: apply
- 16:17 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host an-worker1188.eqiad.wmnet
- 16:17 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1187.eqiad.wmnet
- 16:14 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1166: testing:crash
- 16:14 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1166: testing:crash
- 16:13 fceratto@cumin1003: dbctl commit (dc=all): 'Setting db1169 weight to 300 T416705', diff saved to https://phabricator.wikimedia.org/P89724 and previous config saved to /var/cache/conftool/dbconfig/20260303-161323-fceratto.json
- 16:12 fceratto@cumin1003: dbctl commit (dc=all): 'Setting db1188 weight to 300 T416705', diff saved to https://phabricator.wikimedia.org/P89723 and previous config saved to /var/cache/conftool/dbconfig/20260303-161230-fceratto.json
- 16:07 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1245.eqiad.wmnet with reason: Maintenance
- 16:07 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1244 (T418465)', diff saved to https://phabricator.wikimedia.org/P89722 and previous config saved to /var/cache/conftool/dbconfig/20260303-160720-marostegui.json
- 16:07 brennen@deploy2002: Finished deploy [phabricator/deployment@a883b6d]: deploy phab1004 for T418872 (duration: 01m 07s)
- 16:06 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host an-worker1187.eqiad.wmnet
- 16:06 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1186.eqiad.wmnet
- 16:05 brennen@deploy2002: Started deploy [phabricator/deployment@a883b6d]: deploy phab1004 for T418872
- 16:05 brennen@deploy2002: Finished deploy [phabricator/deployment@a883b6d]: deploy phab2002 for T418872 (duration: 00m 32s)
- 16:04 brennen@deploy2002: Started deploy [phabricator/deployment@a883b6d]: deploy phab2002 for T418872
- 16:02 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db2240 (T418465)', diff saved to https://phabricator.wikimedia.org/P89721 and previous config saved to /var/cache/conftool/dbconfig/20260303-160207-marostegui.json
- 16:02 jelto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on phab1004.eqiad.wmnet with reason: Phabricator deploy
- 16:02 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2240.codfw.wmnet with reason: Maintenance
- 16:01 jelto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on phab2002.codfw.wmnet with reason: Phabricator deploy
- 16:00 zabe@deploy2002: Finished scap sync-world: Backport for ImageListPager: Use correct name field for batch lookups (T418327) (duration: 09m 28s)
- 15:54 zabe@deploy2002: zabe: Continuing with sync
- 15:54 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host an-worker1186.eqiad.wmnet
- 15:54 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1185.eqiad.wmnet
- 15:54 zabe@deploy2002: zabe: Backport for ImageListPager: Use correct name field for batch lookups (T418327) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 15:53 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_ulsfo and A:cp - 3.0 upgrade ()
- 15:52 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1244', diff saved to https://phabricator.wikimedia.org/P89720 and previous config saved to /var/cache/conftool/dbconfig/20260303-155212-marostegui.json
- 15:50 zabe@deploy2002: Started scap sync-world: Backport for ImageListPager: Use correct name field for batch lookups (T418327)
- 15:49 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_ulsfo and A:cp - 3.0 upgrade ()
- 15:45 elukey@cumin1003: START - Cookbook sre.hosts.provision for host ms-fe1013.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
- 15:44 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-fe1013.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
- 15:42 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host an-worker1185.eqiad.wmnet
- 15:42 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1184.eqiad.wmnet
- 15:42 fceratto@deploy2002: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 15:41 daniel@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
- 15:41 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2239.codfw.wmnet with reason: Maintenance
- 15:41 daniel@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
- 15:41 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2237 (T418465)', diff saved to https://phabricator.wikimedia.org/P89719 and previous config saved to /var/cache/conftool/dbconfig/20260303-154104-marostegui.json
- 15:37 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1244', diff saved to https://phabricator.wikimedia.org/P89718 and previous config saved to /var/cache/conftool/dbconfig/20260303-153704-marostegui.json
- 15:36 jayme@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker[1352-1359].eqiad.wmnet
- 15:36 jayme@cumin1003: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker[1352-1359].eqiad.wmnet
- 15:30 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host an-worker1184.eqiad.wmnet
- 15:30 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1183.eqiad.wmnet
- 15:25 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2237', diff saved to https://phabricator.wikimedia.org/P89717 and previous config saved to /var/cache/conftool/dbconfig/20260303-152557-marostegui.json
- 15:23 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1178.eqiad.wmnet
- 15:22 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on P{cp5032.*} and A:cp - 3.0 upgrade ()
- 15:21 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1244 (T418465)', diff saved to https://phabricator.wikimedia.org/P89716 and previous config saved to /var/cache/conftool/dbconfig/20260303-152157-marostegui.json
- 15:19 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host an-worker1183.eqiad.wmnet
- 15:19 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1182.eqiad.wmnet
- 15:16 fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on P{cp5032.*} and A:cp - 3.0 upgrade ()
- 15:15 fabfur@cumin1003: END (FAIL) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=1) rolling upgrade of HAProxy on A:cp-upload_eqsin and A:cp - 3.0 upgrade ()
- 15:14 fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_ulsfo and A:cp - 3.0 upgrade ()
- 15:14 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_eqsin and A:cp - 3.0 upgrade ()
- 15:13 fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_ulsfo and A:cp - 3.0 upgrade ()
- 15:13 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host an-worker1178.eqiad.wmnet
- 15:10 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2237', diff saved to https://phabricator.wikimedia.org/P89715 and previous config saved to /var/cache/conftool/dbconfig/20260303-151049-marostegui.json
- 15:08 fceratto@deploy2002: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 15:07 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host an-worker1182.eqiad.wmnet
- 15:07 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1181.eqiad.wmnet
- 14:57 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db1244 (T418465)', diff saved to https://phabricator.wikimedia.org/P89714 and previous config saved to /var/cache/conftool/dbconfig/20260303-145727-marostegui.json
- 14:57 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1244.eqiad.wmnet with reason: Maintenance
- 14:57 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1243 (T418465)', diff saved to https://phabricator.wikimedia.org/P89713 and previous config saved to /var/cache/conftool/dbconfig/20260303-145704-marostegui.json
- 14:55 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2237 (T418465)', diff saved to https://phabricator.wikimedia.org/P89712 and previous config saved to /var/cache/conftool/dbconfig/20260303-145541-marostegui.json
- 14:55 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host an-worker1181.eqiad.wmnet
- 14:55 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1180.eqiad.wmnet
- 14:49 moritzm: installing php7.4 security updates
- 14:46 jayme@cumin1003: END (FAIL) - Cookbook sre.k8s.pool-depool-node (exit_code=99) pool for host wikikube-worker[1352-1359].eqiad.wmnet
- 14:46 jayme@cumin1003: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker[1352-1359].eqiad.wmnet
- 14:43 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host an-worker1180.eqiad.wmnet
- 14:43 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1179.eqiad.wmnet
- 14:41 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1243', diff saved to https://phabricator.wikimedia.org/P89711 and previous config saved to /var/cache/conftool/dbconfig/20260303-144156-marostegui.json
- 14:38 gkyziridis@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
- 14:38 esanders@deploy2002: Finished scap sync-world: Backport for Remove Editing-related config for special wikis (T400063) (duration: 06m 34s)
- 14:36 fceratto@deploy2002: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 14:34 esanders@deploy2002: esanders: Continuing with sync
- 14:34 esanders@deploy2002: esanders: Backport for Remove Editing-related config for special wikis (T400063) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 14:34 fceratto@deploy2002: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 14:32 esanders@deploy2002: Started scap sync-world: Backport for Remove Editing-related config for special wikis (T400063)
- 14:31 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host an-worker1179.eqiad.wmnet
- 14:31 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host an-worker1178.eqiad.wmnet
- 14:31 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db2237 (T418465)', diff saved to https://phabricator.wikimedia.org/P89710 and previous config saved to /var/cache/conftool/dbconfig/20260303-143141-marostegui.json
- 14:31 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2237.codfw.wmnet with reason: Maintenance
- 14:31 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2236 (T418465)', diff saved to https://phabricator.wikimedia.org/P89709 and previous config saved to /var/cache/conftool/dbconfig/20260303-143117-marostegui.json
- 14:29 esanders@deploy2002: Finished scap sync-world: Backport for PasteCheck: Enable by default (T405127) (duration: 08m 01s)
- 14:27 fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_eqsin and A:cp - 3.0 upgrade ()
- 14:27 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1097.eqiad.wmnet
- 14:26 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1243', diff saved to https://phabricator.wikimedia.org/P89708 and previous config saved to /var/cache/conftool/dbconfig/20260303-142649-marostegui.json
- 14:26 fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_eqsin and A:cp - 3.0 upgrade ()
- 14:25 esanders@deploy2002: esanders: Continuing with sync
- 14:23 esanders@deploy2002: esanders: Backport for PasteCheck: Enable by default (T405127) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 14:21 esanders@deploy2002: Started scap sync-world: Backport for PasteCheck: Enable by default (T405127)
- 14:20 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be1097.eqiad.wmnet
- 14:16 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2236', diff saved to https://phabricator.wikimedia.org/P89707 and previous config saved to /var/cache/conftool/dbconfig/20260303-141610-marostegui.json
- 14:15 esanders@deploy2002: Finished scap sync-world: Backport for Enable Wikibase GraphQL on test.wikidata.org (T417619), Enable Wikibase GraphQL on production wikidata.org (T417619) (duration: 08m 17s)
- 14:11 esanders@deploy2002: esanders, jakob: Continuing with sync
- 14:11 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1243 (T418465)', diff saved to https://phabricator.wikimedia.org/P89706 and previous config saved to /var/cache/conftool/dbconfig/20260303-141142-marostegui.json
- 14:09 esanders@deploy2002: esanders, jakob: Backport for Enable Wikibase GraphQL on test.wikidata.org (T417619), Enable Wikibase GraphQL on production wikidata.org (T417619) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 14:07 esanders@deploy2002: Started scap sync-world: Backport for Enable Wikibase GraphQL on test.wikidata.org (T417619), Enable Wikibase GraphQL on production wikidata.org (T417619)
- 14:01 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2236', diff saved to https://phabricator.wikimedia.org/P89704 and previous config saved to /var/cache/conftool/dbconfig/20260303-140102-marostegui.json
- 13:47 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db1243 (T418465)', diff saved to https://phabricator.wikimedia.org/P89703 and previous config saved to /var/cache/conftool/dbconfig/20260303-134702-marostegui.json
- 13:46 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1243.eqiad.wmnet with reason: Maintenance
- 13:46 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1242 (T418465)', diff saved to https://phabricator.wikimedia.org/P89702 and previous config saved to /var/cache/conftool/dbconfig/20260303-134639-marostegui.json
- 13:45 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2236 (T418465)', diff saved to https://phabricator.wikimedia.org/P89701 and previous config saved to /var/cache/conftool/dbconfig/20260303-134554-marostegui.json
- 13:31 moritzm: installing NSS security updates
- 13:31 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1242', diff saved to https://phabricator.wikimedia.org/P89700 and previous config saved to /var/cache/conftool/dbconfig/20260303-133131-marostegui.json
- 13:24 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db2236 (T418465)', diff saved to https://phabricator.wikimedia.org/P89699 and previous config saved to /var/cache/conftool/dbconfig/20260303-132414-marostegui.json
- 13:24 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2236.codfw.wmnet with reason: Maintenance
- 13:23 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2219 (T418465)', diff saved to https://phabricator.wikimedia.org/P89698 and previous config saved to /var/cache/conftool/dbconfig/20260303-132350-marostegui.json
- 13:20 tappof: Thanos: re-enable querier<->ruler cross-site traffic T412924
- 13:17 dpogorzelski@cumin1003: conftool action : set/pooled=true; selector: dnsdisc=recommendation-api,name=eqiad
- 13:17 dpogorzelski@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-cluster (exit_code=0) pool all services in eqiad/ml-serve-eqiad: maintenance
- 13:16 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1242', diff saved to https://phabricator.wikimedia.org/P89697 and previous config saved to /var/cache/conftool/dbconfig/20260303-131624-marostegui.json
- 13:16 dpogorzelski@cumin1003: START - Cookbook sre.k8s.pool-depool-cluster pool all services in eqiad/ml-serve-eqiad: maintenance
- 13:11 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host an-worker1178.eqiad.wmnet
- 13:11 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1177.eqiad.wmnet
- 13:10 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1359.eqiad.wmnet with OS trixie
- 13:08 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2219', diff saved to https://phabricator.wikimedia.org/P89696 and previous config saved to /var/cache/conftool/dbconfig/20260303-130842-marostegui.json
- 13:01 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1242 (T418465)', diff saved to https://phabricator.wikimedia.org/P89695 and previous config saved to /var/cache/conftool/dbconfig/20260303-130117-marostegui.json
- 13:01 daniel@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
- 13:00 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host an-worker1177.eqiad.wmnet
- 13:00 daniel@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
- 13:00 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1176.eqiad.wmnet
- 12:59 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1358.eqiad.wmnet with OS trixie
- 12:56 daniel@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
- 12:55 daniel@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
- 12:53 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1359.eqiad.wmnet with reason: host reimage
- 12:53 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2219', diff saved to https://phabricator.wikimedia.org/P89694 and previous config saved to /var/cache/conftool/dbconfig/20260303-125335-marostegui.json
- 12:52 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1357.eqiad.wmnet with OS trixie
- 12:51 daniel@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
- 12:50 daniel@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
- 12:48 jayme@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1359.eqiad.wmnet with reason: host reimage
- 12:48 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1356.eqiad.wmnet with OS trixie
- 12:47 daniel@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
- 12:47 dpogorzelski@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
- 12:47 daniel@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
- 12:47 dpogorzelski@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
- 12:46 dpogorzelski@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
- 12:46 dpogorzelski@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
- 12:46 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host an-worker1176.eqiad.wmnet
- 12:46 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1175.eqiad.wmnet
- 12:46 dpogorzelski@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
- 12:45 daniel@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
- 12:45 dpogorzelski@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
- 12:45 dpogorzelski@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
- 12:45 daniel@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
- 12:45 dpogorzelski@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revision-models' for release 'main' .
- 12:44 dpogorzelski@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revise-tone-task-generator' for release 'main' .
- 12:44 dpogorzelski@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
- 12:44 dpogorzelski@deploy2002: helmfile [ml-serve-eqiad] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
- 12:44 dpogorzelski@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'readability' for release 'main' .
- 12:43 dpogorzelski@deploy2002: helmfile [ml-serve-eqiad] 'sync' command on namespace 'ores-legacy' for release 'main' .
- 12:43 dpogorzelski@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'logo-detection' for release 'main' .
- 12:43 dpogorzelski@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'edit-check' for release 'main' .
- 12:43 dpogorzelski@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
- 12:43 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1358.eqiad.wmnet with reason: host reimage
- 12:42 dpogorzelski@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'article-models' for release 'main' .
- 12:42 dpogorzelski@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'article-descriptions' for release 'main' .
- 12:41 dpogorzelski@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
- 12:40 ladsgroup@deploy2002: Finished scap sync-world: Backport for Enable thumb steps on private wikis too (T414805) (duration: 13m 01s)
- 12:39 dpogorzelski@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
- 12:38 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2219 (T418465)', diff saved to https://phabricator.wikimedia.org/P89693 and previous config saved to /var/cache/conftool/dbconfig/20260303-123827-marostegui.json
- 12:36 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1357.eqiad.wmnet with reason: host reimage
- 12:36 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db1242 (T418465)', diff saved to https://phabricator.wikimedia.org/P89692 and previous config saved to /var/cache/conftool/dbconfig/20260303-123642-marostegui.json
- 12:36 jayme@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1359.eqiad.wmnet with OS trixie
- 12:36 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1242.eqiad.wmnet with reason: Maintenance
- 12:36 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1241 (T418465)', diff saved to https://phabricator.wikimedia.org/P89691 and previous config saved to /var/cache/conftool/dbconfig/20260303-123619-marostegui.json
- 12:34 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host an-worker1175.eqiad.wmnet
- 12:34 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1174.eqiad.wmnet
- 12:34 dpogorzelski@cumin1003: conftool action : set/pooled=false; selector: dnsdisc=recommendation-api,name=eqiad
- 12:33 ladsgroup@deploy2002: ladsgroup: Continuing with sync
- 12:33 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1356.eqiad.wmnet with reason: host reimage
- 12:31 daniel@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
- 12:31 dpogorzelski@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-cluster (exit_code=0) depool all services in eqiad/ml-serve-eqiad: maintenance
- 12:31 daniel@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
- 12:31 ladsgroup@deploy2002: ladsgroup: Backport for Enable thumb steps on private wikis too (T414805) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 12:30 dpogorzelski@cumin1003: START - Cookbook sre.k8s.pool-depool-cluster depool all services in eqiad/ml-serve-eqiad: maintenance
- 12:28 jayme@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1358.eqiad.wmnet with reason: host reimage
- 12:28 jayme@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1357.eqiad.wmnet with reason: host reimage
- 12:28 jayme@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1356.eqiad.wmnet with reason: host reimage
- 12:27 daniel@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
- 12:27 ladsgroup@deploy2002: Started scap sync-world: Backport for Enable thumb steps on private wikis too (T414805)
- 12:26 daniel@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
- 12:22 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host an-worker1174.eqiad.wmnet
- 12:22 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1173.eqiad.wmnet
- 12:21 daniel@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
- 12:21 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1241', diff saved to https://phabricator.wikimedia.org/P89690 and previous config saved to /var/cache/conftool/dbconfig/20260303-122112-marostegui.json
- 12:20 daniel@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
- 12:20 daniel@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
- 12:19 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1353.eqiad.wmnet with OS trixie
- 12:16 jayme@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1358.eqiad.wmnet with OS trixie
- 12:16 jayme@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1357.eqiad.wmnet with OS trixie
- 12:15 jayme@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1356.eqiad.wmnet with OS trixie
- 12:14 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1355.eqiad.wmnet with OS trixie
- 12:14 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db2219 (T418465)', diff saved to https://phabricator.wikimedia.org/P89689 and previous config saved to /var/cache/conftool/dbconfig/20260303-121420-marostegui.json
- 12:14 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2219.codfw.wmnet with reason: Maintenance
- 12:13 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2210 (T418465)', diff saved to https://phabricator.wikimedia.org/P89688 and previous config saved to /var/cache/conftool/dbconfig/20260303-121355-marostegui.json
- 12:09 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1354.eqiad.wmnet with OS trixie
- 12:08 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host an-worker1173.eqiad.wmnet
- 12:08 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1172.eqiad.wmnet
- 12:06 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1241', diff saved to https://phabricator.wikimedia.org/P89687 and previous config saved to /var/cache/conftool/dbconfig/20260303-120604-marostegui.json
- 12:04 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1352.eqiad.wmnet with OS trixie
- 12:02 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1353.eqiad.wmnet with reason: host reimage
- 11:58 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2210', diff saved to https://phabricator.wikimedia.org/P89686 and previous config saved to /var/cache/conftool/dbconfig/20260303-115847-marostegui.json
- 11:58 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1355.eqiad.wmnet with reason: host reimage
- 11:52 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1354.eqiad.wmnet with reason: host reimage
- 11:51 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1241 (T418465)', diff saved to https://phabricator.wikimedia.org/P89685 and previous config saved to /var/cache/conftool/dbconfig/20260303-115057-marostegui.json
- 11:48 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1352.eqiad.wmnet with reason: host reimage
- 11:44 jayme@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1355.eqiad.wmnet with reason: host reimage
- 11:43 jayme@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1354.eqiad.wmnet with reason: host reimage
- 11:43 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2210', diff saved to https://phabricator.wikimedia.org/P89684 and previous config saved to /var/cache/conftool/dbconfig/20260303-114341-marostegui.json
- 11:43 jayme@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1353.eqiad.wmnet with reason: host reimage
- 11:42 jayme@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1352.eqiad.wmnet with reason: host reimage
- 11:40 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_eqsin and A:cp - 3.0 upgrade ()
- 11:36 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_eqsin and A:cp - 3.0 upgrade ()
- 11:31 jayme@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1355.eqiad.wmnet with OS trixie
- 11:31 jayme@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1354.eqiad.wmnet with OS trixie
- 11:30 jayme@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1353.eqiad.wmnet with OS trixie
- 11:30 jayme@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker1352.eqiad.wmnet with OS trixie
- 11:29 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2210 (T418465)', diff saved to Unable to send diff to phaste and previous config saved to /var/cache/conftool/dbconfig/20260303-112828-marostegui.json
- 11:25 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db1241 (T418465)', diff saved to https://phabricator.wikimedia.org/P89683 and previous config saved to /var/cache/conftool/dbconfig/20260303-112535-marostegui.json
- 11:25 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1241.eqiad.wmnet with reason: Maintenance
- 11:25 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1238 (T418465)', diff saved to https://phabricator.wikimedia.org/P89682 and previous config saved to /var/cache/conftool/dbconfig/20260303-112511-marostegui.json
- 11:21 jayme@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/admin 'apply'.
- 11:18 jayme@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/admin 'apply'.
- 11:18 jayme@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 11:17 jayme@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
- 11:17 jayme@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'.
- 11:16 jayme@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker[1350-1351].eqiad.wmnet
- 11:16 jayme@cumin1003: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker[1350-1351].eqiad.wmnet
- 11:15 jayme@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'.
- 11:15 jayme@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 11:15 jayme@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
- 11:15 jayme@deploy1003: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
- 11:14 jayme@deploy1003: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
- 11:14 jayme@deploy1003: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
- 11:13 jayme@deploy1003: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
- 11:13 jayme@deploy1003: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
- 11:13 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host an-worker1172.eqiad.wmnet
- 11:13 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1171.eqiad.wmnet
- 11:13 jayme@deploy1003: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
- 11:13 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'.
- 11:12 jayme@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'.
- 11:11 jayme@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'.
- 11:10 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1238', diff saved to https://phabricator.wikimedia.org/P89681 and previous config saved to /var/cache/conftool/dbconfig/20260303-111003-marostegui.json
- 11:09 jayme@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
- 11:08 jayme@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
- 11:08 jayme@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
- 11:07 jayme@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
- 11:07 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
- 11:06 jayme@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'.
- 11:05 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db2210 (T418465)', diff saved to https://phabricator.wikimedia.org/P89680 and previous config saved to /var/cache/conftool/dbconfig/20260303-110551-marostegui.json
- 11:05 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2210.codfw.wmnet with reason: Maintenance
- 11:05 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2206 (T418465)', diff saved to https://phabricator.wikimedia.org/P89679 and previous config saved to /var/cache/conftool/dbconfig/20260303-110527-marostegui.json
- 10:59 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host an-worker1171.eqiad.wmnet
- 10:59 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1170.eqiad.wmnet
- 10:57 slyngshede@dns1004: END - running authdns-update
- 10:55 slyngshede@dns1004: START - running authdns-update
- 10:54 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1238', diff saved to https://phabricator.wikimedia.org/P89678 and previous config saved to /var/cache/conftool/dbconfig/20260303-105455-marostegui.json
- 10:54 hashar@deploy2002: Finished deploy [gerrit/gerrit@12177b1]: wm-checks-api: add tag for Selenium jobs (duration: 00m 13s)
- 10:54 hashar@deploy2002: Started deploy [gerrit/gerrit@12177b1]: wm-checks-api: add tag for Selenium jobs
- 10:51 fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_eqsin and A:cp - 3.0 upgrade ()
- 10:51 fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_eqsin and A:cp - 3.0 upgrade ()
- 10:50 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2206', diff saved to https://phabricator.wikimedia.org/P89677 and previous config saved to /var/cache/conftool/dbconfig/20260303-105020-marostegui.json
- 10:47 dpogorzelski@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
- 10:46 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host an-worker1170.eqiad.wmnet
- 10:45 fabfur: start upgrading haproxy to 3.0 on A:cp-eqsin (T417253)
- 10:43 dpogorzelski@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
- 10:41 moritzm: installing Django security updates
- 10:39 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1238 (T418465)', diff saved to https://phabricator.wikimedia.org/P89676 and previous config saved to /var/cache/conftool/dbconfig/20260303-103947-marostegui.json
- 10:35 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2206', diff saved to https://phabricator.wikimedia.org/P89675 and previous config saved to /var/cache/conftool/dbconfig/20260303-103512-marostegui.json
- 10:34 brouberol@deploy2002: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'.
- 10:33 brouberol@deploy2002: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'.
- 10:33 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 10:32 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
- 10:31 dpogorzelski@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
- 10:25 dpogorzelski@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
- 10:20 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2206 (T418465)', diff saved to https://phabricator.wikimedia.org/P89674 and previous config saved to /var/cache/conftool/dbconfig/20260303-102004-marostegui.json
- 10:18 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db1238 (T418465)', diff saved to https://phabricator.wikimedia.org/P89673 and previous config saved to /var/cache/conftool/dbconfig/20260303-101800-marostegui.json
- 10:17 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1238.eqiad.wmnet with reason: Maintenance
- 10:17 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1221 (T418465)', diff saved to https://phabricator.wikimedia.org/P89672 and previous config saved to /var/cache/conftool/dbconfig/20260303-101747-marostegui.json
- 09:57 dpogorzelski@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
- 09:56 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db2206 (T418465)', diff saved to https://phabricator.wikimedia.org/P89670 and previous config saved to /var/cache/conftool/dbconfig/20260303-095655-marostegui.json
- 09:56 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2206.codfw.wmnet with reason: Maintenance
- 09:53 dpogorzelski@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
- 09:51 moritzm: installing qemu security updates
- 09:48 dpogorzelski@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revision-models' for release 'main' .
- 09:48 dpogorzelski@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revise-tone-task-generator' for release 'main' .
- 09:47 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1221', diff saved to https://phabricator.wikimedia.org/P89669 and previous config saved to /var/cache/conftool/dbconfig/20260303-094732-marostegui.json
- 09:47 elukey@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'sync'.
- 09:47 elukey@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'sync'.
- 09:45 dpogorzelski@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'readability' for release 'main' .
- 09:45 elukey@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'sync'.
- 09:44 elukey@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'sync'.
- 09:44 dpogorzelski@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'ores-legacy' for release 'main' .
- 09:44 dpogorzelski@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'logo-detection' for release 'main' .
- 09:44 fceratto@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1176.eqiad.wmnet with reason: host reimage
- 09:43 dpogorzelski@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'edit-check' for release 'main' .
- 09:40 dpogorzelski@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'article-models' for release 'main' .
- 09:38 fceratto@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1176.eqiad.wmnet with reason: host reimage
- 09:35 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2199.codfw.wmnet with reason: Maintenance
- 09:35 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2172 (T418465)', diff saved to https://phabricator.wikimedia.org/P89668 and previous config saved to /var/cache/conftool/dbconfig/20260303-093542-marostegui.json
- 09:32 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1221 (T418465)', diff saved to https://phabricator.wikimedia.org/P89667 and previous config saved to /var/cache/conftool/dbconfig/20260303-093224-marostegui.json
- 09:32 dpogorzelski@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'article-descriptions' for release 'main' .
- 09:23 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_ulsfo and A:cp - 3.0 upgrade ()
- 09:23 fceratto@cumin1003: START - Cookbook sre.hosts.reimage for host db1176.eqiad.wmnet with OS trixie
- 09:21 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.update-replication (exit_code=0)
- 09:20 fceratto@cumin1003: START - Cookbook sre.mysql.update-replication
- 09:20 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_ulsfo and A:cp - 3.0 upgrade ()
- 09:20 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P89666 and previous config saved to /var/cache/conftool/dbconfig/20260303-092034-marostegui.json
- 09:19 arnaudb@dns1004: END - running authdns-update
- 09:18 arnaudb@dns1004: START - running authdns-update
- 09:17 moritzm: installing libbpf updates from Bookworm point release
- 09:08 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db1221 (T418465)', diff saved to https://phabricator.wikimedia.org/P89665 and previous config saved to /var/cache/conftool/dbconfig/20260303-090818-marostegui.json
- 09:08 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on 6 hosts with reason: Maintenance
- 09:07 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1221.eqiad.wmnet with reason: Maintenance
- 09:07 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1199 (T418465)', diff saved to https://phabricator.wikimedia.org/P89664 and previous config saved to /var/cache/conftool/dbconfig/20260303-090731-marostegui.json
- 09:05 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P89663 and previous config saved to /var/cache/conftool/dbconfig/20260303-090526-marostegui.json
- 08:54 dpogorzelski@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-cluster (exit_code=0) pool all services in codfw/ml-serve-codfw: maintenance
- 08:53 dpogorzelski@cumin1003: START - Cookbook sre.k8s.pool-depool-cluster pool all services in codfw/ml-serve-codfw: maintenance
- 08:52 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1199', diff saved to https://phabricator.wikimedia.org/P89662 and previous config saved to /var/cache/conftool/dbconfig/20260303-085224-marostegui.json
- 08:50 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2172 (T418465)', diff saved to https://phabricator.wikimedia.org/P89661 and previous config saved to /var/cache/conftool/dbconfig/20260303-085019-marostegui.json
- 08:47 moritzm: powercycling lvs1013
- 08:41 fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_ulsfo and A:cp - 3.0 upgrade ()
- 08:41 fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_ulsfo and A:cp - 3.0 upgrade ()
- 08:37 fabfur: start upgrading haproxy to 3.0 on A:cp-ulsfo (T417253)
- 08:37 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1199', diff saved to https://phabricator.wikimedia.org/P89660 and previous config saved to /var/cache/conftool/dbconfig/20260303-083716-marostegui.json
- 08:32 dpogorzelski@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
- 08:32 dpogorzelski@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
- 08:31 dpogorzelski@deploy2002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
- 08:30 dpogorzelski@deploy2002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
- 08:28 dpogorzelski@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-cluster (exit_code=0) depool all services in codfw/ml-serve-codfw: maintenance
- 08:27 dpogorzelski@cumin1003: START - Cookbook sre.k8s.pool-depool-cluster depool all services in codfw/ml-serve-codfw: maintenance
- 08:24 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db2172 (T418465)', diff saved to https://phabricator.wikimedia.org/P89659 and previous config saved to /var/cache/conftool/dbconfig/20260303-082424-marostegui.json
- 08:24 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2172.codfw.wmnet with reason: Maintenance
- 08:24 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T418465)', diff saved to https://phabricator.wikimedia.org/P89658 and previous config saved to /var/cache/conftool/dbconfig/20260303-082400-marostegui.json
- 08:22 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1199 (T418465)', diff saved to https://phabricator.wikimedia.org/P89657 and previous config saved to /var/cache/conftool/dbconfig/20260303-082209-marostegui.json
- 08:08 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P89656 and previous config saved to /var/cache/conftool/dbconfig/20260303-080853-marostegui.json
- 08:07 moritzm: installing PAM security updates on Bookworm
- 07:55 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db1199 (T418465)', diff saved to https://phabricator.wikimedia.org/P89655 and previous config saved to /var/cache/conftool/dbconfig/20260303-075526-marostegui.json
- 07:55 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1199.eqiad.wmnet with reason: Maintenance
- 07:55 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1190 (T418465)', diff saved to https://phabricator.wikimedia.org/P89654 and previous config saved to /var/cache/conftool/dbconfig/20260303-075502-marostegui.json
- 07:53 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P89653 and previous config saved to /var/cache/conftool/dbconfig/20260303-075345-marostegui.json
- 07:39 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1190', diff saved to https://phabricator.wikimedia.org/P89652 and previous config saved to /var/cache/conftool/dbconfig/20260303-073955-marostegui.json
- 07:38 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T418465)', diff saved to https://phabricator.wikimedia.org/P89651 and previous config saved to /var/cache/conftool/dbconfig/20260303-073838-marostegui.json
- 07:24 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1190', diff saved to https://phabricator.wikimedia.org/P89650 and previous config saved to /var/cache/conftool/dbconfig/20260303-072447-marostegui.json
- 07:20 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1051.eqiad.wmnet
- 07:18 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1051.eqiad.wmnet
- 07:10 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db2155 (T418465)', diff saved to https://phabricator.wikimedia.org/P89649 and previous config saved to /var/cache/conftool/dbconfig/20260303-071054-marostegui.json
- 07:10 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2155.codfw.wmnet with reason: Maintenance
- 07:10 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2147 (T418465)', diff saved to https://phabricator.wikimedia.org/P89648 and previous config saved to /var/cache/conftool/dbconfig/20260303-071029-marostegui.json
- 07:09 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1190 (T418465)', diff saved to https://phabricator.wikimedia.org/P89647 and previous config saved to /var/cache/conftool/dbconfig/20260303-070940-marostegui.json
- 06:55 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2147', diff saved to https://phabricator.wikimedia.org/P89646 and previous config saved to /var/cache/conftool/dbconfig/20260303-065523-marostegui.json
- 06:44 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db1190 (T418465)', diff saved to https://phabricator.wikimedia.org/P89645 and previous config saved to /var/cache/conftool/dbconfig/20260303-064405-marostegui.json
- 06:43 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1190.eqiad.wmnet with reason: Maintenance
- 06:40 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2147', diff saved to https://phabricator.wikimedia.org/P89644 and previous config saved to /var/cache/conftool/dbconfig/20260303-064015-marostegui.json
- 06:33 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db2240 gradually with 4 steps - repool after schema change
- 06:25 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2147 (T418465)', diff saved to https://phabricator.wikimedia.org/P89642 and previous config saved to /var/cache/conftool/dbconfig/20260303-062507-marostegui.json
- 05:58 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db2147 (T418465)', diff saved to https://phabricator.wikimedia.org/P89639 and previous config saved to /var/cache/conftool/dbconfig/20260303-055834-marostegui.json
- 05:58 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2147.codfw.wmnet with reason: Maintenance
- 05:57 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1150.eqiad.wmnet with reason: Maintenance
- 05:48 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2212.codfw.wmnet with reason: Maintenance
- 05:48 marostegui@cumin1003: START - Cookbook sre.mysql.pool db2240 gradually with 4 steps - repool after schema change
- 05:48 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1163.eqiad.wmnet with reason: Maintenance
- 05:01 mwpresync@deploy2002: Pruned MediaWiki: 1.46.0-wmf.15 (duration: 01m 10s)
- 04:43 mwpresync@deploy2002: Finished scap sync-world: testwikis to 1.46.0-wmf.18 refs T413809 (duration: 39m 43s)
- 04:03 mwpresync@deploy2002: Started scap sync-world: testwikis to 1.46.0-wmf.18 refs T413809
- 03:58 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance
- 03:57 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1251 (T418465)', diff saved to https://phabricator.wikimedia.org/P89637 and previous config saved to /var/cache/conftool/dbconfig/20260303-035746-marostegui.json
- 03:42 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1251', diff saved to https://phabricator.wikimedia.org/P89636 and previous config saved to /var/cache/conftool/dbconfig/20260303-034239-marostegui.json
- 03:27 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1251', diff saved to https://phabricator.wikimedia.org/P89635 and previous config saved to /var/cache/conftool/dbconfig/20260303-032731-marostegui.json
- 03:12 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1251 (T418465)', diff saved to https://phabricator.wikimedia.org/P89634 and previous config saved to /var/cache/conftool/dbconfig/20260303-031224-marostegui.json
- 03:02 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db1251 (T418465)', diff saved to https://phabricator.wikimedia.org/P89633 and previous config saved to /var/cache/conftool/dbconfig/20260303-030217-marostegui.json
- 03:02 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1251.eqiad.wmnet with reason: Maintenance
- 02:16 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1240.eqiad.wmnet with reason: Maintenance
- 02:08 mwpresync@deploy2002: Finished scap build-images: Publishing wmf/next image (duration: 08m 00s)
- 02:08 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1239.eqiad.wmnet with reason: Maintenance
- 02:08 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1235 (T418465)', diff saved to https://phabricator.wikimedia.org/P89632 and previous config saved to /var/cache/conftool/dbconfig/20260303-020817-marostegui.json
- 02:00 mwpresync@deploy2002: Started scap build-images: Publishing wmf/next image
- 01:53 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1235', diff saved to https://phabricator.wikimedia.org/P89631 and previous config saved to /var/cache/conftool/dbconfig/20260303-015309-marostegui.json
- 01:42 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mwlog2003.codfw.wmnet with OS trixie
- 01:38 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1235', diff saved to https://phabricator.wikimedia.org/P89630 and previous config saved to /var/cache/conftool/dbconfig/20260303-013802-marostegui.json
- 01:37 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2216 (T418465)', diff saved to https://phabricator.wikimedia.org/P89629 and previous config saved to /var/cache/conftool/dbconfig/20260303-013719-marostegui.json
- 01:22 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1235 (T418465)', diff saved to https://phabricator.wikimedia.org/P89628 and previous config saved to /var/cache/conftool/dbconfig/20260303-012254-marostegui.json
- 01:22 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2216', diff saved to https://phabricator.wikimedia.org/P89627 and previous config saved to /var/cache/conftool/dbconfig/20260303-012211-marostegui.json
- 01:19 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mwlog2003.codfw.wmnet with reason: host reimage
- 01:11 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mwlog2003.codfw.wmnet with reason: host reimage
- 01:11 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db1235 (T418465)', diff saved to https://phabricator.wikimedia.org/P89626 and previous config saved to /var/cache/conftool/dbconfig/20260303-011151-marostegui.json
- 01:11 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1235.eqiad.wmnet with reason: Maintenance
- 01:11 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1234 (T418465)', diff saved to https://phabricator.wikimedia.org/P89625 and previous config saved to /var/cache/conftool/dbconfig/20260303-011128-marostegui.json
- 01:07 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2216', diff saved to https://phabricator.wikimedia.org/P89624 and previous config saved to /var/cache/conftool/dbconfig/20260303-010703-marostegui.json
- 00:59 zabe@deploy2002: Finished scap sync-world: Backport for Revert "ImageListPager: Properly support file schema migration read new" (duration: 08m 12s)
- 00:56 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1234', diff saved to https://phabricator.wikimedia.org/P89623 and previous config saved to /var/cache/conftool/dbconfig/20260303-005620-marostegui.json
- 00:56 zabe@deploy2002: zabe: Continuing with sync
- 00:54 herron@cumin1003: START - Cookbook sre.hosts.reimage for host mwlog2003.codfw.wmnet with OS trixie
- 00:53 zabe@deploy2002: zabe: Backport for Revert "ImageListPager: Properly support file schema migration read new" synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 00:53 herron@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host mwlog2003.codfw.wmnet with OS trixie
- 00:51 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2216 (T418465)', diff saved to https://phabricator.wikimedia.org/P89622 and previous config saved to /var/cache/conftool/dbconfig/20260303-005156-marostegui.json
- 00:51 zabe@deploy2002: Started scap sync-world: Backport for Revert "ImageListPager: Properly support file schema migration read new"
- 00:41 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1234', diff saved to https://phabricator.wikimedia.org/P89621 and previous config saved to /var/cache/conftool/dbconfig/20260303-004112-marostegui.json
- 00:40 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db2216 (T418465)', diff saved to https://phabricator.wikimedia.org/P89620 and previous config saved to /var/cache/conftool/dbconfig/20260303-004056-marostegui.json
- 00:40 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2216.codfw.wmnet with reason: Maintenance
- 00:40 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2203 (T418465)', diff saved to https://phabricator.wikimedia.org/P89619 and previous config saved to /var/cache/conftool/dbconfig/20260303-004033-marostegui.json
- 00:31 herron@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mwlog1003.eqiad.wmnet with OS trixie
- 00:26 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1234 (T418465)', diff saved to https://phabricator.wikimedia.org/P89618 and previous config saved to /var/cache/conftool/dbconfig/20260303-002604-marostegui.json
- 00:25 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2203', diff saved to https://phabricator.wikimedia.org/P89617 and previous config saved to /var/cache/conftool/dbconfig/20260303-002525-marostegui.json
- 00:20 zabe@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply
- 00:18 zabe@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply
- 00:18 zabe@deploy2002: Finished scap sync-world: T418327 (duration: 05m 01s)
- 00:15 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db1234 (T418465)', diff saved to https://phabricator.wikimedia.org/P89616 and previous config saved to /var/cache/conftool/dbconfig/20260303-001504-marostegui.json
- 00:14 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1234.eqiad.wmnet with reason: Maintenance
- 00:14 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1232 (T418465)', diff saved to https://phabricator.wikimedia.org/P89615 and previous config saved to /var/cache/conftool/dbconfig/20260303-001440-marostegui.json
- 00:13 zabe@deploy2002: Started scap sync-world: T418327
- 00:11 zabe@deploy2002: zabe: Continuing with sync
- 00:10 zabe@deploy2002: zabe: Backport for ImageListPager: Properly support file schema migration read new (T418327) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 00:10 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2203', diff saved to https://phabricator.wikimedia.org/P89614 and previous config saved to /var/cache/conftool/dbconfig/20260303-001018-marostegui.json
- 00:08 zabe@deploy2002: Started scap sync-world: Backport for ImageListPager: Properly support file schema migration read new (T418327)
2026-03-02
- 23:59 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1232', diff saved to https://phabricator.wikimedia.org/P89613 and previous config saved to /var/cache/conftool/dbconfig/20260302-235933-marostegui.json
- 23:58 zabe@deploy2002: Finished scap sync-world: Backport for Stop writing to il_to on testwiki (T415787) (duration: 06m 02s)
- 23:55 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2203 (T418465)', diff saved to https://phabricator.wikimedia.org/P89612 and previous config saved to /var/cache/conftool/dbconfig/20260302-235511-marostegui.json
- 23:54 zabe@deploy2002: zabe: Continuing with sync
- 23:53 zabe@deploy2002: zabe: Backport for Stop writing to il_to on testwiki (T415787) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 23:52 zabe@deploy2002: Started scap sync-world: Backport for Stop writing to il_to on testwiki (T415787)
- 23:51 brett@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on cp2058.codfw.wmnet with reason: dcops troubleshooting for T418527
- 23:50 zabe@deploy2002: Finished scap sync-world: Backport for multiversion: Stop setting MW_USE_CONFIG_SCHEMA (T304460) (duration: 07m 10s)
- 23:47 zabe@deploy2002: zabe: Continuing with sync
- 23:45 zabe@deploy2002: zabe: Backport for multiversion: Stop setting MW_USE_CONFIG_SCHEMA (T304460) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 23:44 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1232', diff saved to https://phabricator.wikimedia.org/P89611 and previous config saved to /var/cache/conftool/dbconfig/20260302-234425-marostegui.json
- 23:44 herron@cumin1003: START - Cookbook sre.hosts.reimage for host mwlog2003.codfw.wmnet with OS trixie
- 23:43 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db2203 (T418465)', diff saved to https://phabricator.wikimedia.org/P89610 and previous config saved to /var/cache/conftool/dbconfig/20260302-234350-marostegui.json
- 23:43 zabe@deploy2002: Started scap sync-world: Backport for multiversion: Stop setting MW_USE_CONFIG_SCHEMA (T304460)
- 23:43 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2203.codfw.wmnet with reason: Maintenance
- 23:35 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2202.codfw.wmnet with reason: Maintenance
- 23:35 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2188 (T418465)', diff saved to https://phabricator.wikimedia.org/P89609 and previous config saved to /var/cache/conftool/dbconfig/20260302-233517-marostegui.json
- 23:29 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1232 (T418465)', diff saved to https://phabricator.wikimedia.org/P89608 and previous config saved to /var/cache/conftool/dbconfig/20260302-232918-marostegui.json
- 23:25 dwisehaupt@dns1006: END - running authdns-update
- 23:24 dwisehaupt@dns1006: START - running authdns-update
- 23:23 herron@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mwlog1003.eqiad.wmnet with reason: host reimage
- 23:20 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2188', diff saved to https://phabricator.wikimedia.org/P89607 and previous config saved to /var/cache/conftool/dbconfig/20260302-232009-marostegui.json
- 23:18 herron@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on mwlog1003.eqiad.wmnet with reason: host reimage
- 23:17 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db1232 (T418465)', diff saved to https://phabricator.wikimedia.org/P89606 and previous config saved to /var/cache/conftool/dbconfig/20260302-231723-marostegui.json
- 23:17 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1232.eqiad.wmnet with reason: Maintenance
- 23:17 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1219 (T418465)', diff saved to https://phabricator.wikimedia.org/P89605 and previous config saved to /var/cache/conftool/dbconfig/20260302-231658-marostegui.json
- 23:05 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2188', diff saved to https://phabricator.wikimedia.org/P89604 and previous config saved to /var/cache/conftool/dbconfig/20260302-230502-marostegui.json
- 23:01 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1219', diff saved to https://phabricator.wikimedia.org/P89603 and previous config saved to /var/cache/conftool/dbconfig/20260302-230151-marostegui.json
- 22:57 herron@cumin1003: START - Cookbook sre.hosts.reimage for host mwlog1003.eqiad.wmnet with OS trixie
- 22:49 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2188 (T418465)', diff saved to https://phabricator.wikimedia.org/P89602 and previous config saved to /var/cache/conftool/dbconfig/20260302-224954-marostegui.json
- 22:46 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1219', diff saved to https://phabricator.wikimedia.org/P89601 and previous config saved to /var/cache/conftool/dbconfig/20260302-224643-marostegui.json
- 22:36 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db2188 (T418465)', diff saved to https://phabricator.wikimedia.org/P89600 and previous config saved to /var/cache/conftool/dbconfig/20260302-223612-marostegui.json
- 22:36 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2188.codfw.wmnet with reason: Maintenance
- 22:35 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T418465)', diff saved to https://phabricator.wikimedia.org/P89599 and previous config saved to /var/cache/conftool/dbconfig/20260302-223548-marostegui.json
- 22:31 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1219 (T418465)', diff saved to https://phabricator.wikimedia.org/P89598 and previous config saved to /var/cache/conftool/dbconfig/20260302-223135-marostegui.json
- 22:21 maryum: Deployed security fix for T418179
- 22:20 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P89597 and previous config saved to /var/cache/conftool/dbconfig/20260302-222041-marostegui.json
- 22:19 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db1219 (T418465)', diff saved to https://phabricator.wikimedia.org/P89596 and previous config saved to /var/cache/conftool/dbconfig/20260302-221938-marostegui.json
- 22:19 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1219.eqiad.wmnet with reason: Maintenance
- 22:19 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1218 (T418465)', diff saved to https://phabricator.wikimedia.org/P89595 and previous config saved to /var/cache/conftool/dbconfig/20260302-221925-marostegui.json
- 22:10 aaron@deploy2002: Finished scap sync-world: Backport for Add growthexperiments.v0 to $wgRestSandboxSpecs (T414470) (duration: 06m 39s)
- 22:06 aaron@deploy2002: aaron: Continuing with sync
- 22:05 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P89594 and previous config saved to /var/cache/conftool/dbconfig/20260302-220533-marostegui.json
- 22:05 aaron@deploy2002: aaron: Backport for Add growthexperiments.v0 to $wgRestSandboxSpecs (T414470) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 22:04 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1218', diff saved to https://phabricator.wikimedia.org/P89593 and previous config saved to /var/cache/conftool/dbconfig/20260302-220418-marostegui.json
- 22:03 aaron@deploy2002: Started scap sync-world: Backport for Add growthexperiments.v0 to $wgRestSandboxSpecs (T414470)
- 22:03 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-backup2003.codfw.wmnet with OS trixie
- 22:03 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-backup2004.codfw.wmnet with OS trixie
- 22:03 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 22:03 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 22:02 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 22:01 catrope@deploy2002: Finished scap sync-world: Backport for ApiCSPReport: Use structured logging for CSP reports (duration: 08m 19s)
- 21:57 catrope@deploy2002: catrope: Continuing with sync
- 21:56 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 21:55 catrope@deploy2002: catrope: Backport for ApiCSPReport: Use structured logging for CSP reports synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 21:53 catrope@deploy2002: Started scap sync-world: Backport for ApiCSPReport: Use structured logging for CSP reports
- 21:50 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T418465)', diff saved to https://phabricator.wikimedia.org/P89592 and previous config saved to /var/cache/conftool/dbconfig/20260302-215025-marostegui.json
- 21:50 brett@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on cp2043.codfw.wmnet with reason: These are test instances, failing should not notif
- 21:49 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1218', diff saved to https://phabricator.wikimedia.org/P89591 and previous config saved to /var/cache/conftool/dbconfig/20260302-214910-marostegui.json
- 21:48 inflatador: bking@desktop restarting wdqs codfw to clear ProbeDown alerts
- 21:43 brett@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cp2043.codfw.wmnet
- 21:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-backup2004.codfw.wmnet with reason: host reimage
- 21:39 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db2176 (T418465)', diff saved to https://phabricator.wikimedia.org/P89590 and previous config saved to /var/cache/conftool/dbconfig/20260302-213957-marostegui.json
- 21:39 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2176.codfw.wmnet with reason: Maintenance
- 21:39 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T418465)', diff saved to https://phabricator.wikimedia.org/P89589 and previous config saved to /var/cache/conftool/dbconfig/20260302-213934-marostegui.json
- 21:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-backup2003.codfw.wmnet with reason: host reimage
- 21:36 eevans@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:cassandra-dev: Testing removal of OpenJDK 8 support - eevans@cumin1003
- 21:34 catrope@deploy2002: Finished scap sync-world: Backport for Add Comments namespace for shnwikinews (T414403) (duration: 07m 07s)
- 21:34 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1218 (T418465)', diff saved to https://phabricator.wikimedia.org/P89588 and previous config saved to /var/cache/conftool/dbconfig/20260302-213402-marostegui.json
- 21:33 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-backup2004.codfw.wmnet with reason: host reimage
- 21:33 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-backup2003.codfw.wmnet with reason: host reimage
- 21:30 brett@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp2043.codfw.wmnet
- 21:30 catrope@deploy2002: shivaanshsingh, catrope: Continuing with sync
- 21:29 catrope@deploy2002: shivaanshsingh, catrope: Backport for Add Comments namespace for shnwikinews (T414403) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 21:27 catrope@deploy2002: Started scap sync-world: Backport for Add Comments namespace for shnwikinews (T414403)
- 21:24 kemayo@deploy2002: Finished scap sync-world: Backport for Suggestion Mode: add values for suggestion feedback properties (T401739), Stop PasteCheck A/B test (T417429) (duration: 10m 55s)
- 21:24 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P89587 and previous config saved to /var/cache/conftool/dbconfig/20260302-212426-marostegui.json
- 21:23 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db1218 (T418465)', diff saved to https://phabricator.wikimedia.org/P89586 and previous config saved to /var/cache/conftool/dbconfig/20260302-212345-marostegui.json
- 21:23 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1218.eqiad.wmnet with reason: Maintenance
- 21:23 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1206 (T418465)', diff saved to https://phabricator.wikimedia.org/P89585 and previous config saved to /var/cache/conftool/dbconfig/20260302-212321-marostegui.json
- 21:20 kemayo@deploy2002: esanders, kemayo, caro: Continuing with sync
- 21:17 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ms-backup2004.codfw.wmnet with OS trixie
- 21:16 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ms-backup2003.codfw.wmnet with OS trixie
- 21:16 eevans@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:cassandra-dev: Testing removal of OpenJDK 8 support - eevans@cumin1003
- 21:16 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['ms-backup2003']
- 21:15 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-backup2003']
- 21:15 kemayo@deploy2002: esanders, kemayo, caro: Backport for Suggestion Mode: add values for suggestion feedback properties (T401739), Stop PasteCheck A/B test (T417429) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 21:14 inflatador: bking@apt1002 reprepro --component thirdparty/opensearch3 update trixie-wikimedia T418388
- 21:13 kemayo@deploy2002: Started scap sync-world: Backport for Suggestion Mode: add values for suggestion feedback properties (T401739), Stop PasteCheck A/B test (T417429)
- 21:12 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-backup2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 21:11 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-backup2003.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 21:10 dani@deploy2002: Finished scap sync-world: Backport for Undeploy Comparative Reader Research survey on eswiki (T417834), Undeploy Comparative Reader Research survey on enwiki (T417829) (duration: 06m 52s)
- 21:09 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P89584 and previous config saved to /var/cache/conftool/dbconfig/20260302-210919-marostegui.json
- 21:08 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1206', diff saved to https://phabricator.wikimedia.org/P89583 and previous config saved to /var/cache/conftool/dbconfig/20260302-210813-marostegui.json
- 21:06 dani@deploy2002: dani: Continuing with sync
- 21:05 dani@deploy2002: dani: Backport for Undeploy Comparative Reader Research survey on eswiki (T417834), Undeploy Comparative Reader Research survey on enwiki (T417829) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 21:03 dani@deploy2002: Started scap sync-world: Backport for Undeploy Comparative Reader Research survey on eswiki (T417834), Undeploy Comparative Reader Research survey on enwiki (T417829)
- 20:58 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ms-backup2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 20:58 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ms-backup2003.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 20:56 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ms-backup2004
- 20:56 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ms-backup2004
- 20:56 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ms-backup2003
- 20:56 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ms-backup2003
- 20:54 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 20:54 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ms-backup2003 to codfw - jhancock@cumin2002"
- 20:54 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ms-backup2003 to codfw - jhancock@cumin2002"
- 20:54 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T418465)', diff saved to https://phabricator.wikimedia.org/P89582 and previous config saved to /var/cache/conftool/dbconfig/20260302-205411-marostegui.json
- 20:53 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1206', diff saved to https://phabricator.wikimedia.org/P89581 and previous config saved to /var/cache/conftool/dbconfig/20260302-205307-marostegui.json
- 20:50 jhancock@cumin2002: START - Cookbook sre.dns.netbox
- 20:42 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host contint2003.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 20:41 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db2174 (T418465)', diff saved to https://phabricator.wikimedia.org/P89580 and previous config saved to /var/cache/conftool/dbconfig/20260302-204136-marostegui.json
- 20:41 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2174.codfw.wmnet with reason: Maintenance
- 20:41 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T418465)', diff saved to https://phabricator.wikimedia.org/P89579 and previous config saved to /var/cache/conftool/dbconfig/20260302-204112-marostegui.json
- 20:41 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host contint2003.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 20:39 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host contint2003.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 20:38 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1206 (T418465)', diff saved to https://phabricator.wikimedia.org/P89578 and previous config saved to /var/cache/conftool/dbconfig/20260302-203759-marostegui.json
- 20:37 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host contint2003.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 20:36 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host contint2003.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 20:33 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host contint2003.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 20:31 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host contint2003.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 20:30 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host contint2003.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 20:27 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db1206 (T418465)', diff saved to https://phabricator.wikimedia.org/P89577 and previous config saved to /var/cache/conftool/dbconfig/20260302-202740-marostegui.json
- 20:27 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1206.eqiad.wmnet with reason: Maintenance
- 20:27 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1196 (T418465)', diff saved to https://phabricator.wikimedia.org/P89576 and previous config saved to /var/cache/conftool/dbconfig/20260302-202716-marostegui.json
- 20:26 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P89575 and previous config saved to /var/cache/conftool/dbconfig/20260302-202604-marostegui.json
- 20:12 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1196', diff saved to https://phabricator.wikimedia.org/P89574 and previous config saved to /var/cache/conftool/dbconfig/20260302-201209-marostegui.json
- 20:10 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P89573 and previous config saved to /var/cache/conftool/dbconfig/20260302-201057-marostegui.json
- 20:01 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcephosd2008-dev.codfw.wmnet with OS trixie
- 20:00 andrew@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudcephosd2008-dev.codfw.wmnet with OS trixie
- 19:57 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1196', diff saved to https://phabricator.wikimedia.org/P89572 and previous config saved to /var/cache/conftool/dbconfig/20260302-195702-marostegui.json
- 19:55 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T418465)', diff saved to https://phabricator.wikimedia.org/P89571 and previous config saved to /var/cache/conftool/dbconfig/20260302-195549-marostegui.json
- 19:44 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db2173 (T418465)', diff saved to https://phabricator.wikimedia.org/P89570 and previous config saved to /var/cache/conftool/dbconfig/20260302-194435-marostegui.json
- 19:44 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2173.codfw.wmnet with reason: Maintenance
- 19:44 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2170 (T418465)', diff saved to https://phabricator.wikimedia.org/P89569 and previous config saved to /var/cache/conftool/dbconfig/20260302-194411-marostegui.json
- 19:41 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1196 (T418465)', diff saved to https://phabricator.wikimedia.org/P89568 and previous config saved to /var/cache/conftool/dbconfig/20260302-194155-marostegui.json
- 19:31 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db1196 (T418465)', diff saved to https://phabricator.wikimedia.org/P89566 and previous config saved to /var/cache/conftool/dbconfig/20260302-193119-marostegui.json
- 19:31 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1013,1017].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
- 19:31 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1196.eqiad.wmnet with reason: Maintenance
- 19:30 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1195 (T418465)', diff saved to https://phabricator.wikimedia.org/P89565 and previous config saved to /var/cache/conftool/dbconfig/20260302-193046-marostegui.json
- 19:29 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2170', diff saved to https://phabricator.wikimedia.org/P89564 and previous config saved to /var/cache/conftool/dbconfig/20260302-192903-marostegui.json
- 19:15 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1195', diff saved to https://phabricator.wikimedia.org/P89563 and previous config saved to /var/cache/conftool/dbconfig/20260302-191539-marostegui.json
- 19:13 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2170', diff saved to https://phabricator.wikimedia.org/P89562 and previous config saved to /var/cache/conftool/dbconfig/20260302-191355-marostegui.json
- 19:12 dzahn@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
- 19:12 dzahn@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply
- 19:04 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2095.codfw.wmnet with OS bullseye
- 19:00 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1195', diff saved to https://phabricator.wikimedia.org/P89561 and previous config saved to /var/cache/conftool/dbconfig/20260302-190032-marostegui.json
- 18:58 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2170 (T418465)', diff saved to https://phabricator.wikimedia.org/P89560 and previous config saved to /var/cache/conftool/dbconfig/20260302-185848-marostegui.json
- 18:54 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcephosd2008-dev.codfw.wmnet with OS trixie
- 18:53 andrew@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcephosd2008-dev.codfw.wmnet with OS trixie
- 18:48 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db2170 (T418465)', diff saved to https://phabricator.wikimedia.org/P89559 and previous config saved to /var/cache/conftool/dbconfig/20260302-184832-marostegui.json
- 18:48 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2170.codfw.wmnet with reason: Maintenance
- 18:48 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T418465)', diff saved to https://phabricator.wikimedia.org/P89558 and previous config saved to /var/cache/conftool/dbconfig/20260302-184808-marostegui.json
- 18:45 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1195 (T418465)', diff saved to https://phabricator.wikimedia.org/P89557 and previous config saved to /var/cache/conftool/dbconfig/20260302-184524-marostegui.json
- 18:34 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db1195 (T418465)', diff saved to https://phabricator.wikimedia.org/P89556 and previous config saved to /var/cache/conftool/dbconfig/20260302-183449-marostegui.json
- 18:34 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1195.eqiad.wmnet with reason: Maintenance
- 18:34 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1186 (T418465)', diff saved to https://phabricator.wikimedia.org/P89555 and previous config saved to /var/cache/conftool/dbconfig/20260302-183425-marostegui.json
- 18:33 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P89554 and previous config saved to /var/cache/conftool/dbconfig/20260302-183300-marostegui.json
- 18:19 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1186', diff saved to https://phabricator.wikimedia.org/P89553 and previous config saved to /var/cache/conftool/dbconfig/20260302-181918-marostegui.json
- 18:17 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P89552 and previous config saved to /var/cache/conftool/dbconfig/20260302-181753-marostegui.json
- 18:16 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcephosd2008-dev.codfw.wmnet with OS trixie
- 18:04 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host contint2003.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 18:04 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1186', diff saved to https://phabricator.wikimedia.org/P89551 and previous config saved to /var/cache/conftool/dbconfig/20260302-180411-marostegui.json
- 18:03 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host contint2003.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 18:02 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T418465)', diff saved to https://phabricator.wikimedia.org/P89550 and previous config saved to /var/cache/conftool/dbconfig/20260302-180245-marostegui.json
- 18:01 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host contint2003.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 17:54 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host contint2003.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 17:53 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host contint2003
- 17:53 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host contint2003
- 17:52 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 17:52 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding contint2003 to codfw - jhancock@cumin2002"
- 17:52 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding contint2003 to codfw - jhancock@cumin2002"
- 17:49 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db2153 (T418465)', diff saved to https://phabricator.wikimedia.org/P89549 and previous config saved to /var/cache/conftool/dbconfig/20260302-174917-marostegui.json
- 17:49 jhancock@cumin2002: START - Cookbook sre.dns.netbox
- 17:49 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2153.codfw.wmnet with reason: Maintenance
- 17:49 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1186 (T418465)', diff saved to https://phabricator.wikimedia.org/P89548 and previous config saved to /var/cache/conftool/dbconfig/20260302-174903-marostegui.json
- 17:48 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T418465)', diff saved to https://phabricator.wikimedia.org/P89547 and previous config saved to /var/cache/conftool/dbconfig/20260302-174854-marostegui.json
- 17:46 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host contint2003.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 17:44 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host contint2003.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 17:44 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2095.codfw.wmnet with OS bullseye
- 17:43 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host contint2003
- 17:43 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host contint2003
- 17:43 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 17:43 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding contint2003 to codfw - jhancock@cumin2002"
- 17:42 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding contint2003 to codfw - jhancock@cumin2002"
- 17:39 jhancock@cumin2002: START - Cookbook sre.dns.netbox
- 17:38 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db1186 (T418465)', diff saved to https://phabricator.wikimedia.org/P89546 and previous config saved to /var/cache/conftool/dbconfig/20260302-173827-marostegui.json
- 17:38 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1186.eqiad.wmnet with reason: Maintenance
- 17:38 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T418465)', diff saved to https://phabricator.wikimedia.org/P89545 and previous config saved to /var/cache/conftool/dbconfig/20260302-173803-marostegui.json
- 17:37 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.update-replication (exit_code=0)
- 17:36 fceratto@cumin1003: START - Cookbook sre.mysql.update-replication
- 17:34 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.update-replication (exit_code=0)
- 17:33 fceratto@cumin1003: START - Cookbook sre.mysql.update-replication
- 17:33 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P89544 and previous config saved to /var/cache/conftool/dbconfig/20260302-173347-marostegui.json
- 17:32 fceratto@cumin1003: END (FAIL) - Cookbook sre.mysql.update-replication (exit_code=99)
- 17:32 fceratto@cumin1003: START - Cookbook sre.mysql.update-replication
- 17:24 ebernhardson@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
- 17:23 ebernhardson@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
- 17:22 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P89543 and previous config saved to /var/cache/conftool/dbconfig/20260302-172256-marostegui.json
- 17:18 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P89542 and previous config saved to /var/cache/conftool/dbconfig/20260302-171839-marostegui.json
- 17:07 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P89541 and previous config saved to /var/cache/conftool/dbconfig/20260302-170748-marostegui.json
- 17:03 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T418465)', diff saved to https://phabricator.wikimedia.org/P89540 and previous config saved to /var/cache/conftool/dbconfig/20260302-170331-marostegui.json
- 16:52 fceratto@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2230.codfw.wmnet with OS trixie
- 16:52 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T418465)', diff saved to https://phabricator.wikimedia.org/P89539 and previous config saved to /var/cache/conftool/dbconfig/20260302-165240-marostegui.json
- 16:51 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db2146 (T418465)', diff saved to https://phabricator.wikimedia.org/P89538 and previous config saved to /var/cache/conftool/dbconfig/20260302-165153-marostegui.json
- 16:51 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2146.codfw.wmnet with reason: Maintenance
- 16:51 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T418465)', diff saved to https://phabricator.wikimedia.org/P89537 and previous config saved to /var/cache/conftool/dbconfig/20260302-165129-marostegui.json
- 16:41 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db1184 (T418465)', diff saved to https://phabricator.wikimedia.org/P89536 and previous config saved to /var/cache/conftool/dbconfig/20260302-164141-marostegui.json
- 16:41 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1184.eqiad.wmnet with reason: Maintenance
- 16:41 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T418465)', diff saved to https://phabricator.wikimedia.org/P89535 and previous config saved to /var/cache/conftool/dbconfig/20260302-164118-marostegui.json
- 16:36 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P89534 and previous config saved to /var/cache/conftool/dbconfig/20260302-163622-marostegui.json
- 16:29 fceratto@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2230.codfw.wmnet with reason: host reimage
- 16:26 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P89533 and previous config saved to /var/cache/conftool/dbconfig/20260302-162610-marostegui.json
- 16:21 fceratto@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2230.codfw.wmnet with reason: host reimage
- 16:21 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P89532 and previous config saved to /var/cache/conftool/dbconfig/20260302-162115-marostegui.json
- 16:19 moritzm: installing PAM security updates on Bookworm
- 16:11 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P89531 and previous config saved to /var/cache/conftool/dbconfig/20260302-161102-marostegui.json
- 16:06 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T418465)', diff saved to https://phabricator.wikimedia.org/P89530 and previous config saved to /var/cache/conftool/dbconfig/20260302-160607-marostegui.json
- 16:05 fceratto@cumin1003: START - Cookbook sre.hosts.reimage for host db2230.codfw.wmnet with OS trixie
- 15:56 moritzm: installing glibc bugfix updates from trixie point release
- 15:55 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T418465)', diff saved to https://phabricator.wikimedia.org/P89529 and previous config saved to /var/cache/conftool/dbconfig/20260302-155555-marostegui.json
- 15:55 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db2145 (T418465)', diff saved to https://phabricator.wikimedia.org/P89528 and previous config saved to /var/cache/conftool/dbconfig/20260302-155527-marostegui.json
- 15:55 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2145.codfw.wmnet with reason: Maintenance
- 15:46 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2141.codfw.wmnet with reason: Maintenance
- 15:45 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1169.eqiad.wmnet
- 15:45 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db1169 (T418465)', diff saved to https://phabricator.wikimedia.org/P89527 and previous config saved to /var/cache/conftool/dbconfig/20260302-154520-marostegui.json
- 15:45 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1169.eqiad.wmnet with reason: Maintenance
- 15:38 kevinbazira@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'edit-check' for release 'main' .
- 15:34 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host an-worker1169.eqiad.wmnet
- 15:33 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1167.eqiad.wmnet
- 15:32 kevinbazira@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
- 15:32 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2165.codfw.wmnet with reason: Maintenance
- 15:31 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1209.eqiad.wmnet with reason: Maintenance
- 15:31 marostegui@cumin1003: dbctl commit (dc=all): 'Restore db1226 full weight after schema change', diff saved to https://phabricator.wikimedia.org/P89526 and previous config saved to /var/cache/conftool/dbconfig/20260302-153100-marostegui.json
- 15:23 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P89525 and previous config saved to /var/cache/conftool/dbconfig/20260302-152334-marostegui.json
- 15:22 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host an-worker1167.eqiad.wmnet
- 15:22 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1166.eqiad.wmnet
- 15:18 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2198.codfw.wmnet with reason: Maintenance
- 15:18 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2195 (T418465)', diff saved to https://phabricator.wikimedia.org/P89524 and previous config saved to /var/cache/conftool/dbconfig/20260302-151838-marostegui.json
- 15:10 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host an-worker1166.eqiad.wmnet
- 15:10 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1165.eqiad.wmnet
- 15:08 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P89523 and previous config saved to /var/cache/conftool/dbconfig/20260302-150826-marostegui.json
- 15:03 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2195', diff saved to https://phabricator.wikimedia.org/P89522 and previous config saved to /var/cache/conftool/dbconfig/20260302-150330-marostegui.json
- 15:00 jclark@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be1097.eqiad.wmnet with OS bullseye
- 15:00 jclark@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1003"
- 14:58 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host an-worker1165.eqiad.wmnet
- 14:58 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1164.eqiad.wmnet
- 14:53 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1226 (T418465)', diff saved to https://phabricator.wikimedia.org/P89520 and previous config saved to /var/cache/conftool/dbconfig/20260302-145318-marostegui.json
- 14:48 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host an-worker1164.eqiad.wmnet
- 14:48 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1163.eqiad.wmnet
- 14:48 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2195', diff saved to https://phabricator.wikimedia.org/P89519 and previous config saved to /var/cache/conftool/dbconfig/20260302-144823-marostegui.json
- 14:41 Lucas_WMDE: UTC afternoon backport+config window done
- 14:40 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for IPInfo: Set log level to "info" (T374718) (duration: 08m 01s)
- 14:37 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host an-worker1163.eqiad.wmnet
- 14:36 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1162.eqiad.wmnet
- 14:36 lucaswerkmeister-wmde@deploy2002: kharlan, lucaswerkmeister-wmde: Continuing with sync
- 14:36 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db1226 (T418465)', diff saved to https://phabricator.wikimedia.org/P89517 and previous config saved to /var/cache/conftool/dbconfig/20260302-143608-marostegui.json
- 14:36 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1226.eqiad.wmnet with reason: Maintenance
- 14:35 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1214 (T418465)', diff saved to https://phabricator.wikimedia.org/P89516 and previous config saved to /var/cache/conftool/dbconfig/20260302-143544-marostegui.json
- 14:34 lucaswerkmeister-wmde@deploy2002: kharlan, lucaswerkmeister-wmde: Backport for IPInfo: Set log level to "info" (T374718) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 14:33 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2195 (T418465)', diff saved to https://phabricator.wikimedia.org/P89515 and previous config saved to /var/cache/conftool/dbconfig/20260302-143315-marostegui.json
- 14:32 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for IPInfo: Set log level to "info" (T374718)
- 14:31 ladsgroup@cumin1003: END (FAIL) - Cookbook sre.wikireplicas.update-views (exit_code=99)
- 14:30 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for Add configurations for graphql usage survey and its pipeline tests (T414476) (duration: 09m 44s)
- 14:27 jclark@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1003"
- 14:26 lucaswerkmeister-wmde@deploy2002: itamar, lucaswerkmeister-wmde: Continuing with sync
- 14:26 ladsgroup@cumin1003: START - Cookbook sre.wikireplicas.update-views
- 14:25 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host an-worker1162.eqiad.wmnet
- 14:25 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1161.eqiad.wmnet
- 14:23 ladsgroup@cumin1003: END (FAIL) - Cookbook sre.wikireplicas.update-views (exit_code=99)
- 14:22 lucaswerkmeister-wmde@deploy2002: itamar, lucaswerkmeister-wmde: Backport for Add configurations for graphql usage survey and its pipeline tests (T414476) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 14:20 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for Add configurations for graphql usage survey and its pipeline tests (T414476)
- 14:20 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P89514 and previous config saved to /var/cache/conftool/dbconfig/20260302-142037-marostegui.json
- 14:19 elukey@cumin1003: START - Cookbook sre.hosts.provision for host ms-fe1013.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
- 14:18 lucaswerkmeister-wmde@deploy2002: mwscript-k8s job started: namespaceDupes lawiki --fix # T418706
- 14:18 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db2195 (T418465)', diff saved to https://phabricator.wikimedia.org/P89513 and previous config saved to /var/cache/conftool/dbconfig/20260302-141834-marostegui.json
- 14:18 elukey@puppetserver1001: conftool action : set/pooled=no; selector: name=ms-fe1013.eqiad.wmnet
- 14:18 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2195.codfw.wmnet with reason: Maintenance
- 14:18 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host people2004.codfw.wmnet
- 14:18 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2181 (T418465)', diff saved to https://phabricator.wikimedia.org/P89512 and previous config saved to /var/cache/conftool/dbconfig/20260302-141810-marostegui.json
- 14:17 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for lawiki: add Adumbratio (draft) namespace (T418706) (duration: 07m 27s)
- 14:14 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host people2004.codfw.wmnet
- 14:13 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, anzx: Continuing with sync
- 14:13 ladsgroup@cumin1003: START - Cookbook sre.wikireplicas.update-views
- 14:13 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host an-worker1161.eqiad.wmnet
- 14:13 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1160.eqiad.wmnet
- 14:13 moritzm: installing libcap2 updates from Trixie point release
- 14:12 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, anzx: Backport for lawiki: add Adumbratio (draft) namespace (T418706) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 14:11 elukey@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
- 14:11 elukey@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
- 14:10 elukey@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
- 14:10 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for lawiki: add Adumbratio (draft) namespace (T418706)
- 14:10 elukey@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
- 14:08 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dse-k8s-worker1028.eqiad.wmnet
- 14:05 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host idp-test1005.wikimedia.org
- 14:05 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P89511 and previous config saved to /var/cache/conftool/dbconfig/20260302-140529-marostegui.json
- 14:04 jclark@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be1097.eqiad.wmnet with reason: host reimage
- 14:03 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P89510 and previous config saved to /var/cache/conftool/dbconfig/20260302-140302-marostegui.json
- 14:02 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host dse-k8s-worker1028.eqiad.wmnet
- 14:01 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host an-worker1160.eqiad.wmnet
- 14:01 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host idp-test1005.wikimedia.org
- 14:01 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1159.eqiad.wmnet
- 14:00 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dse-k8s-worker1025.eqiad.wmnet
- 13:57 jclark@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be1097.eqiad.wmnet with reason: host reimage
- 13:54 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host dse-k8s-worker1025.eqiad.wmnet
- 13:50 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1214 (T418465)', diff saved to https://phabricator.wikimedia.org/P89509 and previous config saved to /var/cache/conftool/dbconfig/20260302-135021-marostegui.json
- 13:47 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P89508 and previous config saved to /var/cache/conftool/dbconfig/20260302-134754-marostegui.json
- 13:47 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host an-worker1159.eqiad.wmnet
- 13:47 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1158.eqiad.wmnet
- 13:40 jclark@cumin1003: START - Cookbook sre.hosts.reimage for host ms-be1097.eqiad.wmnet with OS bullseye
- 13:38 jclark@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ms-be1097
- 13:38 jclark@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host ms-be1097
- 13:38 jclark@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 13:38 jclark@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update network and mgmt ms-be1097 - jclark@cumin1003"
- 13:37 jclark@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update network and mgmt ms-be1097 - jclark@cumin1003"
- 13:35 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host an-worker1158.eqiad.wmnet
- 13:35 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1157.eqiad.wmnet
- 13:35 ladsgroup@cumin1003: END (FAIL) - Cookbook sre.wikireplicas.update-views (exit_code=99)
- 13:35 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db1214 (T418465)', diff saved to https://phabricator.wikimedia.org/P89507 and previous config saved to /var/cache/conftool/dbconfig/20260302-133503-marostegui.json
- 13:34 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1214.eqiad.wmnet with reason: Maintenance
- 13:34 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1203 (T418465)', diff saved to https://phabricator.wikimedia.org/P89506 and previous config saved to /var/cache/conftool/dbconfig/20260302-133440-marostegui.json
- 13:32 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2181 (T418465)', diff saved to https://phabricator.wikimedia.org/P89505 and previous config saved to /var/cache/conftool/dbconfig/20260302-133247-marostegui.json
- 13:28 ladsgroup@cumin1003: START - Cookbook sre.wikireplicas.update-views
- 13:27 jclark@cumin1003: START - Cookbook sre.dns.netbox
- 13:27 jclark@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ms-be1097
- 13:26 jclark@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host ms-be1097
- 13:24 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host an-worker1157.eqiad.wmnet
- 13:24 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1156.eqiad.wmnet
- 13:22 brouberol: Running `echo 'https://turnilo-next.wikimedia.org' | mwscript-k8s --attach -- purgeList.php`
- 13:19 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P89504 and previous config saved to /var/cache/conftool/dbconfig/20260302-131932-marostegui.json
- 13:16 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db2181 (T418465)', diff saved to https://phabricator.wikimedia.org/P89503 and previous config saved to /var/cache/conftool/dbconfig/20260302-131653-marostegui.json
- 13:16 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2181.codfw.wmnet with reason: Maintenance
- 13:16 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2167 (T418465)', diff saved to https://phabricator.wikimedia.org/P89502 and previous config saved to /var/cache/conftool/dbconfig/20260302-131630-marostegui.json
- 13:15 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dse-k8s-worker1024.eqiad.wmnet
- 13:14 moritzm: installing libcap2 updates from Bookworm point release
- 13:12 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host an-worker1156.eqiad.wmnet
- 13:12 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1155.eqiad.wmnet
- 13:08 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host dse-k8s-worker1024.eqiad.wmnet
- 13:07 ladsgroup@cumin1003: END (FAIL) - Cookbook sre.wikireplicas.update-views (exit_code=99)
- 13:04 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P89500 and previous config saved to /var/cache/conftool/dbconfig/20260302-130424-marostegui.json
- 13:01 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2167', diff saved to https://phabricator.wikimedia.org/P89499 and previous config saved to /var/cache/conftool/dbconfig/20260302-130122-marostegui.json
- 13:00 ladsgroup@cumin1003: START - Cookbook sre.wikireplicas.update-views
- 12:58 jayme@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker2356.codfw.wmnet
- 12:58 jayme@cumin1003: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker2356.codfw.wmnet
- 12:58 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host an-worker1155.eqiad.wmnet
- 12:58 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1154.eqiad.wmnet
- 12:49 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1203 (T418465)', diff saved to https://phabricator.wikimedia.org/P89498 and previous config saved to /var/cache/conftool/dbconfig/20260302-124917-marostegui.json
- 12:46 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host an-worker1154.eqiad.wmnet
- 12:46 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1153.eqiad.wmnet
- 12:46 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2167', diff saved to https://phabricator.wikimedia.org/P89497 and previous config saved to /var/cache/conftool/dbconfig/20260302-124615-marostegui.json
- 12:35 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host an-worker1153.eqiad.wmnet
- 12:35 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1152.eqiad.wmnet
- 12:32 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db1203 (T418465)', diff saved to https://phabricator.wikimedia.org/P89494 and previous config saved to /var/cache/conftool/dbconfig/20260302-123253-marostegui.json
- 12:32 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1203.eqiad.wmnet with reason: Maintenance
- 12:32 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1193 (T418465)', diff saved to https://phabricator.wikimedia.org/P89493 and previous config saved to /var/cache/conftool/dbconfig/20260302-123229-marostegui.json
- 12:31 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2167 (T418465)', diff saved to https://phabricator.wikimedia.org/P89492 and previous config saved to /var/cache/conftool/dbconfig/20260302-123108-marostegui.json
- 12:27 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/turnilo: apply
- 12:26 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/turnilo: apply
- 12:24 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host an-worker1152.eqiad.wmnet
- 12:24 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1151.eqiad.wmnet
- 12:23 kevinbazira@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
- 12:20 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/turnilo: apply
- 12:20 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/turnilo: apply
- 12:17 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1193', diff saved to https://phabricator.wikimedia.org/P89491 and previous config saved to /var/cache/conftool/dbconfig/20260302-121722-marostegui.json
- 12:15 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db2167 (T418465)', diff saved to https://phabricator.wikimedia.org/P89490 and previous config saved to /var/cache/conftool/dbconfig/20260302-121525-marostegui.json
- 12:15 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2167.codfw.wmnet with reason: Maintenance
- 12:15 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2166 (T418465)', diff saved to https://phabricator.wikimedia.org/P89489 and previous config saved to /var/cache/conftool/dbconfig/20260302-121501-marostegui.json
- 12:12 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host an-worker1151.eqiad.wmnet
- 12:12 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1150.eqiad.wmnet
- 12:02 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1193', diff saved to https://phabricator.wikimedia.org/P89488 and previous config saved to /var/cache/conftool/dbconfig/20260302-120214-marostegui.json
- 12:00 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host an-worker1150.eqiad.wmnet
- 11:59 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2166', diff saved to https://phabricator.wikimedia.org/P89487 and previous config saved to /var/cache/conftool/dbconfig/20260302-115953-marostegui.json
- 11:47 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1193 (T418465)', diff saved to https://phabricator.wikimedia.org/P89486 and previous config saved to /var/cache/conftool/dbconfig/20260302-114706-marostegui.json
- 11:44 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2166', diff saved to https://phabricator.wikimedia.org/P89485 and previous config saved to /var/cache/conftool/dbconfig/20260302-114446-marostegui.json
- 11:30 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db1193 (T418465)', diff saved to https://phabricator.wikimedia.org/P89484 and previous config saved to /var/cache/conftool/dbconfig/20260302-113034-marostegui.json
- 11:30 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-jumbo1001.eqiad.wmnet
- 11:30 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1193.eqiad.wmnet with reason: Maintenance
- 11:30 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1192 (T418465)', diff saved to https://phabricator.wikimedia.org/P89483 and previous config saved to /var/cache/conftool/dbconfig/20260302-113010-marostegui.json
- 11:29 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2166 (T418465)', diff saved to https://phabricator.wikimedia.org/P89482 and previous config saved to /var/cache/conftool/dbconfig/20260302-112937-marostegui.json
- 11:25 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-jumbo1001.eqiad.wmnet
- 11:15 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P89481 and previous config saved to /var/cache/conftool/dbconfig/20260302-111502-marostegui.json
- 11:13 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db2166 (T418465)', diff saved to https://phabricator.wikimedia.org/P89480 and previous config saved to /var/cache/conftool/dbconfig/20260302-111351-marostegui.json
- 11:13 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2166.codfw.wmnet with reason: Maintenance
- 11:13 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2164 (T418465)', diff saved to https://phabricator.wikimedia.org/P89479 and previous config saved to /var/cache/conftool/dbconfig/20260302-111327-marostegui.json
- 11:03 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cumin2003.codfw.wmnet
- 10:59 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P89478 and previous config saved to /var/cache/conftool/dbconfig/20260302-105955-marostegui.json
- 10:58 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2164', diff saved to https://phabricator.wikimedia.org/P89477 and previous config saved to /var/cache/conftool/dbconfig/20260302-105818-marostegui.json
- 10:57 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host cumin2003.codfw.wmnet
- 10:55 brouberol@deploy2002: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'.
- 10:54 brouberol@deploy2002: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'.
- 10:53 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 10:52 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
- 10:51 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/growthbook-next: apply
- 10:50 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/growthbook-next: apply
- 10:50 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/growthbook: apply
- 10:49 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/growthbook: apply
- 10:46 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_magru and A:cp - 3.0 upgrade ()
- 10:44 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1192 (T418465)', diff saved to https://phabricator.wikimedia.org/P89476 and previous config saved to /var/cache/conftool/dbconfig/20260302-104446-marostegui.json
- 10:43 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2164', diff saved to https://phabricator.wikimedia.org/P89475 and previous config saved to /var/cache/conftool/dbconfig/20260302-104310-marostegui.json
- 10:28 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db1192 (T418465)', diff saved to https://phabricator.wikimedia.org/P89474 and previous config saved to /var/cache/conftool/dbconfig/20260302-102825-marostegui.json
- 10:28 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1192.eqiad.wmnet with reason: Maintenance
- 10:28 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1178 (T418465)', diff saved to https://phabricator.wikimedia.org/P89473 and previous config saved to /var/cache/conftool/dbconfig/20260302-102800-marostegui.json
- 10:12 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P89472 and previous config saved to /var/cache/conftool/dbconfig/20260302-101252-marostegui.json
- 10:12 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db2164 (T418465)', diff saved to https://phabricator.wikimedia.org/P89471 and previous config saved to /var/cache/conftool/dbconfig/20260302-101200-marostegui.json
- 10:11 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2164.codfw.wmnet with reason: Maintenance
- 10:11 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2163 (T418465)', diff saved to https://phabricator.wikimedia.org/P89470 and previous config saved to /var/cache/conftool/dbconfig/20260302-101135-marostegui.json
- 10:08 moritzm: installing intel-microcode bugfix updates on Bookworm hosts
- 09:57 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P89469 and previous config saved to /var/cache/conftool/dbconfig/20260302-095744-marostegui.json
- 09:57 fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_magru and A:cp - 3.0 upgrade ()
- 09:56 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2163', diff saved to https://phabricator.wikimedia.org/P89468 and previous config saved to /var/cache/conftool/dbconfig/20260302-095627-marostegui.json
- 09:55 fabfur: start upgrading haproxy to 3.0 on A:cp-text_magru (T417253)
- 09:42 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1178 (T418465)', diff saved to https://phabricator.wikimedia.org/P89467 and previous config saved to /var/cache/conftool/dbconfig/20260302-094236-marostegui.json
- 09:41 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2163', diff saved to https://phabricator.wikimedia.org/P89466 and previous config saved to /var/cache/conftool/dbconfig/20260302-094118-marostegui.json
- 09:35 javiermonton@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply
- 09:35 javiermonton@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply
- 09:34 moritzm: installing gnu TLS security updates
- 09:34 javiermonton@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply
- 09:33 javiermonton@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply
- 09:26 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2163 (T418465)', diff saved to https://phabricator.wikimedia.org/P89465 and previous config saved to /var/cache/conftool/dbconfig/20260302-092610-marostegui.json
- 09:26 mlitn@deploy2002: Finished scap sync-world: Backport for Limit additional whitespace to sticky header version only (T416598) (duration: 11m 02s)
- 09:26 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db1178 (T418465)', diff saved to https://phabricator.wikimedia.org/P89464 and previous config saved to /var/cache/conftool/dbconfig/20260302-092600-marostegui.json
- 09:25 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1178.eqiad.wmnet with reason: Maintenance
- 09:25 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1177 (T418465)', diff saved to https://phabricator.wikimedia.org/P89463 and previous config saved to /var/cache/conftool/dbconfig/20260302-092535-marostegui.json
- 09:21 mlitn@deploy2002: mlitn: Continuing with sync
- 09:16 mlitn@deploy2002: mlitn: Backport for Limit additional whitespace to sticky header version only (T416598) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 09:15 mlitn@deploy2002: Started scap sync-world: Backport for Limit additional whitespace to sticky header version only (T416598)
- 09:10 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P89462 and previous config saved to /var/cache/conftool/dbconfig/20260302-091027-marostegui.json
- 09:10 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db2163 (T418465)', diff saved to https://phabricator.wikimedia.org/P89461 and previous config saved to /var/cache/conftool/dbconfig/20260302-091003-marostegui.json
- 09:09 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2163.codfw.wmnet with reason: Maintenance
- 09:09 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2161 (T418465)', diff saved to https://phabricator.wikimedia.org/P89460 and previous config saved to /var/cache/conftool/dbconfig/20260302-090938-marostegui.json
- 09:08 kharlan@deploy2002: Finished scap sync-world: Backport for HCaptchaEnterpriseHealthChecker: Add configurable retry count and delay (T418477) (duration: 16m 09s)
- 09:02 kharlan@deploy2002: kharlan: Continuing with sync
- 08:57 kharlan@deploy2002: kharlan: Backport for HCaptchaEnterpriseHealthChecker: Add configurable retry count and delay (T418477) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 08:55 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P89459 and previous config saved to /var/cache/conftool/dbconfig/20260302-085519-marostegui.json
- 08:54 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2161', diff saved to https://phabricator.wikimedia.org/P89458 and previous config saved to /var/cache/conftool/dbconfig/20260302-085430-marostegui.json
- 08:51 kharlan@deploy2002: Started scap sync-world: Backport for HCaptchaEnterpriseHealthChecker: Add configurable retry count and delay (T418477)
- 08:48 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_magru and A:cp - 3.0 upgrade ()
- 08:47 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dbproxy1029.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 08:45 moritzm: installing libxml2 security updates
- 08:44 kgraessle@deploy2002: Finished scap sync-world: Backport for Enable revert risk filters for first batch of wikis: < 1000 monthly edits (T411485) (duration: 37m 12s)
- 08:42 elukey@cumin1003: START - Cookbook sre.hosts.provision for host dbproxy1029.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 08:41 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dbproxy1028.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 08:40 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1177 (T418465)', diff saved to https://phabricator.wikimedia.org/P89457 and previous config saved to /var/cache/conftool/dbconfig/20260302-084010-marostegui.json
- 08:39 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2161', diff saved to https://phabricator.wikimedia.org/P89456 and previous config saved to /var/cache/conftool/dbconfig/20260302-083922-marostegui.json
- 08:31 kgraessle@deploy2002: kgraessle: Continuing with sync
- 08:30 kgraessle@deploy2002: kgraessle: Backport for Enable revert risk filters for first batch of wikis: < 1000 monthly edits (T411485) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 08:30 elukey@cumin1003: START - Cookbook sre.hosts.provision for host dbproxy1028.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 08:24 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2161 (T418465)', diff saved to https://phabricator.wikimedia.org/P89455 and previous config saved to /var/cache/conftool/dbconfig/20260302-082414-marostegui.json
- 08:23 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db1177 (T418465)', diff saved to https://phabricator.wikimedia.org/P89454 and previous config saved to /var/cache/conftool/dbconfig/20260302-082333-marostegui.json
- 08:23 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1177.eqiad.wmnet with reason: Maintenance
- 08:23 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1172 (T418465)', diff saved to https://phabricator.wikimedia.org/P89453 and previous config saved to /var/cache/conftool/dbconfig/20260302-082309-marostegui.json
- 08:22 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on dbproxy1028.eqiad.wmnet with reason: Maintenance
- 08:19 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on dbproxy1029.eqiad.wmnet with reason: Maintenance
- 08:08 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db2161 (T418465)', diff saved to https://phabricator.wikimedia.org/P89452 and previous config saved to /var/cache/conftool/dbconfig/20260302-080813-marostegui.json
- 08:08 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2161.codfw.wmnet with reason: Maintenance
- 08:08 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P89451 and previous config saved to /var/cache/conftool/dbconfig/20260302-080800-marostegui.json
- 08:07 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2154 (T418465)', diff saved to https://phabricator.wikimedia.org/P89450 and previous config saved to /var/cache/conftool/dbconfig/20260302-080748-marostegui.json
- 08:07 kgraessle@deploy2002: Started scap sync-world: Backport for Enable revert risk filters for first batch of wikis: < 1000 monthly edits (T411485)
- 08:05 fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_magru and A:cp - 3.0 upgrade ()
- 08:05 fabfur: start upgrading haproxy to 3.0 on A:cp-upload_magru (T417253)
- 07:52 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P89449 and previous config saved to /var/cache/conftool/dbconfig/20260302-075252-marostegui.json
- 07:52 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2154', diff saved to https://phabricator.wikimedia.org/P89448 and previous config saved to /var/cache/conftool/dbconfig/20260302-075241-marostegui.json
- 07:37 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1172 (T418465)', diff saved to https://phabricator.wikimedia.org/P89447 and previous config saved to /var/cache/conftool/dbconfig/20260302-073745-marostegui.json
- 07:37 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2154', diff saved to https://phabricator.wikimedia.org/P89446 and previous config saved to /var/cache/conftool/dbconfig/20260302-073732-marostegui.json
- 07:22 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2154 (T418465)', diff saved to https://phabricator.wikimedia.org/P89445 and previous config saved to /var/cache/conftool/dbconfig/20260302-072224-marostegui.json
- 07:21 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db1172 (T418465)', diff saved to https://phabricator.wikimedia.org/P89444 and previous config saved to /var/cache/conftool/dbconfig/20260302-072058-marostegui.json
- 07:20 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1172.eqiad.wmnet with reason: Maintenance
- 07:05 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1171.eqiad.wmnet with reason: Maintenance
- 07:05 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1167 (T418465)', diff saved to https://phabricator.wikimedia.org/P89443 and previous config saved to /var/cache/conftool/dbconfig/20260302-070523-marostegui.json
- 07:05 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db2154 (T418465)', diff saved to https://phabricator.wikimedia.org/P89442 and previous config saved to /var/cache/conftool/dbconfig/20260302-070512-marostegui.json
- 07:05 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2154.codfw.wmnet with reason: Maintenance
- 07:04 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2152 (T418465)', diff saved to https://phabricator.wikimedia.org/P89441 and previous config saved to /var/cache/conftool/dbconfig/20260302-070447-marostegui.json
- 07:01 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1244: After schema change
- 06:50 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P89439 and previous config saved to /var/cache/conftool/dbconfig/20260302-065014-marostegui.json
- 06:49 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2152', diff saved to https://phabricator.wikimedia.org/P89438 and previous config saved to /var/cache/conftool/dbconfig/20260302-064938-marostegui.json
- 06:35 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P89436 and previous config saved to /var/cache/conftool/dbconfig/20260302-063506-marostegui.json
- 06:34 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2152', diff saved to https://phabricator.wikimedia.org/P89435 and previous config saved to /var/cache/conftool/dbconfig/20260302-063430-marostegui.json
- 06:20 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1167 (T418465)', diff saved to https://phabricator.wikimedia.org/P89433 and previous config saved to /var/cache/conftool/dbconfig/20260302-061957-marostegui.json
- 06:19 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2152 (T418465)', diff saved to https://phabricator.wikimedia.org/P89432 and previous config saved to /var/cache/conftool/dbconfig/20260302-061922-marostegui.json
- 06:17 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2240.codfw.wmnet with reason: Maintenance
- 06:16 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1244: After schema change
- 06:15 marostegui@dns1004: END - running authdns-update
- 06:14 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db2240 T418080', diff saved to https://phabricator.wikimedia.org/P89430 and previous config saved to /var/cache/conftool/dbconfig/20260302-061428-marostegui.json
- 06:13 marostegui@dns1004: START - running authdns-update
- 06:13 marostegui@cumin1003: dbctl commit (dc=all): 'Promote db2179 to s4 primary and set section read-write T418080', diff saved to https://phabricator.wikimedia.org/P89429 and previous config saved to /var/cache/conftool/dbconfig/20260302-061316-marostegui.json
- 06:12 marostegui@cumin1003: dbctl commit (dc=all): 'Set s4 codfw as read-only for maintenance - T418080', diff saved to https://phabricator.wikimedia.org/P89428 and previous config saved to /var/cache/conftool/dbconfig/20260302-061252-marostegui.json
- 06:06 marostegui: Starting s4 codfw failover from db2240 to db2179 - T418080
- 06:03 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 42 hosts with reason: Primary switchover s4 T418080
- 06:03 marostegui@cumin1003: dbctl commit (dc=all): 'Set db2179 with weight 0 T418080', diff saved to https://phabricator.wikimedia.org/P89427 and previous config saved to /var/cache/conftool/dbconfig/20260302-060317-marostegui.json
- 06:03 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db1167 (T418465)', diff saved to https://phabricator.wikimedia.org/P89426 and previous config saved to /var/cache/conftool/dbconfig/20260302-060317-marostegui.json
- 06:03 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
- 06:02 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1167.eqiad.wmnet with reason: Maintenance
- 06:02 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db2152 (T418465)', diff saved to https://phabricator.wikimedia.org/P89425 and previous config saved to /var/cache/conftool/dbconfig/20260302-060245-marostegui.json
- 06:02 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2152.codfw.wmnet with reason: Maintenance
- 05:58 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1181.eqiad.wmnet with reason: Maintenance
- 05:58 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2220.codfw.wmnet with reason: Maintenance
- 02:14 mwpresync@deploy2002: Finished scap build-images: Publishing wmf/next image (duration: 13m 13s)
- 02:00 mwpresync@deploy2002: Started scap build-images: Publishing wmf/next image
- 00:50 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance
- 00:49 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1253 (T418465)', diff saved to https://phabricator.wikimedia.org/P89424 and previous config saved to /var/cache/conftool/dbconfig/20260302-004950-marostegui.json
- 00:34 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1253', diff saved to https://phabricator.wikimedia.org/P89423 and previous config saved to /var/cache/conftool/dbconfig/20260302-003441-marostegui.json
- 00:19 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1253', diff saved to https://phabricator.wikimedia.org/P89422 and previous config saved to /var/cache/conftool/dbconfig/20260302-001933-marostegui.json
- 00:04 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1253 (T418465)', diff saved to https://phabricator.wikimedia.org/P89421 and previous config saved to /var/cache/conftool/dbconfig/20260302-000425-marostegui.json
- 00:02 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db1253 (T418465)', diff saved to https://phabricator.wikimedia.org/P89420 and previous config saved to /var/cache/conftool/dbconfig/20260302-000208-marostegui.json
- 00:02 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1253.eqiad.wmnet with reason: Maintenance
- 00:01 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1236 (T418465)', diff saved to https://phabricator.wikimedia.org/P89419 and previous config saved to /var/cache/conftool/dbconfig/20260302-000143-marostegui.json
2026-03-01
- 23:46 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1236', diff saved to https://phabricator.wikimedia.org/P89418 and previous config saved to /var/cache/conftool/dbconfig/20260301-234635-marostegui.json
- 23:35 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2222 (T418465)', diff saved to https://phabricator.wikimedia.org/P89417 and previous config saved to /var/cache/conftool/dbconfig/20260301-233524-marostegui.json
- 23:31 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1236', diff saved to https://phabricator.wikimedia.org/P89416 and previous config saved to /var/cache/conftool/dbconfig/20260301-233127-marostegui.json
- 23:20 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2222', diff saved to https://phabricator.wikimedia.org/P89415 and previous config saved to /var/cache/conftool/dbconfig/20260301-232016-marostegui.json
- 23:16 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1236 (T418465)', diff saved to https://phabricator.wikimedia.org/P89414 and previous config saved to /var/cache/conftool/dbconfig/20260301-231619-marostegui.json
- 23:14 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db1236 (T418465)', diff saved to https://phabricator.wikimedia.org/P89413 and previous config saved to /var/cache/conftool/dbconfig/20260301-231404-marostegui.json
- 23:13 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1236.eqiad.wmnet with reason: Maintenance
- 23:13 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1231 (T418465)', diff saved to https://phabricator.wikimedia.org/P89412 and previous config saved to /var/cache/conftool/dbconfig/20260301-231339-marostegui.json
- 23:05 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2222', diff saved to https://phabricator.wikimedia.org/P89411 and previous config saved to /var/cache/conftool/dbconfig/20260301-230508-marostegui.json
- 22:58 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1231', diff saved to https://phabricator.wikimedia.org/P89410 and previous config saved to /var/cache/conftool/dbconfig/20260301-225832-marostegui.json
- 22:50 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2222 (T418465)', diff saved to https://phabricator.wikimedia.org/P89409 and previous config saved to /var/cache/conftool/dbconfig/20260301-224959-marostegui.json
- 22:44 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db2222 (T418465)', diff saved to https://phabricator.wikimedia.org/P89408 and previous config saved to /var/cache/conftool/dbconfig/20260301-224451-marostegui.json
- 22:44 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2222.codfw.wmnet with reason: Maintenance
- 22:44 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2221 (T418465)', diff saved to https://phabricator.wikimedia.org/P89407 and previous config saved to /var/cache/conftool/dbconfig/20260301-224426-marostegui.json
- 22:43 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1231', diff saved to https://phabricator.wikimedia.org/P89406 and previous config saved to /var/cache/conftool/dbconfig/20260301-224324-marostegui.json
- 22:29 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2221', diff saved to https://phabricator.wikimedia.org/P89405 and previous config saved to /var/cache/conftool/dbconfig/20260301-222919-marostegui.json
- 22:28 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1231 (T418465)', diff saved to https://phabricator.wikimedia.org/P89404 and previous config saved to /var/cache/conftool/dbconfig/20260301-222815-marostegui.json
- 22:26 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db1231 (T418465)', diff saved to https://phabricator.wikimedia.org/P89403 and previous config saved to /var/cache/conftool/dbconfig/20260301-222600-marostegui.json
- 22:25 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1231.eqiad.wmnet with reason: Maintenance
- 22:25 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1227 (T418465)', diff saved to https://phabricator.wikimedia.org/P89402 and previous config saved to /var/cache/conftool/dbconfig/20260301-222536-marostegui.json
- 22:14 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2221', diff saved to https://phabricator.wikimedia.org/P89401 and previous config saved to /var/cache/conftool/dbconfig/20260301-221410-marostegui.json
- 22:10 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1227', diff saved to https://phabricator.wikimedia.org/P89400 and previous config saved to /var/cache/conftool/dbconfig/20260301-221027-marostegui.json
- 21:59 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2221 (T418465)', diff saved to https://phabricator.wikimedia.org/P89399 and previous config saved to /var/cache/conftool/dbconfig/20260301-215902-marostegui.json
- 21:55 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1227', diff saved to https://phabricator.wikimedia.org/P89398 and previous config saved to /var/cache/conftool/dbconfig/20260301-215519-marostegui.json
- 21:54 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db2221 (T418465)', diff saved to https://phabricator.wikimedia.org/P89397 and previous config saved to /var/cache/conftool/dbconfig/20260301-215404-marostegui.json
- 21:53 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2221.codfw.wmnet with reason: Maintenance
- 21:53 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2218 (T418465)', diff saved to https://phabricator.wikimedia.org/P89396 and previous config saved to /var/cache/conftool/dbconfig/20260301-215339-marostegui.json
- 21:40 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1227 (T418465)', diff saved to https://phabricator.wikimedia.org/P89395 and previous config saved to /var/cache/conftool/dbconfig/20260301-214011-marostegui.json
- 21:38 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2218', diff saved to https://phabricator.wikimedia.org/P89394 and previous config saved to /var/cache/conftool/dbconfig/20260301-213831-marostegui.json
- 21:34 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db1227 (T418465)', diff saved to https://phabricator.wikimedia.org/P89393 and previous config saved to /var/cache/conftool/dbconfig/20260301-213410-marostegui.json
- 21:34 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1227.eqiad.wmnet with reason: Maintenance
- 21:33 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1202 (T418465)', diff saved to https://phabricator.wikimedia.org/P89392 and previous config saved to /var/cache/conftool/dbconfig/20260301-213346-marostegui.json
- 21:23 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2218', diff saved to https://phabricator.wikimedia.org/P89391 and previous config saved to /var/cache/conftool/dbconfig/20260301-212323-marostegui.json
- 21:18 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1202', diff saved to https://phabricator.wikimedia.org/P89390 and previous config saved to /var/cache/conftool/dbconfig/20260301-211837-marostegui.json
- 21:08 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2218 (T418465)', diff saved to https://phabricator.wikimedia.org/P89389 and previous config saved to /var/cache/conftool/dbconfig/20260301-210815-marostegui.json
- 21:03 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1202', diff saved to https://phabricator.wikimedia.org/P89388 and previous config saved to /var/cache/conftool/dbconfig/20260301-210329-marostegui.json
- 21:03 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db2218 (T418465)', diff saved to https://phabricator.wikimedia.org/P89387 and previous config saved to /var/cache/conftool/dbconfig/20260301-210309-marostegui.json
- 21:03 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2218.codfw.wmnet with reason: Maintenance
- 21:02 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2208 (T418465)', diff saved to https://phabricator.wikimedia.org/P89386 and previous config saved to /var/cache/conftool/dbconfig/20260301-210244-marostegui.json
- 20:48 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1202 (T418465)', diff saved to https://phabricator.wikimedia.org/P89385 and previous config saved to /var/cache/conftool/dbconfig/20260301-204820-marostegui.json
- 20:47 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2208', diff saved to https://phabricator.wikimedia.org/P89384 and previous config saved to /var/cache/conftool/dbconfig/20260301-204736-marostegui.json
- 20:46 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db1202 (T418465)', diff saved to https://phabricator.wikimedia.org/P89383 and previous config saved to /var/cache/conftool/dbconfig/20260301-204606-marostegui.json
- 20:45 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1202.eqiad.wmnet with reason: Maintenance
- 20:45 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1194 (T418465)', diff saved to https://phabricator.wikimedia.org/P89382 and previous config saved to /var/cache/conftool/dbconfig/20260301-204541-marostegui.json
- 20:32 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2208', diff saved to https://phabricator.wikimedia.org/P89381 and previous config saved to /var/cache/conftool/dbconfig/20260301-203227-marostegui.json
- 20:30 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1194', diff saved to https://phabricator.wikimedia.org/P89380 and previous config saved to /var/cache/conftool/dbconfig/20260301-203033-marostegui.json
- 20:17 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2208 (T418465)', diff saved to https://phabricator.wikimedia.org/P89379 and previous config saved to /var/cache/conftool/dbconfig/20260301-201720-marostegui.json
- 20:15 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1194', diff saved to https://phabricator.wikimedia.org/P89378 and previous config saved to /var/cache/conftool/dbconfig/20260301-201525-marostegui.json
- 20:12 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db2208 (T418465)', diff saved to https://phabricator.wikimedia.org/P89377 and previous config saved to /var/cache/conftool/dbconfig/20260301-201212-marostegui.json
- 20:12 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2208.codfw.wmnet with reason: Maintenance
- 20:08 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2200.codfw.wmnet with reason: Maintenance
- 20:04 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2198.codfw.wmnet with reason: Maintenance
- 20:04 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2182 (T418465)', diff saved to https://phabricator.wikimedia.org/P89376 and previous config saved to /var/cache/conftool/dbconfig/20260301-200422-marostegui.json
- 20:00 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1194 (T418465)', diff saved to https://phabricator.wikimedia.org/P89375 and previous config saved to /var/cache/conftool/dbconfig/20260301-200016-marostegui.json
- 19:58 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db1194 (T418465)', diff saved to https://phabricator.wikimedia.org/P89374 and previous config saved to /var/cache/conftool/dbconfig/20260301-195803-marostegui.json
- 19:57 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1194.eqiad.wmnet with reason: Maintenance
- 19:57 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1191 (T418465)', diff saved to https://phabricator.wikimedia.org/P89373 and previous config saved to /var/cache/conftool/dbconfig/20260301-195738-marostegui.json
- 19:49 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2182', diff saved to https://phabricator.wikimedia.org/P89372 and previous config saved to /var/cache/conftool/dbconfig/20260301-194914-marostegui.json
- 19:42 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1191', diff saved to https://phabricator.wikimedia.org/P89371 and previous config saved to /var/cache/conftool/dbconfig/20260301-194230-marostegui.json
- 19:34 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2182', diff saved to https://phabricator.wikimedia.org/P89370 and previous config saved to /var/cache/conftool/dbconfig/20260301-193406-marostegui.json
- 19:27 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1191', diff saved to https://phabricator.wikimedia.org/P89369 and previous config saved to /var/cache/conftool/dbconfig/20260301-192721-marostegui.json
- 19:19 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2182 (T418465)', diff saved to https://phabricator.wikimedia.org/P89368 and previous config saved to /var/cache/conftool/dbconfig/20260301-191858-marostegui.json
- 19:13 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db2182 (T418465)', diff saved to https://phabricator.wikimedia.org/P89367 and previous config saved to /var/cache/conftool/dbconfig/20260301-191340-marostegui.json
- 19:13 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2182.codfw.wmnet with reason: Maintenance
- 19:13 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2168 (T418465)', diff saved to https://phabricator.wikimedia.org/P89366 and previous config saved to /var/cache/conftool/dbconfig/20260301-191315-marostegui.json
- 19:12 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1191 (T418465)', diff saved to https://phabricator.wikimedia.org/P89365 and previous config saved to /var/cache/conftool/dbconfig/20260301-191213-marostegui.json
- 19:10 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db1191 (T418465)', diff saved to https://phabricator.wikimedia.org/P89364 and previous config saved to /var/cache/conftool/dbconfig/20260301-190958-marostegui.json
- 19:09 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1191.eqiad.wmnet with reason: Maintenance
- 19:09 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T418465)', diff saved to https://phabricator.wikimedia.org/P89363 and previous config saved to /var/cache/conftool/dbconfig/20260301-190934-marostegui.json
- 18:58 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2168', diff saved to https://phabricator.wikimedia.org/P89362 and previous config saved to /var/cache/conftool/dbconfig/20260301-185807-marostegui.json
- 18:54 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P89361 and previous config saved to /var/cache/conftool/dbconfig/20260301-185425-marostegui.json
- 18:43 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2168', diff saved to https://phabricator.wikimedia.org/P89360 and previous config saved to /var/cache/conftool/dbconfig/20260301-184259-marostegui.json
- 18:39 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P89359 and previous config saved to /var/cache/conftool/dbconfig/20260301-183917-marostegui.json
- 18:27 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2168 (T418465)', diff saved to https://phabricator.wikimedia.org/P89358 and previous config saved to /var/cache/conftool/dbconfig/20260301-182750-marostegui.json
- 18:24 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T418465)', diff saved to https://phabricator.wikimedia.org/P89357 and previous config saved to /var/cache/conftool/dbconfig/20260301-182409-marostegui.json
- 18:22 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db2168 (T418465)', diff saved to https://phabricator.wikimedia.org/P89356 and previous config saved to /var/cache/conftool/dbconfig/20260301-182238-marostegui.json
- 18:22 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2168.codfw.wmnet with reason: Maintenance
- 18:22 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2159 (T418465)', diff saved to https://phabricator.wikimedia.org/P89355 and previous config saved to /var/cache/conftool/dbconfig/20260301-182213-marostegui.json
- 18:21 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db1174 (T418465)', diff saved to https://phabricator.wikimedia.org/P89354 and previous config saved to /var/cache/conftool/dbconfig/20260301-182153-marostegui.json
- 18:21 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1174.eqiad.wmnet with reason: Maintenance
- 18:18 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1171.eqiad.wmnet with reason: Maintenance
- 18:18 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1170 (T418465)', diff saved to https://phabricator.wikimedia.org/P89353 and previous config saved to /var/cache/conftool/dbconfig/20260301-181818-marostegui.json
- 18:07 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2159', diff saved to https://phabricator.wikimedia.org/P89352 and previous config saved to /var/cache/conftool/dbconfig/20260301-180705-marostegui.json
- 18:03 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1170', diff saved to https://phabricator.wikimedia.org/P89351 and previous config saved to /var/cache/conftool/dbconfig/20260301-180310-marostegui.json
- 17:51 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2159', diff saved to https://phabricator.wikimedia.org/P89350 and previous config saved to /var/cache/conftool/dbconfig/20260301-175157-marostegui.json
- 17:48 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1170', diff saved to https://phabricator.wikimedia.org/P89349 and previous config saved to /var/cache/conftool/dbconfig/20260301-174802-marostegui.json
- 17:36 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2159 (T418465)', diff saved to https://phabricator.wikimedia.org/P89348 and previous config saved to /var/cache/conftool/dbconfig/20260301-173649-marostegui.json
- 17:32 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1170 (T418465)', diff saved to https://phabricator.wikimedia.org/P89347 and previous config saved to /var/cache/conftool/dbconfig/20260301-173253-marostegui.json
- 17:31 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db2159 (T418465)', diff saved to https://phabricator.wikimedia.org/P89346 and previous config saved to /var/cache/conftool/dbconfig/20260301-173134-marostegui.json
- 17:31 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2159.codfw.wmnet with reason: Maintenance
- 17:31 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2150 (T418465)', diff saved to https://phabricator.wikimedia.org/P89345 and previous config saved to /var/cache/conftool/dbconfig/20260301-173110-marostegui.json
- 17:27 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db1170 (T418465)', diff saved to https://phabricator.wikimedia.org/P89344 and previous config saved to /var/cache/conftool/dbconfig/20260301-172742-marostegui.json
- 17:27 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1170.eqiad.wmnet with reason: Maintenance
- 17:27 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T418465)', diff saved to https://phabricator.wikimedia.org/P89343 and previous config saved to /var/cache/conftool/dbconfig/20260301-172717-marostegui.json
- 17:16 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2150', diff saved to https://phabricator.wikimedia.org/P89342 and previous config saved to /var/cache/conftool/dbconfig/20260301-171602-marostegui.json
- 17:12 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P89341 and previous config saved to /var/cache/conftool/dbconfig/20260301-171210-marostegui.json
- 17:00 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2150', diff saved to https://phabricator.wikimedia.org/P89340 and previous config saved to /var/cache/conftool/dbconfig/20260301-170053-marostegui.json
- 16:57 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P89339 and previous config saved to /var/cache/conftool/dbconfig/20260301-165701-marostegui.json
- 16:45 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2150 (T418465)', diff saved to https://phabricator.wikimedia.org/P89338 and previous config saved to /var/cache/conftool/dbconfig/20260301-164545-marostegui.json
- 16:41 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T418465)', diff saved to https://phabricator.wikimedia.org/P89337 and previous config saved to /var/cache/conftool/dbconfig/20260301-164153-marostegui.json
- 16:40 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db2150 (T418465)', diff saved to https://phabricator.wikimedia.org/P89336 and previous config saved to /var/cache/conftool/dbconfig/20260301-164022-marostegui.json
- 16:40 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2150.codfw.wmnet with reason: Maintenance
- 16:39 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db1158 (T418465)', diff saved to https://phabricator.wikimedia.org/P89335 and previous config saved to /var/cache/conftool/dbconfig/20260301-163938-marostegui.json
- 16:39 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1014,1018].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
- 16:39 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1158.eqiad.wmnet with reason: Maintenance
- 16:36 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1230.eqiad.wmnet with reason: Maintenance
- 16:36 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2213.codfw.wmnet with reason: Maintenance
- 12:22 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2228 (T418465)', diff saved to https://phabricator.wikimedia.org/P89334 and previous config saved to /var/cache/conftool/dbconfig/20260301-122201-marostegui.json
- 12:06 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2228', diff saved to https://phabricator.wikimedia.org/P89333 and previous config saved to /var/cache/conftool/dbconfig/20260301-120652-marostegui.json
- 11:51 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2228', diff saved to https://phabricator.wikimedia.org/P89332 and previous config saved to /var/cache/conftool/dbconfig/20260301-115144-marostegui.json
- 11:36 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2228 (T418465)', diff saved to https://phabricator.wikimedia.org/P89331 and previous config saved to /var/cache/conftool/dbconfig/20260301-113636-marostegui.json
- 11:31 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db2228 (T418465)', diff saved to https://phabricator.wikimedia.org/P89330 and previous config saved to /var/cache/conftool/dbconfig/20260301-113156-marostegui.json
- 11:31 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2228.codfw.wmnet with reason: Maintenance
- 11:31 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2223 (T418465)', diff saved to https://phabricator.wikimedia.org/P89329 and previous config saved to /var/cache/conftool/dbconfig/20260301-113131-marostegui.json
- 11:21 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance
- 11:19 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1245.eqiad.wmnet with reason: Maintenance
- 11:17 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1216.eqiad.wmnet with reason: Maintenance
- 11:17 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1210 (T418465)', diff saved to https://phabricator.wikimedia.org/P89328 and previous config saved to /var/cache/conftool/dbconfig/20260301-111658-marostegui.json
- 11:16 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2223', diff saved to https://phabricator.wikimedia.org/P89327 and previous config saved to /var/cache/conftool/dbconfig/20260301-111622-marostegui.json
- 11:01 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1210', diff saved to https://phabricator.wikimedia.org/P89326 and previous config saved to /var/cache/conftool/dbconfig/20260301-110151-marostegui.json
- 11:01 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2223', diff saved to https://phabricator.wikimedia.org/P89325 and previous config saved to /var/cache/conftool/dbconfig/20260301-110114-marostegui.json
- 10:46 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1210', diff saved to https://phabricator.wikimedia.org/P89324 and previous config saved to /var/cache/conftool/dbconfig/20260301-104642-marostegui.json
- 10:46 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2223 (T418465)', diff saved to https://phabricator.wikimedia.org/P89323 and previous config saved to /var/cache/conftool/dbconfig/20260301-104606-marostegui.json
- 10:40 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db2223 (T418465)', diff saved to https://phabricator.wikimedia.org/P89322 and previous config saved to /var/cache/conftool/dbconfig/20260301-104024-marostegui.json
- 10:40 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2223.codfw.wmnet with reason: Maintenance
- 10:40 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2211 (T418465)', diff saved to https://phabricator.wikimedia.org/P89321 and previous config saved to /var/cache/conftool/dbconfig/20260301-103958-marostegui.json
- 10:31 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1210 (T418465)', diff saved to https://phabricator.wikimedia.org/P89320 and previous config saved to /var/cache/conftool/dbconfig/20260301-103134-marostegui.json
- 10:27 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db1210 (T418465)', diff saved to https://phabricator.wikimedia.org/P89319 and previous config saved to /var/cache/conftool/dbconfig/20260301-102727-marostegui.json
- 10:27 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1210.eqiad.wmnet with reason: Maintenance
- 10:27 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1207 (T418465)', diff saved to https://phabricator.wikimedia.org/P89318 and previous config saved to /var/cache/conftool/dbconfig/20260301-102702-marostegui.json
- 10:24 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2211', diff saved to https://phabricator.wikimedia.org/P89317 and previous config saved to /var/cache/conftool/dbconfig/20260301-102450-marostegui.json
- 10:11 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1207', diff saved to https://phabricator.wikimedia.org/P89316 and previous config saved to /var/cache/conftool/dbconfig/20260301-101154-marostegui.json
- 10:09 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2211', diff saved to https://phabricator.wikimedia.org/P89315 and previous config saved to /var/cache/conftool/dbconfig/20260301-100942-marostegui.json
- 09:56 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1207', diff saved to https://phabricator.wikimedia.org/P89314 and previous config saved to /var/cache/conftool/dbconfig/20260301-095645-marostegui.json
- 09:54 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2211 (T418465)', diff saved to https://phabricator.wikimedia.org/P89313 and previous config saved to /var/cache/conftool/dbconfig/20260301-095434-marostegui.json
- 09:48 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db2211 (T418465)', diff saved to https://phabricator.wikimedia.org/P89312 and previous config saved to /var/cache/conftool/dbconfig/20260301-094847-marostegui.json
- 09:48 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2211.codfw.wmnet with reason: Maintenance
- 09:44 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2201.codfw.wmnet with reason: Maintenance
- 09:44 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2192 (T418465)', diff saved to https://phabricator.wikimedia.org/P89311 and previous config saved to /var/cache/conftool/dbconfig/20260301-094432-marostegui.json
- 09:41 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1207 (T418465)', diff saved to https://phabricator.wikimedia.org/P89310 and previous config saved to /var/cache/conftool/dbconfig/20260301-094137-marostegui.json
- 09:38 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db1207 (T418465)', diff saved to https://phabricator.wikimedia.org/P89309 and previous config saved to /var/cache/conftool/dbconfig/20260301-093835-marostegui.json
- 09:38 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1207.eqiad.wmnet with reason: Maintenance
- 09:38 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1200 (T418465)', diff saved to https://phabricator.wikimedia.org/P89308 and previous config saved to /var/cache/conftool/dbconfig/20260301-093810-marostegui.json
- 09:29 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2192', diff saved to https://phabricator.wikimedia.org/P89307 and previous config saved to /var/cache/conftool/dbconfig/20260301-092923-marostegui.json
- 09:23 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1200', diff saved to https://phabricator.wikimedia.org/P89306 and previous config saved to /var/cache/conftool/dbconfig/20260301-092302-marostegui.json
- 09:14 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2192', diff saved to https://phabricator.wikimedia.org/P89305 and previous config saved to /var/cache/conftool/dbconfig/20260301-091415-marostegui.json
- 09:07 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1200', diff saved to https://phabricator.wikimedia.org/P89304 and previous config saved to /var/cache/conftool/dbconfig/20260301-090754-marostegui.json
- 08:59 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2192 (T418465)', diff saved to https://phabricator.wikimedia.org/P89303 and previous config saved to /var/cache/conftool/dbconfig/20260301-085907-marostegui.json
- 08:54 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db2192 (T418465)', diff saved to https://phabricator.wikimedia.org/P89302 and previous config saved to /var/cache/conftool/dbconfig/20260301-085427-marostegui.json
- 08:54 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2192.codfw.wmnet with reason: Maintenance
- 08:54 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2178 (T418465)', diff saved to https://phabricator.wikimedia.org/P89301 and previous config saved to /var/cache/conftool/dbconfig/20260301-085403-marostegui.json
- 08:52 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1200 (T418465)', diff saved to https://phabricator.wikimedia.org/P89300 and previous config saved to /var/cache/conftool/dbconfig/20260301-085246-marostegui.json
- 08:49 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db1200 (T418465)', diff saved to https://phabricator.wikimedia.org/P89299 and previous config saved to /var/cache/conftool/dbconfig/20260301-084952-marostegui.json
- 08:49 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1200.eqiad.wmnet with reason: Maintenance
- 08:49 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1185 (T418465)', diff saved to https://phabricator.wikimedia.org/P89298 and previous config saved to /var/cache/conftool/dbconfig/20260301-084928-marostegui.json
- 08:38 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P89297 and previous config saved to /var/cache/conftool/dbconfig/20260301-083855-marostegui.json
- 08:34 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1185', diff saved to https://phabricator.wikimedia.org/P89296 and previous config saved to /var/cache/conftool/dbconfig/20260301-083420-marostegui.json
- 08:23 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P89295 and previous config saved to /var/cache/conftool/dbconfig/20260301-082346-marostegui.json
- 08:19 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1185', diff saved to https://phabricator.wikimedia.org/P89294 and previous config saved to /var/cache/conftool/dbconfig/20260301-081912-marostegui.json
- 08:08 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2178 (T418465)', diff saved to https://phabricator.wikimedia.org/P89293 and previous config saved to /var/cache/conftool/dbconfig/20260301-080838-marostegui.json
- 08:04 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1185 (T418465)', diff saved to https://phabricator.wikimedia.org/P89292 and previous config saved to /var/cache/conftool/dbconfig/20260301-080404-marostegui.json
- 08:03 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2178.codfw.wmnet with reason: Maintenance
- 08:03 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2171 (T418465)', diff saved to https://phabricator.wikimedia.org/P89291 and previous config saved to /var/cache/conftool/dbconfig/20260301-080341-marostegui.json
- 08:01 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db1185 (T418465)', diff saved to https://phabricator.wikimedia.org/P89290 and previous config saved to /var/cache/conftool/dbconfig/20260301-080110-marostegui.json
- 08:01 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1185.eqiad.wmnet with reason: Maintenance
- 08:00 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T418465)', diff saved to https://phabricator.wikimedia.org/P89289 and previous config saved to /var/cache/conftool/dbconfig/20260301-080044-marostegui.json
- 07:48 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2171', diff saved to https://phabricator.wikimedia.org/P89288 and previous config saved to /var/cache/conftool/dbconfig/20260301-074833-marostegui.json
- 07:45 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P89287 and previous config saved to /var/cache/conftool/dbconfig/20260301-074536-marostegui.json
- 07:33 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2171', diff saved to https://phabricator.wikimedia.org/P89286 and previous config saved to /var/cache/conftool/dbconfig/20260301-073324-marostegui.json
- 07:30 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P89285 and previous config saved to /var/cache/conftool/dbconfig/20260301-073028-marostegui.json
- 07:18 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2171 (T418465)', diff saved to https://phabricator.wikimedia.org/P89284 and previous config saved to /var/cache/conftool/dbconfig/20260301-071816-marostegui.json
- 07:15 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T418465)', diff saved to https://phabricator.wikimedia.org/P89283 and previous config saved to /var/cache/conftool/dbconfig/20260301-071521-marostegui.json
- 07:12 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db2171 (T418465)', diff saved to https://phabricator.wikimedia.org/P89282 and previous config saved to /var/cache/conftool/dbconfig/20260301-071226-marostegui.json
- 07:12 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2171.codfw.wmnet with reason: Maintenance
- 07:12 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2157 (T418465)', diff saved to https://phabricator.wikimedia.org/P89281 and previous config saved to /var/cache/conftool/dbconfig/20260301-071201-marostegui.json
- 07:11 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db1161 (T418465)', diff saved to https://phabricator.wikimedia.org/P89280 and previous config saved to /var/cache/conftool/dbconfig/20260301-071113-marostegui.json
- 07:11 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
- 07:10 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1161.eqiad.wmnet with reason: Maintenance
- 07:10 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1159 (T418465)', diff saved to https://phabricator.wikimedia.org/P89279 and previous config saved to /var/cache/conftool/dbconfig/20260301-071040-marostegui.json
- 06:56 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2157', diff saved to https://phabricator.wikimedia.org/P89278 and previous config saved to /var/cache/conftool/dbconfig/20260301-065653-marostegui.json
- 06:55 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1159', diff saved to https://phabricator.wikimedia.org/P89277 and previous config saved to /var/cache/conftool/dbconfig/20260301-065531-marostegui.json
- 06:41 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2157', diff saved to https://phabricator.wikimedia.org/P89276 and previous config saved to /var/cache/conftool/dbconfig/20260301-064145-marostegui.json
- 06:40 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1159', diff saved to https://phabricator.wikimedia.org/P89275 and previous config saved to /var/cache/conftool/dbconfig/20260301-064023-marostegui.json
- 06:26 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2157 (T418465)', diff saved to https://phabricator.wikimedia.org/P89274 and previous config saved to /var/cache/conftool/dbconfig/20260301-062636-marostegui.json
- 06:25 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1159 (T418465)', diff saved to https://phabricator.wikimedia.org/P89273 and previous config saved to /var/cache/conftool/dbconfig/20260301-062515-marostegui.json
- 06:21 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db1159 (T418465)', diff saved to https://phabricator.wikimedia.org/P89272 and previous config saved to /var/cache/conftool/dbconfig/20260301-062108-marostegui.json
- 06:21 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1159.eqiad.wmnet with reason: Maintenance
- 06:20 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db2157 (T418465)', diff saved to https://phabricator.wikimedia.org/P89271 and previous config saved to /var/cache/conftool/dbconfig/20260301-062047-marostegui.json
- 06:20 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2157.codfw.wmnet with reason: Maintenance
- 06:14 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2207.codfw.wmnet with reason: Maintenance
- 06:14 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1222.eqiad.wmnet with reason: Maintenance
- 02:13 mwpresync@deploy2002: Finished scap build-images: Publishing wmf/next image (duration: 13m 00s)
- 02:00 mwpresync@deploy2002: Started scap build-images: Publishing wmf/next image