Server Admin Log
2026-02-08
- 02:01 mwpresync@deploy2002: Finished scap build-images: Publishing wmf/next image (duration: 01m 01s)
- 02:00 mwpresync@deploy2002: Started scap build-images: Publishing wmf/next image
2026-02-07
- 02:13 mwpresync@deploy2002: Finished scap build-images: Publishing wmf/next image (duration: 12m 52s)
- 02:00 mwpresync@deploy2002: Started scap build-images: Publishing wmf/next image
2026-02-06
- 18:09 dzahn@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on zuul2001.codfw.wmnet with reason: WIP
- 18:08 dzahn@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on zuul1001.eqiad.wmnet with reason: WIP
- 17:28 cdobbins@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P{lvs7003*} and A:liberica
- 17:25 cdobbins@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P{lvs7003*} and A:liberica
- 16:45 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/growthbook: apply
- 16:44 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/growthbook: apply
- 16:40 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/growthbook: apply
- 16:39 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/growthbook: apply
- 14:57 hashar@deploy2002: Finished scap sync-world: Backport for TypeError: Unsupported operand types: array + null (T416619) (duration: 11m 23s)
- 14:53 hashar@deploy2002: hashar: Continuing with sync
- 14:50 hashar@deploy2002: hashar: Backport for TypeError: Unsupported operand types: array + null (T416619) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 14:46 hashar@deploy2002: Started scap sync-world: Backport for TypeError: Unsupported operand types: array + null (T416619)
- 14:42 bking@deploy2002: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-test: apply
- 14:42 bking@deploy2002: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-test: apply
- 14:39 bking@deploy2002: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-test: apply
- 14:39 bking@deploy2002: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-test: apply
- 14:39 bking@deploy2002: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-test: apply
- 14:38 bking@deploy2002: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-test: apply
- 13:19 trueg@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
- 13:19 trueg@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
- 13:13 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/growthbook-next: apply
- 13:12 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/growthbook-next: apply
- 12:58 trueg@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
- 12:58 trueg@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
- 12:52 cmooney@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device cr2-eqord
- 12:52 cmooney@cumin1003: START - Cookbook sre.network.tls for network device cr2-eqord
- 12:50 trueg@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
- 12:49 trueg@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
- 12:47 cmooney@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device fasw2-e15b-eqiad
- 12:47 cmooney@cumin1003: START - Cookbook sre.network.tls for network device fasw2-e15b-eqiad
- 12:47 cmooney@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device fasw2-e15a-eqiad
- 12:47 cmooney@cumin1003: START - Cookbook sre.network.tls for network device fasw2-e15a-eqiad
- 12:47 cmooney@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device fasw2-e16b-eqiad
- 12:47 cmooney@cumin1003: START - Cookbook sre.network.tls for network device fasw2-e16b-eqiad
- 12:47 cmooney@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device fasw2-e16a-eqiad
- 12:46 cmooney@cumin1003: START - Cookbook sre.network.tls for network device fasw2-e16a-eqiad
- 12:04 jelto@cumin1003: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1004.wikimedia.org with reason: Upgrade GitLab
- 11:11 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 11:11 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update entries for private1-d8-eqiad gateway IPs - cmooney@cumin1003"
- 11:11 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update entries for private1-d8-eqiad gateway IPs - cmooney@cumin1003"
- 11:05 cmooney@cumin1003: START - Cookbook sre.dns.netbox
- 10:56 jelto@cumin1003: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: Upgrade GitLab
- 10:27 ayounsi@cumin1003: START - Cookbook sre.dns.netbox
- 10:26 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host aux-k8s-worker1006
- 10:26 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1006.eqiad.wmnet with OS bookworm
- 10:05 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pki1002.eqiad.wmnet with OS trixie
- 09:47 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pki1002.eqiad.wmnet with reason: host reimage
- 09:42 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on pki1002.eqiad.wmnet with reason: host reimage
- 09:25 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host pki1002.eqiad.wmnet with OS trixie
- 06:50 ryankemper@cumin2002: END (FAIL) - Cookbook sre.hadoop.reboot-workers (exit_code=99) for Hadoop analytics cluster
- 04:19 ryankemper@cumin2002: START - Cookbook sre.hadoop.reboot-workers for Hadoop analytics cluster
- 04:13 ryankemper@cumin2002: END (FAIL) - Cookbook sre.hadoop.reboot-workers (exit_code=99) for Hadoop analytics cluster
- 02:13 mwpresync@deploy2002: Finished scap build-images: Publishing wmf/next image (duration: 13m 00s)
- 02:00 mwpresync@deploy2002: Started scap build-images: Publishing wmf/next image
- 00:29 rzl@deploy2002: helmfile [aux-k8s-codfw] DONE helmfile.d/aux-k8s-services/sophroid: apply
- 00:29 rzl@deploy2002: helmfile [aux-k8s-codfw] START helmfile.d/aux-k8s-services/sophroid: apply
- 00:28 rzl@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/aux-k8s-services/sophroid: apply
- 00:28 rzl@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/aux-k8s-services/sophroid: apply
2026-02-05
- 23:50 ryankemper@cumin2002: START - Cookbook sre.hadoop.reboot-workers for Hadoop analytics cluster
- 23:50 ryankemper@cumin2002: END (FAIL) - Cookbook sre.hadoop.reboot-workers (exit_code=99) for Hadoop analytics cluster
- 23:50 ryankemper@cumin2002: START - Cookbook sre.hadoop.reboot-workers for Hadoop analytics cluster
- 23:28 maryum: Deployed security fix for T410429
- 22:59 maryum: Deployed security fix for T416502
- 22:40 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-ipoid: apply
- 22:40 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-ipoid: apply
- 22:38 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-ipoid: apply
- 22:38 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-ipoid: apply
- 21:55 kemayo@deploy2002: Finished scap sync-world: Backport for EditCheck: Adjust copy of experimental checks, TextMatchEditCheck: Place 'dismiss' action last, TextMatch: allow links in descriptions (T416511) (duration: 08m 19s)
- 21:51 kemayo@deploy2002: kemayo: Continuing with sync
- 21:48 kemayo@deploy2002: kemayo: Backport for EditCheck: Adjust copy of experimental checks, TextMatchEditCheck: Place 'dismiss' action last, TextMatch: allow links in descriptions (T416511) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 21:46 kemayo@deploy2002: Started scap sync-world: Backport for EditCheck: Adjust copy of experimental checks, TextMatchEditCheck: Place 'dismiss' action last, TextMatch: allow links in descriptions (T416511)
- 21:31 jdrewniak@deploy2002: Finished scap sync-world: Backport for Enable Extension:WP25EasterEggs on testwiki. (duration: 07m 45s)
- 21:27 jdrewniak@deploy2002: jdrewniak: Continuing with sync
- 21:25 jdrewniak@deploy2002: jdrewniak: Backport for Enable Extension:WP25EasterEggs on testwiki. synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 21:24 kamila@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
- 21:23 kamila@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
- 21:23 jdrewniak@deploy2002: Started scap sync-world: Backport for Enable Extension:WP25EasterEggs on testwiki.
- 21:22 kamila@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
- 21:22 kamila@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
- 21:22 kamila@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
- 21:21 jdrewniak@deploy2002: Finished scap sync-world: Backport for Renaming `MetricsPlatform` => `TestKitchen` (T414435), readingListAB.js: Updated to use mw.testKitchen (T414435) (duration: 08m 16s)
- 21:17 jdrewniak@deploy2002: sfaci, jdrewniak: Continuing with sync
- 21:15 jdrewniak@deploy2002: sfaci, jdrewniak: Backport for Renaming `MetricsPlatform` => `TestKitchen` (T414435), readingListAB.js: Updated to use mw.testKitchen (T414435) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 21:13 jdrewniak@deploy2002: Started scap sync-world: Backport for Renaming `MetricsPlatform` => `TestKitchen` (T414435), readingListAB.js: Updated to use mw.testKitchen (T414435)
- 20:36 kamila@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
- 20:36 kamila@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
- 20:35 kamila@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
- 20:28 kamila@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
- 20:26 kamila@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
- 20:19 ladsgroup@deploy2002: Finished scap sync-world: Backport for Stop thumbnail pre-gen jobs altogether (T408062) (duration: 06m 29s)
- 20:15 ladsgroup@deploy2002: ladsgroup: Continuing with sync
- 20:14 ladsgroup@deploy2002: ladsgroup: Backport for Stop thumbnail pre-gen jobs altogether (T408062) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 20:12 ladsgroup@deploy2002: Started scap sync-world: Backport for Stop thumbnail pre-gen jobs altogether (T408062)
- 20:02 phuedx@deploy2002: Finished scap sync-world: Backport for Fix instrument to not send when not in sample (duration: 09m 20s)
- 19:58 phuedx@deploy2002: phuedx, milimetric: Continuing with sync
- 19:55 kamila@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
- 19:54 phuedx@deploy2002: phuedx, milimetric: Backport for Fix instrument to not send when not in sample synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 19:53 phuedx@deploy2002: Started scap sync-world: Backport for Fix instrument to not send when not in sample
- 19:45 kamila@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
- 19:44 kamila@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
- 19:40 kamila@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
- 19:39 bking@deploy2002: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-ipoid: apply
- 19:39 bking@deploy2002: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-ipoid: apply
- 19:38 bking@deploy2002: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-ipoid: apply
- 19:38 bking@deploy2002: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-ipoid: apply
- 19:36 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "rename fmsw1-c1-eqiad to fmsw1-e15-eqiad - cmooney@cumin1003 - T403035"
- 19:36 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "rename fmsw1-c1-eqiad to fmsw1-e15-eqiad - cmooney@cumin1003 - T403035"
- 19:36 bking@deploy2002: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-ipoid: apply
- 19:35 bking@deploy2002: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-ipoid: apply
- 19:34 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "rename fmsw1-c1-eqiad to fmsw1-e15-eqiad - cmooney@cumin1003 - T403035"
- 19:34 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "rename fmsw1-c1-eqiad to fmsw1-e15-eqiad - cmooney@cumin1003 - T403035"
- 19:28 brennen@deploy2002: Finished scap sync-world: Backport for Collect data four ways to find discrepancies (T416472) (duration: 10m 03s)
- 19:24 cmooney@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on fasw2-e15a-eqiad,fasw2-e15b-eqiad with reason: fundraising migration eqiad
- 19:24 brennen@deploy2002: milimetric, brennen: Continuing with sync
- 19:23 bking@deploy2002: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-ipoid: apply
- 19:23 bking@deploy2002: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-ipoid: apply
- 19:20 brennen@deploy2002: milimetric, brennen: Backport for Collect data four ways to find discrepancies (T416472) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 19:18 brennen@deploy2002: Started scap sync-world: Backport for Collect data four ways to find discrepancies (T416472)
- 19:13 bking@deploy2002: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-ipoid: apply
- 19:13 bking@deploy2002: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-ipoid: apply
- 19:11 bking@deploy2002: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search-test: apply
- 19:11 bking@deploy2002: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-semantic-search-test: apply
- 19:09 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 19:09 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: change fasw2-c1 to fasw2-e15 to match new location - cmooney@cumin1003"
- 19:09 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: change fasw2-c1 to fasw2-e15 to match new location - cmooney@cumin1003"
- 19:04 cmooney@cumin1003: START - Cookbook sre.dns.netbox
- 17:35 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pki1002.eqiad.wmnet with OS bullseye
- 17:18 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pki1002.eqiad.wmnet with reason: host reimage
- 17:15 ladsgroup@deploy2002: Finished scap sync-world: Backport for Stop relying on ThumbRenderMap and use a standard size instead (T415282), Stop relying on ThumbRenderMap and use a standard size instead (T415282) (duration: 14m 04s)
- 17:14 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on pki1002.eqiad.wmnet with reason: host reimage
- 17:11 ladsgroup@deploy2002: ladsgroup: Continuing with sync
- 17:03 ladsgroup@deploy2002: ladsgroup: Backport for Stop relying on ThumbRenderMap and use a standard size instead (T415282), Stop relying on ThumbRenderMap and use a standard size instead (T415282) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 17:01 ladsgroup@deploy2002: Started scap sync-world: Backport for Stop relying on ThumbRenderMap and use a standard size instead (T415282), Stop relying on ThumbRenderMap and use a standard size instead (T415282)
- 16:59 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host pki1002
- 16:59 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host pki1002
- 16:58 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host pki1002
- 16:58 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) pki1002.eqiad.wmnet 44.32.64.10.in-addr.arpa 4.4.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
- 16:58 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache pki1002.eqiad.wmnet 44.32.64.10.in-addr.arpa 4.4.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
- 16:58 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 16:58 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host pki1002 - ayounsi@cumin1003"
- 16:58 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host pki1002 - ayounsi@cumin1003"
- 16:47 ayounsi@cumin1003: START - Cookbook sre.dns.netbox
- 16:47 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host pki1002
- 16:46 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host pki1002.eqiad.wmnet with OS bullseye
- 16:44 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/growthbook: apply
- 16:44 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/growthbook: apply
- 16:43 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/growthbook-next: apply
- 16:43 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/growthbook-next: apply
- 16:42 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search-test: apply
- 16:42 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-semantic-search-test: apply
- 16:38 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest1005.eqiad.wmnet with OS bookworm
- 16:37 topranks: deactivate BGP session from cr1-eqiad to pfw1a-eqiad fundraising migration T403035
- 16:32 akosiaris: manually sudo sysctl net.ipv4.conf.all.rp_filter=0 on tcp-proxy6001
- 16:23 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/growthbook: apply
- 16:23 bking@deploy2002: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search-test: apply
- 16:23 bking@deploy2002: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-semantic-search-test: apply
- 16:23 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/growthbook: apply
- 16:19 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1005.eqiad.wmnet with reason: host reimage
- 16:17 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/growthbook-next: apply
- 16:17 eevans@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching sessionstore1*.eqiad.wmnet: Applying upgrade to Java 11.0.30 — T416492 - eevans@cumin1003
- 16:17 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/growthbook-next: apply
- 16:14 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1005.eqiad.wmnet with reason: host reimage
- 16:01 bking@deploy2002: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search-test: apply
- 16:01 bking@deploy2002: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-semantic-search-test: apply
- 15:59 eevans@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching sessionstore1*.eqiad.wmnet: Applying upgrade to Java 11.0.30 — T416492 - eevans@cumin1003
- 15:55 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host sretest1005.eqiad.wmnet with OS bookworm
- 15:47 eevans@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching sessionstore2*.codfw.wmnet: Applying upgrade to Java 11.0.30 — T416492 - eevans@cumin1003
- 15:40 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts puppetmaster2001.codfw.wmnet
- 15:40 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 15:40 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: puppetmaster2001.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
- 15:36 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: puppetmaster2001.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
- 15:33 javiermonton@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply
- 15:33 jmm@cumin2002: START - Cookbook sre.dns.netbox
- 15:29 eevans@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching sessionstore2*.codfw.wmnet: Applying upgrade to Java 11.0.30 — T416492 - eevans@cumin1003
- 15:29 javiermonton@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply
- 15:29 bking@deploy2002: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search-test: apply
- 15:29 bking@deploy2002: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-semantic-search-test: apply
- 15:28 bking@deploy2002: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search-test: apply
- 15:28 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts puppetmaster2001.codfw.wmnet
- 15:25 topranks: deactivate BGP session from cr2-eqiad to pfw1b-eqiad fundraising migration T403035
- 15:21 cmooney@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on fasw2-c1a-eqiad,fasw2-c1b-eqiad,pfw1-eqiad with reason: fundraising migration eqiad
- 15:19 bking@deploy2002: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-semantic-search-test: apply
- 15:13 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 15:12 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
- 15:06 bking@deploy2002: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'.
- 15:05 bking@deploy2002: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'.
- 15:04 bking@deploy2002: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'.
- 15:03 bking@deploy2002: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'.
- 14:55 cmooney@cumin1003: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device fasw2-e16b-eqiad.mgmt.eqiad.wmnet
- 14:52 cmooney@cumin1003: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device fasw2-e16a-eqiad.mgmt.eqiad.wmnet
- 14:44 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) opensearch-semantic-search.discovery.wmnet on all recursors
- 14:44 bking@cumin2002: START - Cookbook sre.dns.wipe-cache opensearch-semantic-search.discovery.wmnet on all recursors
- 14:42 bking@dns1004: END - running authdns-update
- 14:41 bking@dns1004: START - running authdns-update
- 14:21 Lucas_WMDE: UTC afternoon backport+config window done
- 14:16 phuedx@deploy2002: Finished scap sync-world: Backport for ext.wikimediaEvents: Add code for synth-aaa-test-mw-js experiment code (duration: 11m 14s)
- 14:12 phuedx@deploy2002: phuedx: Continuing with sync
- 14:12 brouberol@dns1004: END - running authdns-update
- 14:11 brouberol@dns1004: START - running authdns-update
- 14:07 phuedx@deploy2002: phuedx: Backport for ext.wikimediaEvents: Add code for synth-aaa-test-mw-js experiment code synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 14:05 phuedx@deploy2002: Started scap sync-world: Backport for ext.wikimediaEvents: Add code for synth-aaa-test-mw-js experiment code
- 13:55 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-sre: apply
- 13:54 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-sre: apply
- 13:49 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 13:46 cmooney@cumin1003: START - Cookbook sre.dns.netbox
- 13:46 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 13:43 cmooney@cumin1003: START - Cookbook sre.dns.netbox
- 13:43 cmooney@cumin1003: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
- 13:43 cmooney@cumin1003: START - Cookbook sre.network.provision for device fasw2-e16b-eqiad.mgmt.eqiad.wmnet
- 13:42 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) fasw2-e16b-eqiad.mgmt.eqiad.wmnet on all recursors
- 13:42 cmooney@cumin1003: START - Cookbook sre.dns.wipe-cache fasw2-e16b-eqiad.mgmt.eqiad.wmnet on all recursors
- 13:42 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) fasw2-e16a-eqiad.mgmt.eqiad.wmnet on all recursors
- 13:42 cmooney@cumin1003: START - Cookbook sre.dns.wipe-cache fasw2-e16a-eqiad.mgmt.eqiad.wmnet on all recursors
- 13:41 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db2204 gradually with 4 steps - After schema change
- 13:41 cmooney@cumin1003: START - Cookbook sre.dns.netbox
- 13:41 cmooney@cumin1003: START - Cookbook sre.network.provision for device fasw2-e16a-eqiad.mgmt.eqiad.wmnet
- 13:40 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-sre: apply
- 13:39 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-sre: apply
- 13:02 taavi@dns1004: END - running authdns-update
- 13:00 taavi@dns1004: START - running authdns-update
- 12:56 marostegui@cumin1003: START - Cookbook sre.mysql.pool db2204 gradually with 4 steps - After schema change
- 12:30 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 12:29 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
- 12:27 jmm@deploy2002: helmfile [eqiad] DONE helmfile.d/services/proton: apply
- 12:26 jmm@deploy2002: helmfile [eqiad] START helmfile.d/services/proton: apply
- 12:24 jmm@deploy2002: helmfile [codfw] DONE helmfile.d/services/proton: apply
- 12:23 jmm@deploy2002: helmfile [codfw] START helmfile.d/services/proton: apply
- 12:18 jmm@deploy2002: helmfile [staging] DONE helmfile.d/services/proton: apply
- 12:17 jmm@deploy2002: helmfile [staging] START helmfile.d/services/proton: apply
- 12:01 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest1005.eqiad.wmnet with OS bookworm
- 11:55 sfaci@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/test-kitchen: apply
- 11:55 sfaci@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/test-kitchen: apply
- 11:46 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.newpool (exit_code=0) pool db1184: After schema change
- 11:43 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1005.eqiad.wmnet with reason: host reimage
- 11:39 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1005.eqiad.wmnet with reason: host reimage
- 11:34 sfaci@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/test-kitchen-next: apply
- 11:33 sfaci@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/test-kitchen-next: apply
- 11:21 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host sretest1005
- 11:21 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host sretest1005
- 11:20 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host sretest1005
- 11:20 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) sretest1005.eqiad.wmnet 130.32.64.10.in-addr.arpa 0.3.1.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
- 11:20 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache sretest1005.eqiad.wmnet 130.32.64.10.in-addr.arpa 0.3.1.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
- 11:20 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 11:20 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host sretest1005 - ayounsi@cumin1003"
- 11:20 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host sretest1005 - ayounsi@cumin1003"
- 11:16 ayounsi@cumin1003: START - Cookbook sre.dns.netbox
- 11:15 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host sretest1005
- 11:15 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host sretest1005.eqiad.wmnet with OS bookworm
- 11:12 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcumin1001.eqiad.wmnet
- 11:11 sfaci@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/test-kitchen-next: apply
- 11:08 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host cloudcumin1001.eqiad.wmnet
- 11:06 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest1002.eqiad.wmnet with OS bookworm
- 11:06 ayounsi@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1005.eqiad.wmnet with OS bookworm
- 11:06 ayounsi@cumin1003: END (FAIL) - Cookbook sre.hosts.move-vlan (exit_code=99) for host sretest1005
- 11:06 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) sretest1005.eqiad.wmnet 3.141.64.10.in-addr.arpa 3.0.0.0.1.4.1.0.4.6.0.0.0.1.0.0.3.1.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
- 11:06 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache sretest1005.eqiad.wmnet 3.141.64.10.in-addr.arpa 3.0.0.0.1.4.1.0.4.6.0.0.0.1.0.0.3.1.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
- 11:06 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 11:06 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Rollback records for host sretest1005 - ayounsi@cumin1003"
- 11:06 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Rollback records for host sretest1005 - ayounsi@cumin1003"
- 11:02 ayounsi@cumin1003: START - Cookbook sre.dns.netbox
- 11:02 ayounsi@cumin1003: END (ERROR) - Cookbook sre.network.configure-switch-interfaces (exit_code=97) for host sretest1005
- 11:02 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host sretest1005
- 11:02 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) sretest1005.eqiad.wmnet 130.32.64.10.in-addr.arpa 0.3.1.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
- 11:02 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache sretest1005.eqiad.wmnet 130.32.64.10.in-addr.arpa 0.3.1.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
- 11:02 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 11:02 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host sretest1005 - ayounsi@cumin1003"
- 11:02 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host sretest1005 - ayounsi@cumin1003"
- 11:01 sfaci@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/test-kitchen-next: apply
- 11:01 marostegui@cumin1003: START - Cookbook sre.mysql.newpool pool db1184: After schema change
- 11:00 sfaci@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/test-kitchen-next: apply
- 10:58 ayounsi@cumin1003: START - Cookbook sre.dns.netbox
- 10:58 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host sretest1005
- 10:57 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host sretest1005.eqiad.wmnet with OS bookworm
- 10:54 sfaci@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/test-kitchen-next: apply
- 10:48 moritzm: upgrade cloudcumin1001 to bookworm T403153
- 10:48 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1002.eqiad.wmnet with reason: host reimage
- 10:42 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1002.eqiad.wmnet with reason: host reimage
- 10:36 hashar@deploy2002: rebuilt and synchronized wikiversions files: group2 to 1.46.0-wmf.14 refs T413805
- 10:24 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
- 10:23 ayounsi@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest1002.eqiad.wmnet with OS bookworm
- 10:19 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
- 10:18 ayounsi@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest1002.eqiad.wmnet with OS bookworm
- 10:13 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
- 10:13 ayounsi@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest1002.eqiad.wmnet with OS bookworm
- 09:49 ammarpad@deploy2002: mwscript-k8s job started: refreshImageMetadata.php --wiki=commonswiki --mediatype=AUDIO --mime=application/ogg '--metadata-contains=Stream Undecodable' --force # T414348
- 09:48 jelto@cumin1003: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab2002.wikimedia.org with reason: Upgrade GitLab replica
- 09:45 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
- 09:45 hashar@deploy2002: rebuilt and synchronized wikiversions files: group1 to 1.46.0-wmf.14 refs T413805
- 09:44 ayounsi@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest1002.eqiad.wmnet with OS bookworm
- 09:41 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db2205 gradually with 4 steps - After schema change
- 09:39 jelto@cumin1003: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: Upgrade GitLab replica
- 09:37 jelto@cumin1003: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1003.wikimedia.org with reason: Upgrade GitLab replica
- 09:28 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host sretest1002
- 09:28 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host sretest1002
- 09:28 jelto@cumin1003: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: Upgrade GitLab replica
- 09:27 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host sretest1002
- 09:27 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) sretest1002.eqiad.wmnet 139.48.64.10.in-addr.arpa 9.3.1.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
- 09:27 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache sretest1002.eqiad.wmnet 139.48.64.10.in-addr.arpa 9.3.1.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
- 09:27 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 09:27 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host sretest1002 - ayounsi@cumin1003"
- 09:27 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host sretest1002 - ayounsi@cumin1003"
- 09:23 ayounsi@cumin1003: START - Cookbook sre.dns.netbox
- 09:21 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host sretest1002
- 09:21 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
- 09:09 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1184.eqiad.wmnet with reason: Maintenance
- 09:07 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db1184.eqiad.wmnet with reason: Schema change
- 09:07 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db1184 T416480', diff saved to https://phabricator.wikimedia.org/P88703 and previous config saved to /var/cache/conftool/dbconfig/20260205-090702-marostegui.json
- 09:06 marostegui@cumin1003: dbctl commit (dc=all): 'Promote db1163 to s1 primary T416480', diff saved to https://phabricator.wikimedia.org/P88702 and previous config saved to /var/cache/conftool/dbconfig/20260205-090623-marostegui.json
- 09:04 moritzm: update hosts running routed Ganeti to dnsmasq 2.92-1~wmf12u1 T396864
- 09:02 marostegui: Starting s1 eqiad failover from db1184 to db1163 - T416480
- 09:02 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 32 hosts with reason: Primary switchover s1 T416480
- 09:01 marostegui@cumin1003: dbctl commit (dc=all): 'Set db1163 with weight 0 T416480', diff saved to https://phabricator.wikimedia.org/P88701 and previous config saved to /var/cache/conftool/dbconfig/20260205-090145-marostegui.json
- 08:58 jmm@dns1004: END - running authdns-update
- 08:57 jmm@dns1004: START - running authdns-update
- 08:56 marostegui@cumin1003: START - Cookbook sre.mysql.pool db2205 gradually with 4 steps - After schema change
- 08:15 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2222 (T415786)', diff saved to https://phabricator.wikimedia.org/P88698 and previous config saved to /var/cache/conftool/dbconfig/20260205-081536-marostegui.json
- 08:12 Msz2001: Morning backport window finished
- 08:11 mszwarc@deploy2002: Finished scap sync-world: Backport for Remove unused 'editor' right from plwiki (duration: 08m 33s)
- 08:07 mszwarc@deploy2002: matmarex, mszwarc: Continuing with sync
- 08:05 mszwarc@deploy2002: matmarex, mszwarc: Backport for Remove unused 'editor' right from plwiki synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 08:02 mszwarc@deploy2002: Started scap sync-world: Backport for Remove unused 'editor' right from plwiki
- 08:00 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2222', diff saved to https://phabricator.wikimedia.org/P88697 and previous config saved to /var/cache/conftool/dbconfig/20260205-080027-marostegui.json
- 07:45 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2222', diff saved to https://phabricator.wikimedia.org/P88696 and previous config saved to /var/cache/conftool/dbconfig/20260205-074519-marostegui.json
- 07:42 moritzm: installing openjdk-21 security updates
- 07:36 moritzm: installing openjdk-25 security updates
- 07:30 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2222 (T415786)', diff saved to https://phabricator.wikimedia.org/P88695 and previous config saved to /var/cache/conftool/dbconfig/20260205-073011-marostegui.json
- 06:33 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2205.codfw.wmnet with reason: Maintenance
- 06:32 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2205.codfw.wmnet with reason: Schema change
- 06:32 marostegui@dns1006: END - running authdns-update
- 06:31 marostegui@dns1006: START - running authdns-update
- 06:27 marostegui@dns1006: END - running authdns-update
- 06:27 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db2205 T416299', diff saved to https://phabricator.wikimedia.org/P88694 and previous config saved to /var/cache/conftool/dbconfig/20260205-062737-marostegui.json
- 06:26 marostegui@dns1006: START - running authdns-update
- 06:26 marostegui@cumin1003: dbctl commit (dc=all): 'Promote db2209 to s3 primary and set section read-write T416299', diff saved to https://phabricator.wikimedia.org/P88693 and previous config saved to /var/cache/conftool/dbconfig/20260205-062617-marostegui.json
- 06:25 marostegui@cumin1003: dbctl commit (dc=all): 'Set s3 codfw as read-only for maintenance - T416299', diff saved to https://phabricator.wikimedia.org/P88692 and previous config saved to /var/cache/conftool/dbconfig/20260205-062557-marostegui.json
- 06:23 marostegui: Starting s3 codfw failover from db2205 to db2209 - T416299
- 06:22 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 25 hosts with reason: Primary switchover s3 T416299
- 06:22 marostegui@cumin1003: dbctl commit (dc=all): 'Set db2209 with weight 0 T416299', diff saved to https://phabricator.wikimedia.org/P88691 and previous config saved to /var/cache/conftool/dbconfig/20260205-062215-marostegui.json
- 06:00 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db2222 (T415786)', diff saved to https://phabricator.wikimedia.org/P88690 and previous config saved to /var/cache/conftool/dbconfig/20260205-060031-marostegui.json
- 06:00 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2222.codfw.wmnet with reason: Maintenance
- 06:00 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2221 (T415786)', diff saved to https://phabricator.wikimedia.org/P88689 and previous config saved to /var/cache/conftool/dbconfig/20260205-060006-marostegui.json
- 05:44 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2221', diff saved to https://phabricator.wikimedia.org/P88688 and previous config saved to /var/cache/conftool/dbconfig/20260205-054457-marostegui.json
- 05:29 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2221', diff saved to https://phabricator.wikimedia.org/P88687 and previous config saved to /var/cache/conftool/dbconfig/20260205-052949-marostegui.json
- 05:14 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2221 (T415786)', diff saved to https://phabricator.wikimedia.org/P88686 and previous config saved to /var/cache/conftool/dbconfig/20260205-051441-marostegui.json
- 03:44 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db2221 (T415786)', diff saved to https://phabricator.wikimedia.org/P88685 and previous config saved to /var/cache/conftool/dbconfig/20260205-034435-marostegui.json
- 03:44 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2221.codfw.wmnet with reason: Maintenance
- 03:44 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2220 (T415786)', diff saved to https://phabricator.wikimedia.org/P88684 and previous config saved to /var/cache/conftool/dbconfig/20260205-034410-marostegui.json
- 03:29 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2220', diff saved to https://phabricator.wikimedia.org/P88683 and previous config saved to /var/cache/conftool/dbconfig/20260205-032902-marostegui.json
- 03:13 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2220', diff saved to https://phabricator.wikimedia.org/P88682 and previous config saved to /var/cache/conftool/dbconfig/20260205-031354-marostegui.json
- 02:58 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2220 (T415786)', diff saved to https://phabricator.wikimedia.org/P88681 and previous config saved to /var/cache/conftool/dbconfig/20260205-025845-marostegui.json
- 02:40 samwilson@deploy2002: Finished scap sync-world: Backport for Revert "Support WikiEditor's resizing drag bar for Page editing" (T393231) (duration: 07m 20s)
- 02:36 samwilson@deploy2002: samwilson, bhsd: Continuing with sync
- 02:35 samwilson@deploy2002: samwilson, bhsd: Backport for Revert "Support WikiEditor's resizing drag bar for Page editing" (T393231) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 02:33 samwilson@deploy2002: Started scap sync-world: Backport for Revert "Support WikiEditor's resizing drag bar for Page editing" (T393231)
- 02:23 mwpresync@deploy2002: Finished scap build-images: Publishing wmf/next image (duration: 22m 21s)
- 02:00 mwpresync@deploy2002: Started scap build-images: Publishing wmf/next image
- 01:29 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db2220 (T415786)', diff saved to https://phabricator.wikimedia.org/P88680 and previous config saved to /var/cache/conftool/dbconfig/20260205-012942-marostegui.json
- 01:29 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2220.codfw.wmnet with reason: Maintenance
- 01:29 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2208 (T415786)', diff saved to https://phabricator.wikimedia.org/P88679 and previous config saved to /var/cache/conftool/dbconfig/20260205-012918-marostegui.json
- 01:14 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2208', diff saved to https://phabricator.wikimedia.org/P88678 and previous config saved to /var/cache/conftool/dbconfig/20260205-011410-marostegui.json
- 01:06 samwilson@deploy2002: Finished scap sync-world: Backport for jquery.wikiEditor.js: disable resizing bar on proofread-page (T393231) (duration: 08m 21s)
- 01:02 samwilson@deploy2002: samwilson: Continuing with sync
- 01:00 samwilson@deploy2002: samwilson: Backport for jquery.wikiEditor.js: disable resizing bar on proofread-page (T393231) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 00:59 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2208', diff saved to https://phabricator.wikimedia.org/P88677 and previous config saved to /var/cache/conftool/dbconfig/20260205-005902-marostegui.json
- 00:57 samwilson@deploy2002: Started scap sync-world: Backport for jquery.wikiEditor.js: disable resizing bar on proofread-page (T393231)
- 00:51 ryankemper@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
- 00:51 ryankemper@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
- 00:43 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2208 (T415786)', diff saved to https://phabricator.wikimedia.org/P88676 and previous config saved to /var/cache/conftool/dbconfig/20260205-004353-marostegui.json
- 00:36 reedy@deploy2002: Finished scap sync-world: Backport for Revert "Updated lcobucci/jwt from 4.1.5 to 4.3.0" (T416456), Revert "Updated lcobucci/jwt from 4.1.5 to 4.3.0" (T416456) (duration: 06m 50s)
- 00:32 reedy@deploy2002: reedy, zabe: Continuing with sync
- 00:32 reedy@deploy2002: reedy, zabe: Backport for Revert "Updated lcobucci/jwt from 4.1.5 to 4.3.0" (T416456), Revert "Updated lcobucci/jwt from 4.1.5 to 4.3.0" (T416456) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 00:30 reedy@deploy2002: Started scap sync-world: Backport for Revert "Updated lcobucci/jwt from 4.1.5 to 4.3.0" (T416456), Revert "Updated lcobucci/jwt from 4.1.5 to 4.3.0" (T416456)
2026-02-04
- 23:16 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db2208 (T415786)', diff saved to https://phabricator.wikimedia.org/P88674 and previous config saved to /var/cache/conftool/dbconfig/20260204-231600-marostegui.json
- 23:15 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2208.codfw.wmnet with reason: Maintenance
- 23:10 eevans@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:restbase-codfw: Applying upgrade to Java 11.0.30 — T416492 - eevans@cumin1003
- 22:28 ryankemper@deploy2002: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
- 22:27 ryankemper@deploy2002: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
- 22:27 ryankemper@deploy2002: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
- 22:26 ryankemper@deploy2002: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
- 22:03 tgr_: late UTC deploys done
- 22:02 tgr@deploy2002: Finished scap sync-world: Backport for Migrate EmailAuth config, step 1 (T404334) (duration: 11m 28s)
- 21:56 tgr@deploy2002: tgr: Continuing with sync
- 21:55 tgr@deploy2002: tgr: Backport for Migrate EmailAuth config, step 1 (T404334) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 21:51 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2200.codfw.wmnet with reason: Maintenance
- 21:51 tgr@deploy2002: Started scap sync-world: Backport for Migrate EmailAuth config, step 1 (T404334)
- 21:47 dancy@deploy2002: Finished scap sync-world: Backport for Add messages for 'local-bot' global group (T415588), Add messages for 'local-bot' global group (T415588) (duration: 40m 00s)
- 21:34 dancy@deploy2002: matmarex, dancy: Continuing with sync
- 21:34 dancy@deploy2002: matmarex, dancy: Backport for Add messages for 'local-bot' global group (T415588), Add messages for 'local-bot' global group (T415588) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 21:07 dancy@deploy2002: Started scap sync-world: Backport for Add messages for 'local-bot' global group (T415588), Add messages for 'local-bot' global group (T415588)
- 21:06 urandom: restart Cassandra to apply Java 11.0.30 upgrade, restbase/codfw — T416492
- 21:06 eevans@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:restbase-codfw: Applying upgrade to Java 11.0.30 — T416492 - eevans@cumin1003
- 20:52 eevans@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:restbase-eqiad: Applying upgrade to Java 11.0.30 - eevans@cumin1003
- 20:05 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2198.codfw.wmnet with reason: Maintenance
- 20:05 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2182 (T415786)', diff saved to https://phabricator.wikimedia.org/P88673 and previous config saved to /var/cache/conftool/dbconfig/20260204-200512-marostegui.json
- 19:52 mvolz@deploy2002: helmfile [eqiad] DONE helmfile.d/services/zotero: apply
- 19:52 mvolz@deploy2002: helmfile [eqiad] START helmfile.d/services/zotero: apply
- 19:51 mvolz@deploy2002: helmfile [codfw] DONE helmfile.d/services/zotero: apply
- 19:51 mvolz@deploy2002: helmfile [codfw] START helmfile.d/services/zotero: apply
- 19:50 mvolz@deploy2002: helmfile [staging] DONE helmfile.d/services/zotero: apply
- 19:50 mvolz@deploy2002: helmfile [staging] START helmfile.d/services/zotero: apply
- 19:50 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2182', diff saved to https://phabricator.wikimedia.org/P88672 and previous config saved to /var/cache/conftool/dbconfig/20260204-195004-marostegui.json
- 19:47 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 19:47 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dns entries for fasw2-e16-eqiad pair - cmooney@cumin1003"
- 19:47 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dns entries for fasw2-e16-eqiad pair - cmooney@cumin1003"
- 19:43 cmooney@cumin1003: START - Cookbook sre.dns.netbox
- 19:34 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2182', diff saved to https://phabricator.wikimedia.org/P88671 and previous config saved to /var/cache/conftool/dbconfig/20260204-193455-marostegui.json
- 19:19 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2182 (T415786)', diff saved to https://phabricator.wikimedia.org/P88670 and previous config saved to /var/cache/conftool/dbconfig/20260204-191947-marostegui.json
- 18:48 urandom: restart Cassandra to apply Java 11.0.30 upgrade, restbase/eqiad — T416492
- 18:47 eevans@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:restbase-eqiad: Applying upgrade to Java 11.0.30 - eevans@cumin1003
- 18:29 daniel@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
- 18:29 daniel@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
- 18:22 dzahn@dns1004: END - running authdns-update
- 18:21 daniel@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
- 18:21 eevans@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching aqs[2001,2003-2012,1011-1021]*: Applying upgrade to Java 11.0.30 - eevans@cumin1003
- 18:21 dzahn@dns1004: START - running authdns-update
- 18:21 daniel@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
- 18:20 dzahn@dns1004: END - running authdns-update
- 18:19 dzahn@dns1004: START - running authdns-update
- 17:44 dwisehaupt@dns1004: END - running authdns-update
- 17:42 dwisehaupt@dns1004: START - running authdns-update
- 17:41 jclark@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host bast1004.wikimedia.org with OS trixie
- 17:36 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db2182 (T415786)', diff saved to https://phabricator.wikimedia.org/P88668 and previous config saved to /var/cache/conftool/dbconfig/20260204-173612-marostegui.json
- 17:36 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2182.codfw.wmnet with reason: Maintenance
- 17:35 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2168 (T415786)', diff saved to https://phabricator.wikimedia.org/P88667 and previous config saved to /var/cache/conftool/dbconfig/20260204-173547-marostegui.json
- 17:24 jclark@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on bast1004.wikimedia.org with reason: host reimage
- 17:21 jclark@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on bast1004.wikimedia.org with reason: host reimage
- 17:20 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2168', diff saved to https://phabricator.wikimedia.org/P88666 and previous config saved to /var/cache/conftool/dbconfig/20260204-172039-marostegui.json
- 17:05 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2168', diff saved to https://phabricator.wikimedia.org/P88665 and previous config saved to /var/cache/conftool/dbconfig/20260204-170530-marostegui.json
- 17:03 jclark@cumin1003: START - Cookbook sre.hosts.reimage for host bast1004.wikimedia.org with OS trixie
- 17:02 jclark@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host bast1004.wikimedia.org with OS trixie
- 17:02 jclark@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1003"
- 17:02 jclark@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1003"
- 16:55 urbanecm@deploy2002: helmfile [codfw] DONE helmfile.d/services/linkrecommendation: apply
- 16:53 urbanecm@deploy2002: helmfile [codfw] START helmfile.d/services/linkrecommendation: apply
- 16:52 urbanecm@deploy2002: helmfile [eqiad] DONE helmfile.d/services/linkrecommendation: apply
- 16:50 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2168 (T415786)', diff saved to https://phabricator.wikimedia.org/P88664 and previous config saved to /var/cache/conftool/dbconfig/20260204-165022-marostegui.json
- 16:50 urbanecm@deploy2002: helmfile [eqiad] START helmfile.d/services/linkrecommendation: apply
- 16:49 urbanecm@deploy2002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply
- 16:47 urbanecm@deploy2002: helmfile [staging] START helmfile.d/services/linkrecommendation: apply
- 16:44 jclark@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on bast1004.wikimedia.org with reason: host reimage
- 16:39 jclark@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on bast1004.wikimedia.org with reason: host reimage
- 16:34 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db1236 gradually with 4 steps - After schema change
- 16:22 jclark@cumin1003: START - Cookbook sre.hosts.reimage for host bast1004.wikimedia.org with OS trixie
- 16:18 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host bast1004.wikimedia.org with OS trixie
- 16:18 jclark@cumin1003: START - Cookbook sre.hosts.reimage for host bast1004.wikimedia.org with OS trixie
- 16:13 Amir1: bumping rate limit of non-standard thumb sizes to medium browser score (T402792 T414805)
- 15:56 jclark@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host bast1004
- 15:54 jclark@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host bast1004
- 15:54 jclark@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 15:54 jclark@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt bast1004 - jclark@cumin1003"
- 15:54 jclark@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt bast1004 - jclark@cumin1003"
- 15:51 jmm@cumin2002: DONE (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging ChandraWMDE out of all services on: 2497 hosts
- 15:50 jclark@cumin1003: START - Cookbook sre.dns.netbox
- 15:49 jclark@cumin1003: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
- 15:48 marostegui@cumin1003: START - Cookbook sre.mysql.pool db1236 gradually with 4 steps - After schema change
- 15:47 eevans@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching aqs[2001,2003-2012,1011-1021]*: Applying upgrade to Java 11.0.30 - eevans@cumin1003
- 15:47 jclark@cumin1003: START - Cookbook sre.dns.netbox
- 15:46 jclark@cumin1003: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host bast1004
- 15:46 jclark@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host bast1004
- 15:39 sukhe@puppetserver1001: conftool action : set/pooled=yes; selector: name=dns4003.wikimedia.org [reason: [end] bird2 upgrade]
- 15:39 root@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 7 hosts with reason: Primary switchover test-s4 None
- 15:38 urandom: restarting Cassandra on aqs[2001,2003-2012] & aqs[1011,1014-1027] to apply Java 11.0.30 — T416492
- 15:34 sukhe: upgrade to bird 2.18 on dns4003: T413740
- 15:34 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host bast1004.eqiad.wmnet with OS trixie
- 15:33 jclark@cumin1003: START - Cookbook sre.hosts.reimage for host bast1004.eqiad.wmnet with OS trixie
- 15:32 sukhe@puppetserver1001: conftool action : set/pooled=no; selector: name=dns4003.wikimedia.org [reason: bird2 upgrade]
- 15:29 ladsgroup@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-cron: apply
- 15:28 ladsgroup@deploy2002: helmfile [codfw] START helmfile.d/services/mw-cron: apply
- 15:26 urbanecm@deploy2002: Finished scap sync-world: Backport for Revert "DatabaseUserImpactStore: log attempts to save zero pageviews values" (T414080), Revert "DatabaseUserImpactStore: log attempts to save zero pageviews values" (T414080) (duration: 16m 20s)
- 15:21 urbanecm@deploy2002: urbanecm: Continuing with sync
- 15:12 urandom: restarting Cassandra on [aqs2002.codfw.wmnet,aqs1010.eqiad.wmnet] to canary Java 11.0.30 — T416492
- 15:12 urbanecm@deploy2002: urbanecm: Backport for Revert "DatabaseUserImpactStore: log attempts to save zero pageviews values" (T414080), Revert "DatabaseUserImpactStore: log attempts to save zero pageviews values" (T414080) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 15:10 urbanecm@deploy2002: Started scap sync-world: Backport for Revert "DatabaseUserImpactStore: log attempts to save zero pageviews values" (T414080), Revert "DatabaseUserImpactStore: log attempts to save zero pageviews values" (T414080)
- 15:01 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db2168 (T415786)', diff saved to https://phabricator.wikimedia.org/P88657 and previous config saved to /var/cache/conftool/dbconfig/20260204-150138-marostegui.json
- 15:01 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2168.codfw.wmnet with reason: Maintenance
- 15:01 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2159 (T415786)', diff saved to https://phabricator.wikimedia.org/P88656 and previous config saved to /var/cache/conftool/dbconfig/20260204-150124-marostegui.json
- 14:54 jclark@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-fe1022.eqiad.wmnet with OS bullseye
- 14:54 jclark@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1003"
- 14:54 daniel@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
- 14:54 jclark@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1003"
- 14:53 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1236.eqiad.wmnet with reason: Maintenance
- 14:53 daniel@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
- 14:53 jclark@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-fe1023.eqiad.wmnet with OS bullseye
- 14:53 jclark@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1003"
- 14:52 jclark@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1003"
- 14:52 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1236.eqiad.wmnet with reason: Schema change
- 14:49 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db1236 T416356', diff saved to https://phabricator.wikimedia.org/P88655 and previous config saved to /var/cache/conftool/dbconfig/20260204-144951-marostegui.json
- 14:49 marostegui@cumin1003: dbctl commit (dc=all): 'Promote db1181 to s7 primary T416356', diff saved to https://phabricator.wikimedia.org/P88654 and previous config saved to /var/cache/conftool/dbconfig/20260204-144914-marostegui.json
- 14:46 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2159', diff saved to https://phabricator.wikimedia.org/P88653 and previous config saved to /var/cache/conftool/dbconfig/20260204-144616-marostegui.json
- 14:46 marostegui: Starting s7 eqiad failover from db1236 to db1181 - T416356
- 14:45 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 28 hosts with reason: Primary switchover s7 T416356
- 14:45 marostegui@cumin1003: dbctl commit (dc=all): 'Set db1181 with weight 0 T416356', diff saved to https://phabricator.wikimedia.org/P88652 and previous config saved to /var/cache/conftool/dbconfig/20260204-144508-marostegui.json
- 14:43 dpogorzelski@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
- 14:37 urbanecm@deploy2002: Finished scap sync-world: Backport for Add client.tag_metadata_categories field support (T414571) (duration: 09m 26s)
- 14:36 jclark@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-fe1022.eqiad.wmnet with reason: host reimage
- 14:34 jclark@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-fe1023.eqiad.wmnet with reason: host reimage
- 14:33 urbanecm@deploy2002: kharlan, urbanecm: Continuing with sync
- 14:31 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2159', diff saved to https://phabricator.wikimedia.org/P88651 and previous config saved to /var/cache/conftool/dbconfig/20260204-143108-marostegui.json
- 14:30 jclark@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-fe1022.eqiad.wmnet with reason: host reimage
- 14:30 urbanecm@deploy2002: kharlan, urbanecm: Backport for Add client.tag_metadata_categories field support (T414571) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 14:30 jclark@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-fe1023.eqiad.wmnet with reason: host reimage
- 14:27 urbanecm@deploy2002: Started scap sync-world: Backport for Add client.tag_metadata_categories field support (T414571)
- 14:27 moritzm: remove legacy kibana discovery certificate T365798
- 14:24 urbanecm@deploy2002: Finished scap sync-world: Backport for Fix audio transcodes, DatabaseUserImpactStore: log attempts to save zero pageviews values (T414080), DatabaseUserImpactStore: log attempts to save zero pageviews values (T414080), IPReputationIPoidDataLookup: Allow returning stale values for 72 hours (T416316), [[gerrit:1236689|IPRepu
- 14:20 sukhe: sudo cumin -b1 -s5 "A:dnsbox" "run-puppet-agent --enable 'merging CR 1228560'"
- 14:20 urbanecm@deploy2002: hartman, kharlan, urbanecm: Continuing with sync
- 14:18 urbanecm@deploy2002: hartman, kharlan, urbanecm: Backport for Fix audio transcodes, DatabaseUserImpactStore: log attempts to save zero pageviews values (T414080), DatabaseUserImpactStore: log attempts to save zero pageviews values (T414080), IPReputationIPoidDataLookup: Allow returning stale values for 72 hours (T416316), [[gerrit:1236689|IPRe
- 14:16 urbanecm@deploy2002: Started scap sync-world: Backport for Fix audio transcodes, DatabaseUserImpactStore: log attempts to save zero pageviews values (T414080), DatabaseUserImpactStore: log attempts to save zero pageviews values (T414080), IPReputationIPoidDataLookup: Allow returning stale values for 72 hours (T416316), [[gerrit:1236689|IPReput
- 14:16 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2159 (T415786)', diff saved to https://phabricator.wikimedia.org/P88650 and previous config saved to /var/cache/conftool/dbconfig/20260204-141559-marostegui.json
- 14:15 sukhe: sudo cumin "A:dnsbox" "disable-puppet 'merging CR 1228560'"
- 14:14 jclark@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host ms-fe1023
- 14:14 jclark@cumin1003: START - Cookbook sre.hosts.move-vlan for host ms-fe1023
- 14:14 jclark@cumin1003: START - Cookbook sre.hosts.reimage for host ms-fe1023.eqiad.wmnet with OS bullseye
- 14:14 jclark@cumin1003: START - Cookbook sre.hosts.reimage for host ms-fe1022.eqiad.wmnet with OS bullseye
- 14:13 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-fe1023.eqiad.wmnet with OS bullseye
- 14:08 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-fe1022.eqiad.wmnet with OS bullseye
- 14:06 moritzm: installing php7.4 security updates
- 14:03 dpogorzelski@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
- 14:01 dpogorzelski@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
- 13:52 moritzm: disable nrpe2nodexp check for ferm on cloudcumin*
- 13:46 jclark@cumin1003: START - Cookbook sre.hosts.reimage for host ms-fe1022.eqiad.wmnet with OS bullseye
- 13:46 jclark@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-fe1022.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 13:37 jclark@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-fe1024.eqiad.wmnet with OS bullseye
- 13:37 jclark@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1003"
- 13:30 jclark@cumin1003: START - Cookbook sre.hosts.provision for host ms-fe1022.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 13:29 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-fe1022.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 13:29 kevinbazira@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
- 13:28 jclark@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1003"
- 13:26 jclark@cumin1003: START - Cookbook sre.hosts.provision for host ms-fe1022.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 13:24 jmm@cumin2002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:ml-cache-eqiad: OpenJDK 11 security updates - jmm@cumin2002
- 13:12 jclark@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-fe1021.eqiad.wmnet with OS bullseye
- 13:12 jclark@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1003"
- 13:07 jclark@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1003"
- 13:06 jmm@cumin2002: START - Cookbook sre.cassandra.roll-restart for nodes matching A:ml-cache-eqiad: OpenJDK 11 security updates - jmm@cumin2002
- 13:05 jclark@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-fe1024.eqiad.wmnet with reason: host reimage
- 13:02 jmm@cumin2002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:ml-cache-codfw: OpenJDK 11 security updates - jmm@cumin2002
- 13:00 jclark@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host ms-fe1023
- 13:00 jclark@cumin1003: START - Cookbook sre.hosts.move-vlan for host ms-fe1023
- 13:00 jclark@cumin1003: START - Cookbook sre.hosts.reimage for host ms-fe1023.eqiad.wmnet with OS bullseye
- 13:00 moritzm: remove legacy wdqs-internal discovery certificate T365798
- 13:00 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-fe1023.eqiad.wmnet with OS bullseye
- 12:58 jclark@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-fe1024.eqiad.wmnet with reason: host reimage
- 12:53 moritzm: remove legacy eventstreams-internal discovery certificate T365798
- 12:44 jclark@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-fe1021.eqiad.wmnet with reason: host reimage
- 12:44 jmm@cumin2002: START - Cookbook sre.cassandra.roll-restart for nodes matching A:ml-cache-codfw: OpenJDK 11 security updates - jmm@cumin2002
- 12:43 jmm@cumin2002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:cassandra-dev: OpenJDK 11 security updates - jmm@cumin2002
- 12:42 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-fe1022.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 12:42 jclark@cumin1003: START - Cookbook sre.hosts.reimage for host ms-fe1024.eqiad.wmnet with OS bullseye
- 12:42 jclark@cumin1003: START - Cookbook sre.hosts.reimage for host ms-fe1023.eqiad.wmnet with OS bullseye
- 12:41 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-fe1023.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 12:41 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-fe1024.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 12:40 jclark@cumin1003: START - Cookbook sre.hosts.provision for host ms-fe1024.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 12:40 jclark@cumin1003: START - Cookbook sre.hosts.provision for host ms-fe1023.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 12:40 jclark@cumin1003: START - Cookbook sre.hosts.provision for host ms-fe1022.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 12:40 jclark@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-fe1021.eqiad.wmnet with reason: host reimage
- 12:33 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db2159 (T415786)', diff saved to https://phabricator.wikimedia.org/P88648 and previous config saved to /var/cache/conftool/dbconfig/20260204-123308-marostegui.json
- 12:33 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2159.codfw.wmnet with reason: Maintenance
- 12:32 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2150 (T415786)', diff saved to https://phabricator.wikimedia.org/P88647 and previous config saved to /var/cache/conftool/dbconfig/20260204-123243-marostegui.json
- 12:32 mvolz@deploy2002: helmfile [eqiad] DONE helmfile.d/services/zotero: apply
- 12:31 mvolz@deploy2002: helmfile [eqiad] START helmfile.d/services/zotero: apply
- 12:23 jclark@cumin1003: START - Cookbook sre.hosts.reimage for host ms-fe1021.eqiad.wmnet with OS bullseye
- 12:23 jmm@cumin2002: START - Cookbook sre.cassandra.roll-restart for nodes matching A:cassandra-dev: OpenJDK 11 security updates - jmm@cumin2002
- 12:22 mvolz@deploy2002: helmfile [codfw] DONE helmfile.d/services/zotero: apply
- 12:21 mvolz@deploy2002: helmfile [codfw] START helmfile.d/services/zotero: apply
- 12:18 mvolz@deploy2002: helmfile [staging] DONE helmfile.d/services/zotero: apply
- 12:18 mvolz@deploy2002: helmfile [staging] START helmfile.d/services/zotero: apply
- 12:17 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2150', diff saved to https://phabricator.wikimedia.org/P88646 and previous config saved to /var/cache/conftool/dbconfig/20260204-121735-marostegui.json
- 12:07 jynus@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host backup1015.eqiad.wmnet with OS trixie
- 12:07 jynus@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jynus@cumin1003"
- 12:06 jynus@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jynus@cumin1003"
- 12:02 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2150', diff saved to https://phabricator.wikimedia.org/P88645 and previous config saved to /var/cache/conftool/dbconfig/20260204-120227-marostegui.json
- 11:47 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2150 (T415786)', diff saved to https://phabricator.wikimedia.org/P88644 and previous config saved to /var/cache/conftool/dbconfig/20260204-114718-marostegui.json
- 11:45 jynus@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on backup1015.eqiad.wmnet with reason: host reimage
- 11:42 moritzm: installing openjdk-11 security updates
- 11:41 jynus@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on backup1015.eqiad.wmnet with reason: host reimage
- 11:39 elukey@deploy2002: helmfile [codfw] DONE helmfile.d/services/thumbor: sync
- 11:39 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance
- 11:38 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1251 (T415786)', diff saved to https://phabricator.wikimedia.org/P88643 and previous config saved to /var/cache/conftool/dbconfig/20260204-113854-marostegui.json
- 11:38 elukey@deploy2002: helmfile [codfw] START helmfile.d/services/thumbor: sync
- 11:35 elukey@deploy2002: helmfile [eqiad] DONE helmfile.d/services/thumbor: sync
- 11:35 elukey@deploy2002: helmfile [eqiad] START helmfile.d/services/thumbor: sync
- 11:32 elukey@deploy2002: helmfile [staging] DONE helmfile.d/services/thumbor: sync
- 11:32 elukey@deploy2002: helmfile [staging] START helmfile.d/services/thumbor: sync
- 11:28 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1251', diff saved to https://phabricator.wikimedia.org/P88642 and previous config saved to /var/cache/conftool/dbconfig/20260204-112846-marostegui.json
- 11:26 jynus@cumin1003: START - Cookbook sre.hosts.reimage for host backup1015.eqiad.wmnet with OS trixie
- 11:23 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host backup1015.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 11:22 elukey@cumin1003: START - Cookbook sre.hosts.provision for host backup1015.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 11:18 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1251', diff saved to https://phabricator.wikimedia.org/P88641 and previous config saved to /var/cache/conftool/dbconfig/20260204-111837-marostegui.json
- 11:08 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1251 (T415786)', diff saved to https://phabricator.wikimedia.org/P88640 and previous config saved to /var/cache/conftool/dbconfig/20260204-110829-marostegui.json
- 10:48 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcumin2001.codfw.wmnet
- 10:47 moritzm: installing openjdk-17 security updates
- 10:44 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host cloudcumin2001.codfw.wmnet
- 10:40 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2216 (T415786)', diff saved to https://phabricator.wikimedia.org/P88639 and previous config saved to /var/cache/conftool/dbconfig/20260204-104035-marostegui.json
- 10:39 moritzm: upgrade cloudcumin2001 to bookworm T403153
- 10:36 hashar@deploy2002: rebuilt and synchronized wikiversions files: group0 to 1.46.0-wmf.14 refs T413805
- 10:29 hashar: Rolling back to group0 due to an issue with OAuth on metawiki # T413805
- 10:25 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2216', diff saved to https://phabricator.wikimedia.org/P88638 and previous config saved to /var/cache/conftool/dbconfig/20260204-102527-marostegui.json
- 10:11 hashar: Restarted CI Jenkins
- 10:10 hashar: Gerrit is back
- 10:10 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2216', diff saved to https://phabricator.wikimedia.org/P88637 and previous config saved to /var/cache/conftool/dbconfig/20260204-101018-marostegui.json
- 10:06 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db2150 (T415786)', diff saved to https://phabricator.wikimedia.org/P88636 and previous config saved to /var/cache/conftool/dbconfig/20260204-100638-marostegui.json
- 10:06 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2150.codfw.wmnet with reason: Maintenance
- 10:06 hashar: Restarting Gerrit instances
- 09:55 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2216 (T415786)', diff saved to https://phabricator.wikimedia.org/P88635 and previous config saved to /var/cache/conftool/dbconfig/20260204-095510-marostegui.json
- 09:38 moritzm: installing openssl security updates
- 09:37 hashar@deploy2002: rebuilt and synchronized wikiversions files: group1 to 1.46.0-wmf.14 refs T413805
- 09:34 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db1251 (T415786)', diff saved to https://phabricator.wikimedia.org/P88634 and previous config saved to /var/cache/conftool/dbconfig/20260204-093421-marostegui.json
- 09:34 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1251.eqiad.wmnet with reason: Maintenance
- 09:10 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance
- 09:10 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1253 (T415786)', diff saved to https://phabricator.wikimedia.org/P88632 and previous config saved to /var/cache/conftool/dbconfig/20260204-091015-marostegui.json
- 08:55 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1253', diff saved to https://phabricator.wikimedia.org/P88631 and previous config saved to /var/cache/conftool/dbconfig/20260204-085506-marostegui.json
- 08:39 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1253', diff saved to https://phabricator.wikimedia.org/P88630 and previous config saved to /var/cache/conftool/dbconfig/20260204-083958-marostegui.json
- 08:24 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1253 (T415786)', diff saved to https://phabricator.wikimedia.org/P88629 and previous config saved to /var/cache/conftool/dbconfig/20260204-082450-marostegui.json
- 08:23 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db2216 (T415786)', diff saved to https://phabricator.wikimedia.org/P88628 and previous config saved to /var/cache/conftool/dbconfig/20260204-082324-marostegui.json
- 08:23 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2216.codfw.wmnet with reason: Maintenance
- 08:23 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2212 (T415786)', diff saved to https://phabricator.wikimedia.org/P88627 and previous config saved to /var/cache/conftool/dbconfig/20260204-082259-marostegui.json
- 08:08 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1240.eqiad.wmnet with reason: Maintenance
- 08:07 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2212', diff saved to https://phabricator.wikimedia.org/P88626 and previous config saved to /var/cache/conftool/dbconfig/20260204-080751-marostegui.json
- 07:52 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2212', diff saved to https://phabricator.wikimedia.org/P88625 and previous config saved to /var/cache/conftool/dbconfig/20260204-075243-marostegui.json
- 07:39 ryankemper@cumin2002: END (FAIL) - Cookbook sre.hadoop.reboot-workers (exit_code=99) for Hadoop analytics cluster
- 07:37 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2212 (T415786)', diff saved to https://phabricator.wikimedia.org/P88624 and previous config saved to /var/cache/conftool/dbconfig/20260204-073735-marostegui.json
- 07:35 marostegui: Deploy schema change on db2204 (old s2 codfw master) T415786
- 07:34 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2204.codfw.wmnet with reason: Schema change
- 07:27 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db1253 (T415786)', diff saved to https://phabricator.wikimedia.org/P88623 and previous config saved to /var/cache/conftool/dbconfig/20260204-072658-marostegui.json
- 07:26 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1253.eqiad.wmnet with reason: Maintenance
- 07:26 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1231 (T415786)', diff saved to https://phabricator.wikimedia.org/P88622 and previous config saved to /var/cache/conftool/dbconfig/20260204-072632-marostegui.json
- 07:11 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1231', diff saved to https://phabricator.wikimedia.org/P88621 and previous config saved to /var/cache/conftool/dbconfig/20260204-071124-marostegui.json
- 06:56 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1231', diff saved to https://phabricator.wikimedia.org/P88620 and previous config saved to /var/cache/conftool/dbconfig/20260204-065616-marostegui.json
- 06:41 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1239.eqiad.wmnet with reason: Maintenance
- 06:41 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1235 (T415786)', diff saved to https://phabricator.wikimedia.org/P88619 and previous config saved to /var/cache/conftool/dbconfig/20260204-064118-marostegui.json
- 06:41 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1231 (T415786)', diff saved to https://phabricator.wikimedia.org/P88618 and previous config saved to /var/cache/conftool/dbconfig/20260204-064107-marostegui.json
- 06:31 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1235', diff saved to https://phabricator.wikimedia.org/P88617 and previous config saved to /var/cache/conftool/dbconfig/20260204-063103-marostegui.json
- 06:20 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1235', diff saved to https://phabricator.wikimedia.org/P88616 and previous config saved to /var/cache/conftool/dbconfig/20260204-062055-marostegui.json
- 06:18 marostegui@dns1006: END - running authdns-update
- 06:17 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db2204 T416300', diff saved to https://phabricator.wikimedia.org/P88615 and previous config saved to /var/cache/conftool/dbconfig/20260204-061739-marostegui.json
- 06:17 marostegui@dns1006: START - running authdns-update
- 06:16 marostegui@cumin1003: dbctl commit (dc=all): 'Promote db2207 to s2 primary and set section read-write T416300', diff saved to https://phabricator.wikimedia.org/P88614 and previous config saved to /var/cache/conftool/dbconfig/20260204-061637-marostegui.json
- 06:16 marostegui@cumin1003: dbctl commit (dc=all): 'Set s2 codfw as read-only for maintenance - T416300', diff saved to https://phabricator.wikimedia.org/P88613 and previous config saved to /var/cache/conftool/dbconfig/20260204-061613-marostegui.json
- 06:13 marostegui: Starting s2 codfw failover from db2204 to db2207 - T416300
- 06:11 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 26 hosts with reason: Primary switchover s2 T416300
- 06:11 marostegui@cumin1003: dbctl commit (dc=all): 'Set db2207 with weight 0 T416300', diff saved to https://phabricator.wikimedia.org/P88612 and previous config saved to /var/cache/conftool/dbconfig/20260204-061122-marostegui.json
- 06:10 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1235 (T415786)', diff saved to https://phabricator.wikimedia.org/P88611 and previous config saved to /var/cache/conftool/dbconfig/20260204-061047-marostegui.json
- 06:05 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db2212 (T415786)', diff saved to https://phabricator.wikimedia.org/P88610 and previous config saved to /var/cache/conftool/dbconfig/20260204-060516-marostegui.json
- 06:05 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2212.codfw.wmnet with reason: Maintenance
- 05:45 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db1231 (T415786)', diff saved to https://phabricator.wikimedia.org/P88609 and previous config saved to /var/cache/conftool/dbconfig/20260204-054542-marostegui.json
- 05:45 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1231.eqiad.wmnet with reason: Maintenance
- 05:45 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1227 (T415786)', diff saved to https://phabricator.wikimedia.org/P88608 and previous config saved to /var/cache/conftool/dbconfig/20260204-054518-marostegui.json
- 05:30 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1227', diff saved to https://phabricator.wikimedia.org/P88607 and previous config saved to /var/cache/conftool/dbconfig/20260204-053009-marostegui.json
- 05:15 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1227', diff saved to https://phabricator.wikimedia.org/P88606 and previous config saved to /var/cache/conftool/dbconfig/20260204-051501-marostegui.json
- 04:59 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1227 (T415786)', diff saved to https://phabricator.wikimedia.org/P88605 and previous config saved to /var/cache/conftool/dbconfig/20260204-045953-marostegui.json
- 04:41 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2202.codfw.wmnet with reason: Maintenance
- 04:41 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2188 (T415786)', diff saved to https://phabricator.wikimedia.org/P88604 and previous config saved to /var/cache/conftool/dbconfig/20260204-044137-marostegui.json
- 04:40 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db1235 (T415786)', diff saved to https://phabricator.wikimedia.org/P88603 and previous config saved to /var/cache/conftool/dbconfig/20260204-044022-marostegui.json
- 04:40 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1235.eqiad.wmnet with reason: Maintenance
- 04:40 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1234 (T415786)', diff saved to https://phabricator.wikimedia.org/P88602 and previous config saved to /var/cache/conftool/dbconfig/20260204-043958-marostegui.json
- 04:29 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1234', diff saved to https://phabricator.wikimedia.org/P88601 and previous config saved to /var/cache/conftool/dbconfig/20260204-042950-marostegui.json
- 04:26 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2188', diff saved to https://phabricator.wikimedia.org/P88600 and previous config saved to /var/cache/conftool/dbconfig/20260204-042629-marostegui.json
- 04:19 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1234', diff saved to https://phabricator.wikimedia.org/P88599 and previous config saved to /var/cache/conftool/dbconfig/20260204-041941-marostegui.json
- 04:11 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2188', diff saved to https://phabricator.wikimedia.org/P88598 and previous config saved to /var/cache/conftool/dbconfig/20260204-041121-marostegui.json
- 04:09 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1234 (T415786)', diff saved to https://phabricator.wikimedia.org/P88597 and previous config saved to /var/cache/conftool/dbconfig/20260204-040933-marostegui.json
- 03:56 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2188 (T415786)', diff saved to https://phabricator.wikimedia.org/P88596 and previous config saved to /var/cache/conftool/dbconfig/20260204-035612-marostegui.json
- 03:31 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db1227 (T415786)', diff saved to https://phabricator.wikimedia.org/P88595 and previous config saved to /var/cache/conftool/dbconfig/20260204-033110-marostegui.json
- 03:31 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1227.eqiad.wmnet with reason: Maintenance
- 03:30 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1202 (T415786)', diff saved to https://phabricator.wikimedia.org/P88594 and previous config saved to /var/cache/conftool/dbconfig/20260204-033046-marostegui.json
- 03:15 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1202', diff saved to https://phabricator.wikimedia.org/P88593 and previous config saved to /var/cache/conftool/dbconfig/20260204-031537-marostegui.json
- 03:00 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1202', diff saved to https://phabricator.wikimedia.org/P88592 and previous config saved to /var/cache/conftool/dbconfig/20260204-030029-marostegui.json
- 02:45 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1202 (T415786)', diff saved to https://phabricator.wikimedia.org/P88591 and previous config saved to /var/cache/conftool/dbconfig/20260204-024521-marostegui.json
- 02:37 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db1234 (T415786)', diff saved to https://phabricator.wikimedia.org/P88590 and previous config saved to /var/cache/conftool/dbconfig/20260204-023659-marostegui.json
- 02:36 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1234.eqiad.wmnet with reason: Maintenance
- 02:36 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1232 (T415786)', diff saved to https://phabricator.wikimedia.org/P88589 and previous config saved to /var/cache/conftool/dbconfig/20260204-023634-marostegui.json
- 02:27 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db2188 (T415786)', diff saved to https://phabricator.wikimedia.org/P88588 and previous config saved to /var/cache/conftool/dbconfig/20260204-022717-marostegui.json
- 02:27 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2188.codfw.wmnet with reason: Maintenance
- 02:26 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T415786)', diff saved to https://phabricator.wikimedia.org/P88587 and previous config saved to /var/cache/conftool/dbconfig/20260204-022652-marostegui.json
- 02:26 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1232', diff saved to https://phabricator.wikimedia.org/P88586 and previous config saved to /var/cache/conftool/dbconfig/20260204-022626-marostegui.json
- 02:16 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1232', diff saved to https://phabricator.wikimedia.org/P88585 and previous config saved to /var/cache/conftool/dbconfig/20260204-021617-marostegui.json
- 02:13 mwpresync@deploy2002: Finished scap build-images: Publishing wmf/next image (duration: 12m 50s)
- 02:11 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P88584 and previous config saved to /var/cache/conftool/dbconfig/20260204-021144-marostegui.json
- 02:06 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1232 (T415786)', diff saved to https://phabricator.wikimedia.org/P88583 and previous config saved to /var/cache/conftool/dbconfig/20260204-020609-marostegui.json
- 02:00 mwpresync@deploy2002: Started scap build-images: Publishing wmf/next image
- 01:56 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P88582 and previous config saved to /var/cache/conftool/dbconfig/20260204-015635-marostegui.json
- 01:41 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T415786)', diff saved to https://phabricator.wikimedia.org/P88581 and previous config saved to /var/cache/conftool/dbconfig/20260204-014127-marostegui.json
- 01:40 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db1202 (T415786)', diff saved to https://phabricator.wikimedia.org/P88580 and previous config saved to /var/cache/conftool/dbconfig/20260204-013958-marostegui.json
- 01:39 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1202.eqiad.wmnet with reason: Maintenance
- 01:39 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1194 (T415786)', diff saved to https://phabricator.wikimedia.org/P88579 and previous config saved to /var/cache/conftool/dbconfig/20260204-013944-marostegui.json
- 01:36 ladsgroup@deploy2002: Finished scap sync-world: Backport for UserImpact: Remove zeros in per-article view stats (T414080), UserImpact: Remove zeros in per-article view stats (T414080) (duration: 10m 38s)
- 01:29 ladsgroup@deploy2002: ladsgroup: Continuing with sync
- 01:29 ladsgroup@deploy2002: ladsgroup: Backport for UserImpact: Remove zeros in per-article view stats (T414080), UserImpact: Remove zeros in per-article view stats (T414080) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 01:27 jclark@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host tools-k8s-worker1006.eqiad.wmnet with OS trixie
- 01:27 jclark@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1003"
- 01:26 jclark@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1003"
- 01:25 ladsgroup@deploy2002: Started scap sync-world: Backport for UserImpact: Remove zeros in per-article view stats (T414080), UserImpact: Remove zeros in per-article view stats (T414080)
- 01:24 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1194', diff saved to https://phabricator.wikimedia.org/P88578 and previous config saved to /var/cache/conftool/dbconfig/20260204-012436-marostegui.json
- 01:24 jclark@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host tools-k8s-worker1008.eqiad.wmnet with OS trixie
- 01:24 jclark@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1003"
- 01:23 jclark@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1003"
- 01:20 jclark@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host tools-k8s-worker1007.eqiad.wmnet with OS trixie
- 01:20 jclark@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1003"
- 01:19 jclark@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1003"
- 01:15 jclark@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host tools-k8s-worker1005.eqiad.wmnet with OS trixie
- 01:15 jclark@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1003"
- 01:15 jclark@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1003"
- 01:10 jclark@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on tools-k8s-worker1006.eqiad.wmnet with reason: host reimage
- 01:09 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1194', diff saved to https://phabricator.wikimedia.org/P88577 and previous config saved to /var/cache/conftool/dbconfig/20260204-010928-marostegui.json
- 01:07 jclark@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on tools-k8s-worker1008.eqiad.wmnet with reason: host reimage
- 01:03 jclark@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on tools-k8s-worker1007.eqiad.wmnet with reason: host reimage
- 01:01 jclark@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on tools-k8s-worker1008.eqiad.wmnet with reason: host reimage
- 00:59 jclark@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on tools-k8s-worker1005.eqiad.wmnet with reason: host reimage
- 00:57 jclark@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on tools-k8s-worker1007.eqiad.wmnet with reason: host reimage
- 00:56 jclark@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on tools-k8s-worker1006.eqiad.wmnet with reason: host reimage
- 00:55 jclark@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on tools-k8s-worker1005.eqiad.wmnet with reason: host reimage
- 00:54 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1194 (T415786)', diff saved to https://phabricator.wikimedia.org/P88576 and previous config saved to /var/cache/conftool/dbconfig/20260204-005419-marostegui.json
- 00:50 jclark@cumin1003: START - Cookbook sre.hosts.reimage for host tools-k8s-worker1008.eqiad.wmnet with OS trixie
- 00:49 jclark@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host tools-k8s-worker1004.eqiad.wmnet with OS trixie
- 00:49 jclark@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host tools-k8s-worker1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 00:46 jclark@cumin1003: START - Cookbook sre.hosts.reimage for host tools-k8s-worker1007.eqiad.wmnet with OS trixie
- 00:45 jclark@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host tools-k8s-worker1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 00:45 jclark@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host tools-k8s-worker1003.eqiad.wmnet with OS trixie
- 00:45 jclark@cumin1003: START - Cookbook sre.hosts.reimage for host tools-k8s-worker1006.eqiad.wmnet with OS trixie
- 00:44 jclark@cumin1003: START - Cookbook sre.hosts.reimage for host tools-k8s-worker1005.eqiad.wmnet with OS trixie
- 00:42 jclark@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host tools-k8s-worker1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 00:42 jclark@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host tools-k8s-worker1002.eqiad.wmnet with OS trixie
- 00:41 jclark@cumin1003: START - Cookbook sre.hosts.provision for host tools-k8s-worker1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 00:40 jclark@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host tools-k8s-worker1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 00:38 jclark@cumin1003: START - Cookbook sre.hosts.provision for host tools-k8s-worker1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 00:37 jclark@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host tools-k8s-worker1001.eqiad.wmnet with OS trixie
- 00:35 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db1232 (T415786)', diff saved to https://phabricator.wikimedia.org/P88575 and previous config saved to /var/cache/conftool/dbconfig/20260204-003551-marostegui.json
- 00:35 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1232.eqiad.wmnet with reason: Maintenance
- 00:35 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1219 (T415786)', diff saved to https://phabricator.wikimedia.org/P88574 and previous config saved to /var/cache/conftool/dbconfig/20260204-003526-marostegui.json
- 00:34 jclark@cumin1003: START - Cookbook sre.hosts.provision for host tools-k8s-worker1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 00:33 jclark@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on tools-k8s-worker1004.eqiad.wmnet with reason: host reimage
- 00:33 jclark@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host tools-k8s-ctrl1001.eqiad.wmnet with OS trixie
- 00:33 jclark@cumin1003: START - Cookbook sre.hosts.provision for host tools-k8s-worker1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 00:30 jclark@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host tools-k8s-ctrl1002.eqiad.wmnet with OS trixie
- 00:29 jclark@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on tools-k8s-worker1003.eqiad.wmnet with reason: host reimage
- 00:25 jclark@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on tools-k8s-worker1002.eqiad.wmnet with reason: host reimage
- 00:25 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1219', diff saved to https://phabricator.wikimedia.org/P88573 and previous config saved to /var/cache/conftool/dbconfig/20260204-002518-marostegui.json
- 00:24 jclark@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on tools-k8s-worker1003.eqiad.wmnet with reason: host reimage
- 00:23 jclark@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on tools-k8s-worker1004.eqiad.wmnet with reason: host reimage
- 00:21 jclark@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on tools-k8s-worker1001.eqiad.wmnet with reason: host reimage
- 00:17 jclark@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on tools-k8s-worker1002.eqiad.wmnet with reason: host reimage
- 00:17 jclark@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on tools-k8s-worker1001.eqiad.wmnet with reason: host reimage
- 00:17 jclark@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on tools-k8s-ctrl1001.eqiad.wmnet with reason: host reimage
- 00:15 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1219', diff saved to https://phabricator.wikimedia.org/P88572 and previous config saved to /var/cache/conftool/dbconfig/20260204-001509-marostegui.json
- 00:13 jclark@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on tools-k8s-ctrl1002.eqiad.wmnet with reason: host reimage
- 00:11 jclark@cumin1003: START - Cookbook sre.hosts.reimage for host tools-k8s-worker1003.eqiad.wmnet with OS trixie
- 00:11 jclark@cumin1003: START - Cookbook sre.hosts.reimage for host tools-k8s-worker1004.eqiad.wmnet with OS trixie
- 00:09 jclark@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on tools-k8s-ctrl1001.eqiad.wmnet with reason: host reimage
- 00:09 jclark@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on tools-k8s-ctrl1002.eqiad.wmnet with reason: host reimage
- 00:05 jclark@cumin1003: START - Cookbook sre.hosts.reimage for host tools-k8s-worker1002.eqiad.wmnet with OS trixie
- 00:05 jclark@cumin1003: START - Cookbook sre.hosts.reimage for host tools-k8s-worker1001.eqiad.wmnet with OS trixie
- 00:05 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1219 (T415786)', diff saved to https://phabricator.wikimedia.org/P88571 and previous config saved to /var/cache/conftool/dbconfig/20260204-000501-marostegui.json
2026-02-03
- 23:57 jclark@cumin1003: START - Cookbook sre.hosts.reimage for host tools-k8s-ctrl1002.eqiad.wmnet with OS trixie
- 23:57 jclark@cumin1003: START - Cookbook sre.hosts.reimage for host tools-k8s-ctrl1001.eqiad.wmnet with OS trixie
- 23:56 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db2176 (T415786)', diff saved to https://phabricator.wikimedia.org/P88570 and previous config saved to /var/cache/conftool/dbconfig/20260203-235634-marostegui.json
- 23:56 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2176.codfw.wmnet with reason: Maintenance
- 23:56 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T415786)', diff saved to https://phabricator.wikimedia.org/P88569 and previous config saved to /var/cache/conftool/dbconfig/20260203-235609-marostegui.json
- 23:55 jclark@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host tools-k8s-worker1008
- 23:55 jclark@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host tools-k8s-worker1008
- 23:55 jclark@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host tools-k8s-worker1007
- 23:55 jclark@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host tools-k8s-worker1007
- 23:55 jclark@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host tools-k8s-worker1006
- 23:54 jclark@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host tools-k8s-worker1006
- 23:54 jclark@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host tools-k8s-worker1005
- 23:54 jclark@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host tools-k8s-worker1005
- 23:54 jclark@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host tools-k8s-worker1004
- 23:54 jclark@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host tools-k8s-worker1004
- 23:54 jclark@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host tools-k8s-worker1003
- 23:53 jclark@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host tools-k8s-worker1003
- 23:49 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db1194 (T415786)', diff saved to https://phabricator.wikimedia.org/P88568 and previous config saved to /var/cache/conftool/dbconfig/20260203-234932-marostegui.json
- 23:49 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1194.eqiad.wmnet with reason: Maintenance
- 23:49 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1191 (T415786)', diff saved to https://phabricator.wikimedia.org/P88567 and previous config saved to /var/cache/conftool/dbconfig/20260203-234908-marostegui.json
- 23:48 jclark@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host tools-k8s-worker1002
- 23:48 jclark@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host tools-k8s-worker1002
- 23:47 jclark@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host tools-k8s-worker1001
- 23:47 jclark@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host tools-k8s-worker1001
- 23:47 jclark@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host tools-k8s-ctrl1002
- 23:47 jclark@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host tools-k8s-ctrl1002
- 23:47 jclark@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host tools-k8s-ctrl1001
- 23:46 jclark@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host tools-k8s-ctrl1001
- 23:45 jclark@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 23:45 jclark@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt tools-k8 - jclark@cumin1003"
- 23:45 jclark@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt tools-k8 - jclark@cumin1003"
- 23:41 jclark@cumin1003: START - Cookbook sre.dns.netbox
- 23:41 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P88566 and previous config saved to /var/cache/conftool/dbconfig/20260203-234100-marostegui.json
- 23:40 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host backup1015.eqiad.wmnet with OS bookworm
- 23:34 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1191', diff saved to https://phabricator.wikimedia.org/P88565 and previous config saved to /var/cache/conftool/dbconfig/20260203-233400-marostegui.json
- 23:25 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P88564 and previous config saved to /var/cache/conftool/dbconfig/20260203-232552-marostegui.json
- 23:23 mutante: vrts1003 - fix systemd state: sed -i 's/vrts_rsync/rsync/' /lib/systemd/system/wmf_auto_restart_vrts_rsync.service ; systemctl daemon-reload - T416380 T135991
- 23:18 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1191', diff saved to https://phabricator.wikimedia.org/P88563 and previous config saved to /var/cache/conftool/dbconfig/20260203-231851-marostegui.json
- 23:10 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T415786)', diff saved to https://phabricator.wikimedia.org/P88562 and previous config saved to /var/cache/conftool/dbconfig/20260203-231044-marostegui.json
- 23:03 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1191 (T415786)', diff saved to https://phabricator.wikimedia.org/P88561 and previous config saved to /var/cache/conftool/dbconfig/20260203-230343-marostegui.json
- 23:00 inflatador: bking@laptop roll-restarting wdqs codfw as it's lagging heavily
- 22:35 ryankemper@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 22:34 ryankemper@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
- 22:34 ryankemper@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 22:33 ryankemper@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
- 22:32 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db1219 (T415786)', diff saved to https://phabricator.wikimedia.org/P88560 and previous config saved to /var/cache/conftool/dbconfig/20260203-223216-marostegui.json
- 22:32 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1219.eqiad.wmnet with reason: Maintenance
- 22:32 ryankemper@deploy2002: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'.
- 22:31 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1218 (T415786)', diff saved to https://phabricator.wikimedia.org/P88559 and previous config saved to /var/cache/conftool/dbconfig/20260203-223151-marostegui.json
- 22:31 ryankemper@deploy2002: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'.
- 22:29 ryankemper@deploy2002: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'.
- 22:29 ryankemper@deploy2002: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'.
- 22:21 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1218', diff saved to https://phabricator.wikimedia.org/P88558 and previous config saved to /var/cache/conftool/dbconfig/20260203-222142-marostegui.json
- 22:11 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1218', diff saved to https://phabricator.wikimedia.org/P88556 and previous config saved to /var/cache/conftool/dbconfig/20260203-221134-marostegui.json
- 22:01 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1218 (T415786)', diff saved to https://phabricator.wikimedia.org/P88555 and previous config saved to /var/cache/conftool/dbconfig/20260203-220126-marostegui.json
- 21:57 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db1191 (T415786)', diff saved to https://phabricator.wikimedia.org/P88554 and previous config saved to /var/cache/conftool/dbconfig/20260203-215751-marostegui.json
- 21:57 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1191.eqiad.wmnet with reason: Maintenance
- 21:57 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1181 (T415786)', diff saved to https://phabricator.wikimedia.org/P88553 and previous config saved to /var/cache/conftool/dbconfig/20260203-215726-marostegui.json
- 21:54 dwisehaupt@dns1004: END - running authdns-update
- 21:52 dwisehaupt@dns1004: START - running authdns-update
- 21:42 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P88552 and previous config saved to /var/cache/conftool/dbconfig/20260203-214218-marostegui.json
- 21:27 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P88551 and previous config saved to /var/cache/conftool/dbconfig/20260203-212709-marostegui.json
- 21:26 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db2174 (T415786)', diff saved to https://phabricator.wikimedia.org/P88550 and previous config saved to /var/cache/conftool/dbconfig/20260203-212616-marostegui.json
- 21:26 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2174.codfw.wmnet with reason: Maintenance
- 21:25 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T415786)', diff saved to https://phabricator.wikimedia.org/P88549 and previous config saved to /var/cache/conftool/dbconfig/20260203-212550-marostegui.json
- 21:12 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1181 (T415786)', diff saved to https://phabricator.wikimedia.org/P88548 and previous config saved to /var/cache/conftool/dbconfig/20260203-211201-marostegui.json
- 21:10 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P88547 and previous config saved to /var/cache/conftool/dbconfig/20260203-211041-marostegui.json
- 20:55 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P88545 and previous config saved to /var/cache/conftool/dbconfig/20260203-205532-marostegui.json
- 20:40 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T415786)', diff saved to https://phabricator.wikimedia.org/P88544 and previous config saved to /var/cache/conftool/dbconfig/20260203-204024-marostegui.json
- 20:30 cmooney@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-fe1024.eqiad.wmnet with OS bullseye
- 20:27 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db1218 (T415786)', diff saved to https://phabricator.wikimedia.org/P88543 and previous config saved to /var/cache/conftool/dbconfig/20260203-202743-marostegui.json
- 20:27 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1218.eqiad.wmnet with reason: Maintenance
- 20:27 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1206 (T415786)', diff saved to https://phabricator.wikimedia.org/P88542 and previous config saved to /var/cache/conftool/dbconfig/20260203-202718-marostegui.json
- 20:17 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1206', diff saved to https://phabricator.wikimedia.org/P88541 and previous config saved to /var/cache/conftool/dbconfig/20260203-201709-marostegui.json
- 20:07 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1206', diff saved to https://phabricator.wikimedia.org/P88540 and previous config saved to /var/cache/conftool/dbconfig/20260203-200700-marostegui.json
- 20:01 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db1181 (T415786)', diff saved to https://phabricator.wikimedia.org/P88539 and previous config saved to /var/cache/conftool/dbconfig/20260203-200130-marostegui.json
- 20:01 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1181.eqiad.wmnet with reason: Maintenance
- 20:01 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T415786)', diff saved to https://phabricator.wikimedia.org/P88538 and previous config saved to /var/cache/conftool/dbconfig/20260203-200106-marostegui.json
- 19:59 jclark@cumin1003: START - Cookbook sre.hosts.reimage for host backup1015.eqiad.wmnet with OS bookworm
- 19:56 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1206 (T415786)', diff saved to https://phabricator.wikimedia.org/P88537 and previous config saved to /var/cache/conftool/dbconfig/20260203-195652-marostegui.json
- 19:45 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P88536 and previous config saved to /var/cache/conftool/dbconfig/20260203-194557-marostegui.json
- 19:39 cmooney@cumin1003: START - Cookbook sre.hosts.reimage for host ms-fe1024.eqiad.wmnet with OS bullseye
- 19:38 cmooney@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-fe1024.eqiad.wmnet with OS bullseye
- 19:30 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P88534 and previous config saved to /var/cache/conftool/dbconfig/20260203-193049-marostegui.json
- 19:28 cmooney@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host ms-fe1024
- 19:28 cmooney@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ms-fe1024
- 19:27 cmooney@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host ms-fe1024
- 19:27 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ms-fe1024.eqiad.wmnet 205.48.64.10.in-addr.arpa 5.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
- 19:27 cmooney@cumin1003: START - Cookbook sre.dns.wipe-cache ms-fe1024.eqiad.wmnet 205.48.64.10.in-addr.arpa 5.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
- 19:27 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 19:27 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host ms-fe1024 - cmooney@cumin1003"
- 19:27 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host ms-fe1024 - cmooney@cumin1003"
- 19:23 cmooney@cumin1003: START - Cookbook sre.dns.netbox
- 19:23 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-fe1021.eqiad.wmnet with OS bullseye
- 19:23 cmooney@cumin1003: START - Cookbook sre.hosts.move-vlan for host ms-fe1024
- 19:23 cmooney@cumin1003: START - Cookbook sre.hosts.reimage for host ms-fe1024.eqiad.wmnet with OS bullseye
- 19:21 sukhe@dns1004: END - running authdns-update
- 19:20 ryankemper@cumin2002: START - Cookbook sre.hadoop.reboot-workers for Hadoop analytics cluster
- 19:20 sukhe: testing authdns-update (NOOP run)
- 19:20 sukhe@dns1004: START - running authdns-update
- 19:19 ryankemper@cumin2002: END (ERROR) - Cookbook sre.hadoop.reboot-workers (exit_code=97) for Hadoop analytics cluster
- 19:19 ryankemper@cumin2002: START - Cookbook sre.hadoop.reboot-workers for Hadoop analytics cluster
- 19:15 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T415786)', diff saved to https://phabricator.wikimedia.org/P88533 and previous config saved to /var/cache/conftool/dbconfig/20260203-191541-marostegui.json
- 19:15 ryankemper@cumin2002: END (FAIL) - Cookbook sre.hadoop.reboot-workers (exit_code=99) for Hadoop analytics cluster
- 19:15 ryankemper@cumin2002: START - Cookbook sre.hadoop.reboot-workers for Hadoop analytics cluster
- 19:12 jclark@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host ms-fe1023
- 19:12 jclark@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ms-fe1023
- 19:11 jclark@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host ms-fe1023
- 19:11 jclark@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ms-fe1023.eqiad.wmnet 170.32.64.10.in-addr.arpa 0.7.1.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
- 19:11 jclark@cumin1003: START - Cookbook sre.dns.wipe-cache ms-fe1023.eqiad.wmnet 170.32.64.10.in-addr.arpa 0.7.1.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
- 19:11 jclark@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 19:11 jclark@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host ms-fe1023 - jclark@cumin1003"
- 19:11 jclark@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host ms-fe1023 - jclark@cumin1003"
- 19:09 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host backup1015.eqiad.wmnet with OS bookworm
- 19:04 jclark@cumin1003: START - Cookbook sre.dns.netbox
- 19:04 jclark@cumin1003: START - Cookbook sre.hosts.move-vlan for host ms-fe1023
- 19:04 jclark@cumin1003: START - Cookbook sre.hosts.reimage for host ms-fe1023.eqiad.wmnet with OS bullseye
- 18:56 jclark@cumin1003: START - Cookbook sre.hosts.reimage for host ms-fe1021.eqiad.wmnet with OS bullseye
- 18:53 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db2173 (T415786)', diff saved to https://phabricator.wikimedia.org/P88532 and previous config saved to /var/cache/conftool/dbconfig/20260203-185326-marostegui.json
- 18:53 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2173.codfw.wmnet with reason: Maintenance
- 18:53 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2170 (T415786)', diff saved to https://phabricator.wikimedia.org/P88531 and previous config saved to /var/cache/conftool/dbconfig/20260203-185302-marostegui.json
- 18:52 ryankemper@cumin2002: END (FAIL) - Cookbook sre.hadoop.reboot-workers (exit_code=99) for Hadoop analytics cluster
- 18:52 ryankemper@cumin2002: START - Cookbook sre.hadoop.reboot-workers for Hadoop analytics cluster
- 18:52 ryankemper@cumin2002: END (FAIL) - Cookbook sre.hadoop.reboot-workers (exit_code=99) for Hadoop analytics cluster
- 18:52 ryankemper@cumin2002: START - Cookbook sre.hadoop.reboot-workers for Hadoop analytics cluster
- 18:49 swfrench@deploy2002: Finished scap sync-world: Rebuild deployment to pick up new production image (duration: 46m 41s)
- 18:39 jclark@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-fe1024.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 18:37 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2170', diff saved to https://phabricator.wikimedia.org/P88530 and previous config saved to /var/cache/conftool/dbconfig/20260203-183753-marostegui.json
- 18:25 jclark@cumin1003: START - Cookbook sre.hosts.provision for host ms-fe1024.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 18:23 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db1206 (T415786)', diff saved to https://phabricator.wikimedia.org/P88529 and previous config saved to /var/cache/conftool/dbconfig/20260203-182302-marostegui.json
- 18:22 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1206.eqiad.wmnet with reason: Maintenance
- 18:22 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2170', diff saved to https://phabricator.wikimedia.org/P88528 and previous config saved to /var/cache/conftool/dbconfig/20260203-182245-marostegui.json
- 18:22 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1196 (T415786)', diff saved to https://phabricator.wikimedia.org/P88527 and previous config saved to /var/cache/conftool/dbconfig/20260203-182238-marostegui.json
- 18:20 jclark@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-fe1023.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 18:12 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1196', diff saved to https://phabricator.wikimedia.org/P88526 and previous config saved to /var/cache/conftool/dbconfig/20260203-181229-marostegui.json
- 18:07 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2170 (T415786)', diff saved to https://phabricator.wikimedia.org/P88525 and previous config saved to /var/cache/conftool/dbconfig/20260203-180737-marostegui.json
- 18:07 jclark@cumin1003: START - Cookbook sre.hosts.provision for host ms-fe1023.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 18:06 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db1174 (T415786)', diff saved to https://phabricator.wikimedia.org/P88524 and previous config saved to /var/cache/conftool/dbconfig/20260203-180650-marostegui.json
- 18:06 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-fe1022.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 18:06 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1174.eqiad.wmnet with reason: Maintenance
- 18:04 jclark@cumin1003: START - Cookbook sre.hosts.provision for host ms-fe1022.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 18:03 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-fe1022.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 18:03 swfrench@deploy2002: Started scap sync-world: Rebuild deployment to pick up new production image
- 18:02 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1196', diff saved to https://phabricator.wikimedia.org/P88523 and previous config saved to /var/cache/conftool/dbconfig/20260203-180221-marostegui.json
- 17:55 jclark@cumin1003: START - Cookbook sre.hosts.provision for host ms-fe1022.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 17:53 jclark@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-fe1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 17:52 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1196 (T415786)', diff saved to https://phabricator.wikimedia.org/P88522 and previous config saved to /var/cache/conftool/dbconfig/20260203-175213-marostegui.json
- 17:51 jclark@cumin1003: START - Cookbook sre.hosts.provision for host ms-fe1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 17:49 jclark@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host backup1015
- 17:49 jclark@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host backup1015
- 17:48 jclark@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host backup1015
- 17:48 jclark@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) backup1015.eqiad.wmnet 169.32.64.10.in-addr.arpa 9.6.1.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
- 17:48 jclark@cumin1003: START - Cookbook sre.dns.wipe-cache backup1015.eqiad.wmnet 169.32.64.10.in-addr.arpa 9.6.1.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
- 17:48 jclark@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 17:48 jclark@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host backup1015 - jclark@cumin1003"
- 17:48 jclark@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host backup1015 - jclark@cumin1003"
- 17:48 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-fe1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 17:46 sukhe: sudo cumin -b1 -s120 "A:dnsbox and not P{dns1004* or dns7001*}" "run-puppet-agent --enable 'merging CR 1230351'": T81605
- 17:46 jclark@cumin1003: START - Cookbook sre.hosts.provision for host ms-fe1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 17:46 swfrench-wmf: reprepro include php8.3_8.3.30-1+wmf11u2 in component/php83
- 17:45 jclark@cumin1003: START - Cookbook sre.dns.netbox
- 17:45 jclark@cumin1003: START - Cookbook sre.hosts.move-vlan for host backup1015
- 17:45 jclark@cumin1003: START - Cookbook sre.hosts.reimage for host backup1015.eqiad.wmnet with OS bookworm
- 17:02 mutante: gerrit - deployed gerrit:1234269 to remove separate *qos* apache logs - deleted *qos* logs to fix disk space issues - back to 83% usage on / on gerrit1003
- 16:48 jgiannelos@deploy2002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
- 16:47 jgiannelos@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
- 16:47 jgiannelos@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
- 16:46 jgiannelos@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
- 16:44 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
- 16:44 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/mobileapps: apply
- 16:28 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1171.eqiad.wmnet with reason: Maintenance
- 16:28 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1170 (T415786)', diff saved to https://phabricator.wikimedia.org/P88521 and previous config saved to /var/cache/conftool/dbconfig/20260203-162833-marostegui.json
- 16:15 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db2170 (T415786)', diff saved to https://phabricator.wikimedia.org/P88520 and previous config saved to /var/cache/conftool/dbconfig/20260203-161530-marostegui.json
- 16:15 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2170.codfw.wmnet with reason: Maintenance
- 16:15 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T415786)', diff saved to https://phabricator.wikimedia.org/P88519 and previous config saved to /var/cache/conftool/dbconfig/20260203-161506-marostegui.json
- 16:13 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1170', diff saved to https://phabricator.wikimedia.org/P88517 and previous config saved to /var/cache/conftool/dbconfig/20260203-161325-marostegui.json
- 16:13 topranks: disable Hurricane Electric IPv6 BGP session on cr2-magru to troubleshoot ns2 IPv6 routing issue
- 16:11 tchin@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: apply
- 16:10 tchin@deploy2002: helmfile [codfw] START helmfile.d/services/eventgate-analytics-external: apply
- 16:06 tchin@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: apply
- 16:05 tchin@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: apply
- 16:04 tchin@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics-external: apply
- 16:04 tchin@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-analytics-external: apply
- 15:59 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P88515 and previous config saved to /var/cache/conftool/dbconfig/20260203-155957-marostegui.json
- 15:58 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1170', diff saved to https://phabricator.wikimedia.org/P88513 and previous config saved to /var/cache/conftool/dbconfig/20260203-155816-marostegui.json
- 15:57 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db1196 (T415786)', diff saved to https://phabricator.wikimedia.org/P88511 and previous config saved to /var/cache/conftool/dbconfig/20260203-155713-marostegui.json
- 15:57 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1013,1017].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
- 15:56 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1196.eqiad.wmnet with reason: Maintenance
- 15:56 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1195 (T415786)', diff saved to https://phabricator.wikimedia.org/P88510 and previous config saved to /var/cache/conftool/dbconfig/20260203-155628-marostegui.json
- 15:51 sfaci@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/test-kitchen-next: apply
- 15:50 sfaci@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/test-kitchen-next: apply
- 15:47 sukhe@puppetserver1001: conftool action : set/pooled=yes; selector: name=dns7001.wikimedia.org,service=authdns-ns2 [reason: testing authdns IPv6 change]
- 15:47 slyngshede@dns1004: END - running authdns-update
- 15:46 slyngshede@dns1004: START - running authdns-update
- 15:46 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1195', diff saved to https://phabricator.wikimedia.org/P88508 and previous config saved to /var/cache/conftool/dbconfig/20260203-154619-marostegui.json
- 15:44 root@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 7 hosts with reason: Primary switchover test-s4 None
- 15:44 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P88507 and previous config saved to /var/cache/conftool/dbconfig/20260203-154449-marostegui.json
- 15:44 sukhe@puppetserver1001: conftool action : set/pooled=no; selector: name=dns7001.wikimedia.org,service=authdns-ns2 [reason: testing authdns IPv6 change]
- 15:43 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1170 (T415786)', diff saved to https://phabricator.wikimedia.org/P88506 and previous config saved to /var/cache/conftool/dbconfig/20260203-154308-marostegui.json
- 15:39 sukhe@puppetserver1001: conftool action : set/pooled=yes; selector: name=dns7001.wikimedia.org [reason: testing authdns IPv6 change]
- 15:36 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1195', diff saved to https://phabricator.wikimedia.org/P88505 and previous config saved to /var/cache/conftool/dbconfig/20260203-153611-marostegui.json
- 15:32 sukhe: sudo cumin "A:dnsbox" "disable-puppet 'merging CR 1230351'": T81605
- 15:29 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T415786)', diff saved to https://phabricator.wikimedia.org/P88504 and previous config saved to /var/cache/conftool/dbconfig/20260203-152941-marostegui.json
- 15:28 sukhe@puppetserver1001: conftool action : set/pooled=no; selector: name=dns7001.wikimedia.org [reason: testing authdns IPv6 change]
- 15:26 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1195 (T415786)', diff saved to https://phabricator.wikimedia.org/P88503 and previous config saved to /var/cache/conftool/dbconfig/20260203-152602-marostegui.json
- 15:25 root@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 7 hosts with reason: Primary switchover test-s4 None
- 15:24 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.roll-restart-reboot-wikimedia-dns (exit_code=0) rolling restart_daemons on A:wikidough
- 15:18 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-fe1024.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 15:17 cmooney@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1003.eqiad.wmnet with reason: Release v0.11.1 - enable IPv6 SAFI for DNS hosts - cmooney@cumin1003
- 15:16 cmooney@cumin1003: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1003.eqiad.wmnet with reason: Release v0.11.1 - enable IPv6 SAFI for DNS hosts - cmooney@cumin1003
- 15:15 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-fe1023.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 15:15 jclark@cumin1003: START - Cookbook sre.hosts.provision for host ms-fe1024.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 15:12 jclark@cumin1003: START - Cookbook sre.hosts.provision for host ms-fe1023.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 15:10 sukhe@cumin1003: START - Cookbook sre.dns.roll-restart-reboot-wikimedia-dns rolling restart_daemons on A:wikidough
- 15:09 moritzm: installing openjdk-17 security updates
- 15:01 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-fe1024.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 14:59 jclark@cumin1003: START - Cookbook sre.hosts.provision for host ms-fe1024.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 14:57 root@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 7 hosts with reason: Primary switchover test-s4 None
- 14:56 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-fe1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 14:53 elukey@cumin1003: START - Cookbook sre.hosts.provision for host ms-fe1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 14:53 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-fe1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 14:53 elukey@cumin1003: START - Cookbook sre.hosts.provision for host ms-fe1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 14:52 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-fe1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 14:51 root@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 7 hosts with reason: Primary switchover test-s4 None
- 14:36 ayounsi@cumin1003: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox
- 14:35 ayounsi@cumin1003: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
- 14:30 moritzm: installing bind9 security updates
- 14:26 ayounsi@cumin1003: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox
- 14:25 ayounsi@cumin1003: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
- 14:25 elukey@cumin1003: START - Cookbook sre.hosts.provision for host ms-fe1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 14:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox
- 14:24 ayounsi@cumin1003: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
- 14:24 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host backup1015.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 14:23 elukey@cumin1003: START - Cookbook sre.hosts.provision for host backup1015.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 14:04 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host backup1015.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 13:58 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db1170 (T415786)', diff saved to https://phabricator.wikimedia.org/P88501 and previous config saved to /var/cache/conftool/dbconfig/20260203-135840-marostegui.json
- 13:58 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
- 13:58 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T415786)', diff saved to https://phabricator.wikimedia.org/P88500 and previous config saved to /var/cache/conftool/dbconfig/20260203-135813-marostegui.json
- 13:58 samtar@deploy2002: Finished scap sync-world: Backport for Remove unused SpecialMobileEditWatchlist::outputSubtitle() (T416294) (duration: 08m 03s)
- 13:53 samtar@deploy2002: samwilson, samtar: Continuing with sync
- 13:52 elukey@cumin1003: START - Cookbook sre.hosts.provision for host backup1015.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 13:52 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host backup1015.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 13:52 samtar@deploy2002: samwilson, samtar: Backport for Remove unused SpecialMobileEditWatchlist::outputSubtitle() (T416294) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 13:52 elukey@cumin1003: START - Cookbook sre.hosts.provision for host backup1015.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 13:50 samtar@deploy2002: Started scap sync-world: Backport for Remove unused SpecialMobileEditWatchlist::outputSubtitle() (T416294)
- 13:45 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db2153 (T415786)', diff saved to https://phabricator.wikimedia.org/P88498 and previous config saved to /var/cache/conftool/dbconfig/20260203-134514-marostegui.json
- 13:45 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2153.codfw.wmnet with reason: Maintenance
- 13:44 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T415786)', diff saved to https://phabricator.wikimedia.org/P88497 and previous config saved to /var/cache/conftool/dbconfig/20260203-134445-marostegui.json
- 13:43 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P88496 and previous config saved to /var/cache/conftool/dbconfig/20260203-134303-marostegui.json
- 13:38 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db1195 (T415786)', diff saved to https://phabricator.wikimedia.org/P88495 and previous config saved to /var/cache/conftool/dbconfig/20260203-133818-marostegui.json
- 13:38 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1195.eqiad.wmnet with reason: Maintenance
- 13:37 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host backup1015.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 13:37 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1186 (T415786)', diff saved to https://phabricator.wikimedia.org/P88494 and previous config saved to /var/cache/conftool/dbconfig/20260203-133754-marostegui.json
- 13:31 jclark@cumin1003: START - Cookbook sre.hosts.provision for host backup1015.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 13:29 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P88493 and previous config saved to /var/cache/conftool/dbconfig/20260203-132936-marostegui.json
- 13:27 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P88492 and previous config saved to /var/cache/conftool/dbconfig/20260203-132755-marostegui.json
- 13:27 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1186', diff saved to https://phabricator.wikimedia.org/P88491 and previous config saved to /var/cache/conftool/dbconfig/20260203-132745-marostegui.json
- 13:21 joal@deploy2002: Finished deploy [analytics/refinery@fc72bd3]: Regular analytics weekly train [analytics/refinery@fc72bd31] (duration: 07m 11s)
- 13:20 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-fe1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 13:20 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-fe1024.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 13:19 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-fe1023.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 13:17 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1186', diff saved to https://phabricator.wikimedia.org/P88490 and previous config saved to /var/cache/conftool/dbconfig/20260203-131735-marostegui.json
- 13:16 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-fe1022.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 13:14 jclark@cumin1003: START - Cookbook sre.hosts.provision for host ms-fe1024.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 13:14 jclark@cumin1003: START - Cookbook sre.hosts.provision for host ms-fe1023.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 13:14 joal@deploy2002: Started deploy [analytics/refinery@fc72bd3]: Regular analytics weekly train [analytics/refinery@fc72bd31]
- 13:14 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P88489 and previous config saved to /var/cache/conftool/dbconfig/20260203-131424-marostegui.json
- 13:14 jclark@cumin1003: START - Cookbook sre.hosts.provision for host ms-fe1022.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 13:14 joal@deploy2002: Finished deploy [analytics/refinery@fc72bd3] (thin): Regular analytics weekly train THIN [analytics/refinery@fc72bd31] (duration: 01m 20s)
- 13:14 jclark@cumin1003: START - Cookbook sre.hosts.provision for host ms-fe1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 13:12 joal@deploy2002: Started deploy [analytics/refinery@fc72bd3] (thin): Regular analytics weekly train THIN [analytics/refinery@fc72bd31]
- 13:12 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T415786)', diff saved to https://phabricator.wikimedia.org/P88488 and previous config saved to /var/cache/conftool/dbconfig/20260203-131245-marostegui.json
- 13:12 jclark@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 13:12 jclark@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt ms-fe - jclark@cumin1003"
- 13:12 jclark@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt ms-fe - jclark@cumin1003"
- 13:12 joal@deploy2002: Finished deploy [analytics/refinery@fc72bd3] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@fc72bd31] (duration: 01m 01s)
- 13:10 joal@deploy2002: Started deploy [analytics/refinery@fc72bd3] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@fc72bd31]
- 13:08 jclark@cumin1003: START - Cookbook sre.dns.netbox
- 13:07 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1186 (T415786)', diff saved to https://phabricator.wikimedia.org/P88487 and previous config saved to /var/cache/conftool/dbconfig/20260203-130724-marostegui.json
- 12:59 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T415786)', diff saved to https://phabricator.wikimedia.org/P88486 and previous config saved to /var/cache/conftool/dbconfig/20260203-125912-marostegui.json
- 12:28 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host backup1015.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 12:25 jclark@cumin1003: START - Cookbook sre.hosts.provision for host backup1015.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 12:24 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host backup1015.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 12:22 jclark@cumin1003: START - Cookbook sre.hosts.provision for host backup1015.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 12:22 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host backup1015.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 12:20 jclark@cumin1003: START - Cookbook sre.hosts.provision for host backup1015.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 12:17 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host backup1015.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 12:09 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db1158 (T415786)', diff saved to https://phabricator.wikimedia.org/P88485 and previous config saved to /var/cache/conftool/dbconfig/20260203-120905-marostegui.json
- 12:08 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1014,1018].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
- 12:08 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1158.eqiad.wmnet with reason: Maintenance
- 12:06 jclark@cumin1003: START - Cookbook sre.hosts.provision for host backup1015.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 12:06 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.newpool (exit_code=0) pool db1223: After schema change
- 12:06 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host backup1015.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 11:52 jclark@cumin1003: START - Cookbook sre.hosts.provision for host backup1015.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 11:47 jclark@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 11:47 jclark@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt backup1015 - jclark@cumin1003"
- 11:47 jclark@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt backup1015 - jclark@cumin1003"
- 11:43 jclark@cumin1003: START - Cookbook sre.dns.netbox
- 11:41 jclark@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host bast1004.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 11:22 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db1186 (T415786)', diff saved to https://phabricator.wikimedia.org/P88481 and previous config saved to /var/cache/conftool/dbconfig/20260203-112156-marostegui.json
- 11:21 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1186.eqiad.wmnet with reason: Maintenance
- 11:21 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T415786)', diff saved to https://phabricator.wikimedia.org/P88480 and previous config saved to /var/cache/conftool/dbconfig/20260203-112130-marostegui.json
- 11:20 marostegui@cumin1003: START - Cookbook sre.mysql.newpool pool db1223: After schema change
- 11:20 jclark@cumin1003: START - Cookbook sre.hosts.provision for host bast1004.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 11:18 jclark@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 11:18 jclark@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt bast1004 - jclark@cumin1003"
- 11:18 jclark@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt bast1004 - jclark@cumin1003"
- 11:16 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db2146 (T415786)', diff saved to https://phabricator.wikimedia.org/P88478 and previous config saved to /var/cache/conftool/dbconfig/20260203-111636-marostegui.json
- 11:16 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2146.codfw.wmnet with reason: Maintenance
- 11:16 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T415786)', diff saved to https://phabricator.wikimedia.org/P88477 and previous config saved to /var/cache/conftool/dbconfig/20260203-111607-marostegui.json
- 11:12 jclark@cumin1003: START - Cookbook sre.dns.netbox
- 11:11 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P88476 and previous config saved to /var/cache/conftool/dbconfig/20260203-111120-marostegui.json
- 11:01 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P88475 and previous config saved to /var/cache/conftool/dbconfig/20260203-110108-marostegui.json
- 11:01 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P88474 and previous config saved to /var/cache/conftool/dbconfig/20260203-110057-marostegui.json
- 10:51 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T415786)', diff saved to https://phabricator.wikimedia.org/P88473 and previous config saved to /var/cache/conftool/dbconfig/20260203-105059-marostegui.json
- 10:45 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P88472 and previous config saved to /var/cache/conftool/dbconfig/20260203-104547-marostegui.json
- 10:33 ammarpad@deploy2002: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki --reason 'Requested at phab:T416031' 'Celebrate Women/Translate an article' 'Event:Celebrate Women/Translate an article' Ammarpad # T416031
- 10:30 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T415786)', diff saved to https://phabricator.wikimedia.org/P88471 and previous config saved to /var/cache/conftool/dbconfig/20260203-103037-marostegui.json
- 10:29 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.newpool (exit_code=0) pool db2192: After schema change
- 10:29 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/postgresql-airflow-wmde: apply
- 10:29 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/postgresql-airflow-wmde: apply
- 10:28 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.newpool (exit_code=0) pool db2214: After schema change
- 10:28 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.newpool (exit_code=0) pool db2161: After schema change
- 10:28 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/postgresql-airflow-wikidata: apply
- 10:27 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/postgresql-airflow-wikidata: apply
- 10:26 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/postgresql-airflow-test-k8s: apply
- 10:24 ammarpad@deploy2002: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki --reason 'Requested at phab:T416031' 'Celebrate Women/Suggested activities' 'Event:Celebrate Women/Suggested activities' Ammarpad # T416031
- 10:22 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/postgresql-airflow-test-k8s: apply
- 10:17 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/postgresql-airflow-sre: apply
- 10:17 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/postgresql-airflow-sre: apply
- 10:16 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/postgresql-airflow-search: apply
- 10:16 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/postgresql-airflow-search: apply
- 10:16 ammarpad@deploy2002: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki --reason 'Requested at phab:T416031' 'Celebrate Women/Resources' 'Event:Celebrate Women/Resources' Ammarpad # T416031
- 10:16 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/postgresql-airflow-research: apply
- 10:15 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/postgresql-airflow-research: apply
- 10:15 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/postgresql-airflow-main: apply
- 10:14 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/postgresql-airflow-main: apply
- 10:13 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/postgresql-airflow-platform-eng: apply
- 10:13 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/postgresql-airflow-platform-eng: apply
- 10:12 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/postgresql-airflow-ml: apply
- 10:12 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/postgresql-airflow-ml: apply
- 10:11 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/postgresql-airflow-dev: apply
- 10:11 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/postgresql-airflow-dev: apply
- 10:11 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/postgresql-airflow-analytics-product: apply
- 10:10 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/postgresql-airflow-analytics-product: apply
- 10:10 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/postgresql-airflow-analytics-test: apply
- 10:10 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/postgresql-airflow-analytics-test: apply
- 10:09 ammarpad@deploy2002: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki --reason 'Requested at phab:T416031' 'Celebrate Women/Learn how Wikipedia works' 'Event:Celebrate Women/Learn how Wikipedia works' Ammarpad # T416031
- 09:48 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1223.eqiad.wmnet with reason: Maintenance
- 09:44 marostegui@cumin1003: START - Cookbook sre.mysql.newpool pool db2192: After schema change
- 09:43 marostegui@cumin1003: START - Cookbook sre.mysql.newpool pool db2214: After schema change
- 09:42 marostegui@cumin1003: START - Cookbook sre.mysql.newpool pool db2161: After schema change
- 09:41 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db1223 T416298', diff saved to https://phabricator.wikimedia.org/P88457 and previous config saved to /var/cache/conftool/dbconfig/20260203-094116-marostegui.json
- 09:40 marostegui@cumin1003: dbctl commit (dc=all): 'Promote db1189 to s3 primary T416298', diff saved to https://phabricator.wikimedia.org/P88456 and previous config saved to /var/cache/conftool/dbconfig/20260203-094038-marostegui.json
- 09:38 marostegui: Starting s3 eqiad failover from db1223 to db1189 - T416298
- 09:37 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 25 hosts with reason: Primary switchover s3 T416298
- 09:37 marostegui@cumin1003: dbctl commit (dc=all): 'Set db1189 with weight 0 T416298', diff saved to https://phabricator.wikimedia.org/P88455 and previous config saved to /var/cache/conftool/dbconfig/20260203-093736-marostegui.json
- 09:17 ammarpad@deploy2002: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki --reason 'Requested at phab:T416031' 'Celebrate Women/Improve an article' 'Event:Celebrate Women/Improve an article' Ammarpad # T416031
- 09:13 hashar@deploy2002: rebuilt and synchronized wikiversions files: group0 to 1.46.0-wmf.14 refs T413805
- 09:11 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db1169 (T415786)', diff saved to https://phabricator.wikimedia.org/P88454 and previous config saved to /var/cache/conftool/dbconfig/20260203-091110-marostegui.json
- 09:11 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1169.eqiad.wmnet with reason: Maintenance
- 09:11 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1163 (T415786)', diff saved to Unable to send diff to phaste and previous config saved to /var/cache/conftool/dbconfig/20260203-091039-marostegui.json
- 09:00 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1163', diff saved to https://phabricator.wikimedia.org/P88452 and previous config saved to /var/cache/conftool/dbconfig/20260203-090031-marostegui.json
- 08:59 ammarpad@deploy2002: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki --reason 'Requested at phab:T416031' 'Celebrate Women/Events/2025' 'Event:Celebrate Women/Events/2025' Ammarpad # T416031
- 08:50 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1163', diff saved to https://phabricator.wikimedia.org/P88451 and previous config saved to /var/cache/conftool/dbconfig/20260203-085022-marostegui.json
- 08:49 moritzm: installing libcommons-lang3-java security updates
- 08:47 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db2145 (T415786)', diff saved to https://phabricator.wikimedia.org/P88450 and previous config saved to /var/cache/conftool/dbconfig/20260203-084737-marostegui.json
- 08:47 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2145.codfw.wmnet with reason: Maintenance
- 08:45 ammarpad@deploy2002: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki --reason 'Requested at phab:T416031' 'Celebrate Women/Events/2024' 'Event:Celebrate Women/Events/2024' Ammarpad # T416031
- 08:40 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1163 (T415786)', diff saved to https://phabricator.wikimedia.org/P88449 and previous config saved to /var/cache/conftool/dbconfig/20260203-084014-marostegui.json
- 08:38 ammarpad@deploy2002: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki --reason 'Requested at phab:T416031' 'Celebrate Women/Events' 'Event:Celebrate Women/Events' Ammarpad # T416031
- 08:31 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host irc2003.wikimedia.org
- 08:27 moritzm: failover irc.wikimedia.org to irc1003.wikimedia.org
- 08:27 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host irc2003.wikimedia.org
- 08:25 ammarpad@deploy2002: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki --reason 'Requested at phab:T416031' 'Celebrate Women/Create an article' 'Event:Celebrate Women/Create an article' Ammarpad # T416031
- 08:21 jmm@dns1004: END - running authdns-update
- 08:20 jmm@dns1004: START - running authdns-update
- 08:19 ammarpad@deploy2002: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki --reason 'Requested at phab:T416031' 'Celebrate Women/Add citations' 'Event:Celebrate Women/Add citations' Ammarpad # T416031
- 08:19 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host irc1003.wikimedia.org
- 08:15 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host irc1003.wikimedia.org
- 08:12 Ammar: Ran refreshImageMetadata.php for multiple files for T414643
- 07:34 ammarpad@deploy2002: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki --reason 'Requested at phab:T416031' 'Celebrate Women' 'Event:Celebrate Women' Ammarpad # T416031
- 07:24 moritzm: installing openssl security updates
- 07:09 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2141.codfw.wmnet with reason: Maintenance
- 06:55 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db1163 (T415786)', diff saved to https://phabricator.wikimedia.org/P88448 and previous config saved to /var/cache/conftool/dbconfig/20260203-065541-marostegui.json
- 06:55 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1163.eqiad.wmnet with reason: Maintenance
- 06:17 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db2161.codfw.wmnet with reason: schema change
- 06:16 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2192.codfw.wmnet with reason: Maintenance
- 06:14 marostegui@dns1006: END - running authdns-update
- 06:13 marostegui@dns1006: START - running authdns-update
- 06:11 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db2192 T415900', diff saved to https://phabricator.wikimedia.org/P88447 and previous config saved to /var/cache/conftool/dbconfig/20260203-061142-marostegui.json
- 06:10 marostegui@cumin1003: dbctl commit (dc=all): 'Promote db2213 to s5 primary and set section read-write T415900', diff saved to https://phabricator.wikimedia.org/P88446 and previous config saved to /var/cache/conftool/dbconfig/20260203-061025-marostegui.json
- 06:10 marostegui@cumin1003: dbctl commit (dc=all): 'Set s5 codfw as read-only for maintenance - T415900', diff saved to https://phabricator.wikimedia.org/P88445 and previous config saved to /var/cache/conftool/dbconfig/20260203-061002-marostegui.json
- 06:04 marostegui: Starting s5 codfw failover from db2192 to db2213 - T415900
- 06:04 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 23 hosts with reason: Primary switchover s5 T415900
- 06:04 marostegui@cumin1003: dbctl commit (dc=all): 'Set db2213 with weight 0 T415900', diff saved to https://phabricator.wikimedia.org/P88444 and previous config saved to /var/cache/conftool/dbconfig/20260203-060411-marostegui.json
- 06:03 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2214.codfw.wmnet with reason: Maintenance
- 06:00 marostegui@dns1006: END - running authdns-update
- 06:00 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db2214 T415862', diff saved to https://phabricator.wikimedia.org/P88443 and previous config saved to /var/cache/conftool/dbconfig/20260203-060000-marostegui.json
- 05:59 marostegui@dns1006: START - running authdns-update
- 05:58 marostegui@cumin1003: dbctl commit (dc=all): 'Promote db2229 to s6 primary and set section read-write T415862', diff saved to https://phabricator.wikimedia.org/P88442 and previous config saved to /var/cache/conftool/dbconfig/20260203-055844-marostegui.json
- 05:58 marostegui@cumin1003: dbctl commit (dc=all): 'Set s6 codfw as read-only for maintenance - T415862', diff saved to https://phabricator.wikimedia.org/P88441 and previous config saved to /var/cache/conftool/dbconfig/20260203-055823-marostegui.json
- 05:51 marostegui: Starting s6 codfw failover from db2214 to db2229 - T415862
- 05:50 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 22 hosts with reason: Primary switchover s6 T415862
- 05:50 marostegui@cumin1003: dbctl commit (dc=all): 'Set db2229 with weight 0 T415862', diff saved to https://phabricator.wikimedia.org/P88440 and previous config saved to /var/cache/conftool/dbconfig/20260203-055010-marostegui.json
- 05:02 mwpresync@deploy2002: Pruned MediaWiki: 1.46.0-wmf.11 (duration: 02m 53s)
- 04:48 mwpresync@deploy2002: Finished scap sync-world: testwikis to 1.46.0-wmf.14 refs T413805 (duration: 44m 29s)
- 04:03 mwpresync@deploy2002: Started scap sync-world: testwikis to 1.46.0-wmf.14 refs T413805
- 03:07 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2239.codfw.wmnet with reason: Maintenance
- 03:06 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227 (T415786)', diff saved to https://phabricator.wikimedia.org/P88439 and previous config saved to /var/cache/conftool/dbconfig/20260203-030644-marostegui.json
- 02:51 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P88438 and previous config saved to /var/cache/conftool/dbconfig/20260203-025135-marostegui.json
- 02:36 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P88437 and previous config saved to /var/cache/conftool/dbconfig/20260203-023627-marostegui.json
- 02:21 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227 (T415786)', diff saved to https://phabricator.wikimedia.org/P88436 and previous config saved to /var/cache/conftool/dbconfig/20260203-022119-marostegui.json
- 02:13 mwpresync@deploy2002: Finished scap build-images: Publishing wmf/next image (duration: 12m 39s)
- 02:05 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
- 02:00 mwpresync@deploy2002: Started scap build-images: Publishing wmf/next image
- 00:15 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db2227 (T415786)', diff saved to https://phabricator.wikimedia.org/P88435 and previous config saved to /var/cache/conftool/dbconfig/20260203-001511-marostegui.json
- 00:15 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2227.codfw.wmnet with reason: Maintenance
- 00:14 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2209 (T415786)', diff saved to https://phabricator.wikimedia.org/P88434 and previous config saved to /var/cache/conftool/dbconfig/20260203-001445-marostegui.json
- 00:04 robh: eqsin cp5022 troubleshooting onsite in progress
2026-02-02
- 23:59 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2209', diff saved to https://phabricator.wikimedia.org/P88433 and previous config saved to /var/cache/conftool/dbconfig/20260202-235937-marostegui.json
- 23:44 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2209', diff saved to https://phabricator.wikimedia.org/P88432 and previous config saved to /var/cache/conftool/dbconfig/20260202-234429-marostegui.json
- 23:29 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2209 (T415786)', diff saved to https://phabricator.wikimedia.org/P88431 and previous config saved to /var/cache/conftool/dbconfig/20260202-232921-marostegui.json
- 22:40 herron: added 500G to the lv on mwlog1002
- 22:24 inflatador: bking@apt1002 `sudo -E reprepro -C thirdparty/opensearch3 copy trixie-wikimedia bookworm-wikimedia opensearch`
- 22:19 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1240.eqiad.wmnet with reason: Maintenance
- 22:19 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1212 (T415786)', diff saved to https://phabricator.wikimedia.org/P88430 and previous config saved to /var/cache/conftool/dbconfig/20260202-221912-marostegui.json
- 22:04 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1212', diff saved to https://phabricator.wikimedia.org/P88429 and previous config saved to /var/cache/conftool/dbconfig/20260202-220404-marostegui.json
- 21:48 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1212', diff saved to https://phabricator.wikimedia.org/P88428 and previous config saved to /var/cache/conftool/dbconfig/20260202-214855-marostegui.json
- 21:33 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1212 (T415786)', diff saved to https://phabricator.wikimedia.org/P88427 and previous config saved to /var/cache/conftool/dbconfig/20260202-213347-marostegui.json
- 21:27 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db2209 (T415786)', diff saved to https://phabricator.wikimedia.org/P88426 and previous config saved to /var/cache/conftool/dbconfig/20260202-212703-marostegui.json
- 21:26 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2209.codfw.wmnet with reason: Maintenance
- 21:26 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194 (T415786)', diff saved to https://phabricator.wikimedia.org/P88425 and previous config saved to /var/cache/conftool/dbconfig/20260202-212638-marostegui.json
- 21:16 kemayo@deploy2002: Finished scap sync-world: Backport for Edit check: turn off the tone a/b test on frwiki, jawiki, ptwiki (T411914), Enable suggestions BetaFeature on beta wikis (T415504), WikimediaCustomizations: Set WMCBadEmailDomainsFile (T397244), filebackend: Clean up removed config params for multi-write backends (T328872) (duration: 10
- 21:12 kemayo@deploy2002: tgr, func, kemayo, esanders: Continuing with sync
- 21:11 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P88424 and previous config saved to /var/cache/conftool/dbconfig/20260202-211129-marostegui.json
- 21:07 kemayo@deploy2002: tgr, func, kemayo, esanders: Backport for Edit check: turn off the tone a/b test on frwiki, jawiki, ptwiki (T411914), Enable suggestions BetaFeature on beta wikis (T415504), WikimediaCustomizations: Set WMCBadEmailDomainsFile (T397244), filebackend: Clean up removed config params for multi-write backends (T328872) synced to
- 21:05 kemayo@deploy2002: Started scap sync-world: Backport for Edit check: turn off the tone a/b test on frwiki, jawiki, ptwiki (T411914), Enable suggestions BetaFeature on beta wikis (T415504), WikimediaCustomizations: Set WMCBadEmailDomainsFile (T397244), filebackend: Clean up removed config params for multi-write backends (T328872)
- 20:56 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P88423 and previous config saved to /var/cache/conftool/dbconfig/20260202-205621-marostegui.json
- 20:41 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194 (T415786)', diff saved to https://phabricator.wikimedia.org/P88422 and previous config saved to /var/cache/conftool/dbconfig/20260202-204113-marostegui.json
- 20:24 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db1212 (T415786)', diff saved to https://phabricator.wikimedia.org/P88421 and previous config saved to /var/cache/conftool/dbconfig/20260202-202451-marostegui.json
- 20:24 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on 6 hosts with reason: Maintenance
- 20:24 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1212.eqiad.wmnet with reason: Maintenance
- 20:24 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1198 (T415786)', diff saved to https://phabricator.wikimedia.org/P88420 and previous config saved to /var/cache/conftool/dbconfig/20260202-202404-marostegui.json
- 20:08 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1198', diff saved to https://phabricator.wikimedia.org/P88419 and previous config saved to /var/cache/conftool/dbconfig/20260202-200855-marostegui.json
- 19:53 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1198', diff saved to https://phabricator.wikimedia.org/P88418 and previous config saved to /var/cache/conftool/dbconfig/20260202-195345-marostegui.json
- 19:38 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1198 (T415786)', diff saved to https://phabricator.wikimedia.org/P88417 and previous config saved to /var/cache/conftool/dbconfig/20260202-193837-marostegui.json
- 18:42 bvibber@deploy2002: helmfile [eqiad] DONE helmfile.d/services/chart-renderer: apply
- 18:42 bvibber@deploy2002: helmfile [eqiad] START helmfile.d/services/chart-renderer: apply
- 18:41 bvibber@deploy2002: helmfile [codfw] DONE helmfile.d/services/chart-renderer: apply
- 18:41 bvibber@deploy2002: helmfile [codfw] START helmfile.d/services/chart-renderer: apply
- 18:40 bvibber@deploy2002: helmfile [staging] DONE helmfile.d/services/chart-renderer: apply
- 18:40 bvibber@deploy2002: helmfile [staging] START helmfile.d/services/chart-renderer: apply
- 18:33 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db2194 (T415786)', diff saved to https://phabricator.wikimedia.org/P88416 and previous config saved to /var/cache/conftool/dbconfig/20260202-183312-marostegui.json
- 18:33 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2194.codfw.wmnet with reason: Maintenance
- 18:32 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190 (T415786)', diff saved to https://phabricator.wikimedia.org/P88415 and previous config saved to /var/cache/conftool/dbconfig/20260202-183248-marostegui.json
- 18:22 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db1198 (T415786)', diff saved to https://phabricator.wikimedia.org/P88414 and previous config saved to /var/cache/conftool/dbconfig/20260202-182210-marostegui.json
- 18:22 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1198.eqiad.wmnet with reason: Maintenance
- 18:21 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1189 (T415786)', diff saved to https://phabricator.wikimedia.org/P88413 and previous config saved to /var/cache/conftool/dbconfig/20260202-182144-marostegui.json
- 18:17 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190', diff saved to https://phabricator.wikimedia.org/P88412 and previous config saved to /var/cache/conftool/dbconfig/20260202-181739-marostegui.json
- 18:06 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P88411 and previous config saved to /var/cache/conftool/dbconfig/20260202-180633-marostegui.json
- 18:02 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190', diff saved to https://phabricator.wikimedia.org/P88410 and previous config saved to /var/cache/conftool/dbconfig/20260202-180230-marostegui.json
- 17:51 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P88409 and previous config saved to /var/cache/conftool/dbconfig/20260202-175125-marostegui.json
- 17:47 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190 (T415786)', diff saved to https://phabricator.wikimedia.org/P88408 and previous config saved to /var/cache/conftool/dbconfig/20260202-174721-marostegui.json
- 17:36 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1189 (T415786)', diff saved to https://phabricator.wikimedia.org/P88407 and previous config saved to /var/cache/conftool/dbconfig/20260202-173616-marostegui.json
- 16:49 elukey@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-sre: sync
- 16:48 elukey@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-sre: sync
- 16:42 dancy@deploy2002: Installation of scap version "4.241.0" completed for 2 hosts
- 16:40 dancy@deploy2002: Installing scap version "4.241.0" for 2 host(s)
- 16:20 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db1189 (T415786)', diff saved to https://phabricator.wikimedia.org/P88406 and previous config saved to /var/cache/conftool/dbconfig/20260202-162042-marostegui.json
- 16:20 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1189.eqiad.wmnet with reason: Maintenance
- 16:20 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T415786)', diff saved to https://phabricator.wikimedia.org/P88405 and previous config saved to /var/cache/conftool/dbconfig/20260202-162017-marostegui.json
- 16:05 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to Unable to send diff to phaste and previous config saved to /var/cache/conftool/dbconfig/20260202-160504-marostegui.json
- 15:53 root@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 7 hosts with reason: Primary switchover test-s4 None
- 15:49 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P88403 and previous config saved to /var/cache/conftool/dbconfig/20260202-154956-marostegui.json
- 15:40 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db2190 (T415786)', diff saved to https://phabricator.wikimedia.org/P88402 and previous config saved to /var/cache/conftool/dbconfig/20260202-154038-marostegui.json
- 15:40 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2190.codfw.wmnet with reason: Maintenance
- 15:40 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T415786)', diff saved to https://phabricator.wikimedia.org/P88401 and previous config saved to /var/cache/conftool/dbconfig/20260202-154013-marostegui.json
- 15:34 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T415786)', diff saved to https://phabricator.wikimedia.org/P88400 and previous config saved to /var/cache/conftool/dbconfig/20260202-153447-marostegui.json
- 15:25 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P88399 and previous config saved to /var/cache/conftool/dbconfig/20260202-152503-marostegui.json
- 15:19 moritzm: restarting Mailman on lists1004 to pick up openssl security updates
- 15:13 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
- 15:11 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
- 15:10 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
- 15:09 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P88398 and previous config saved to /var/cache/conftool/dbconfig/20260202-150955-marostegui.json
- 15:07 moritzm: restarting Exim on lists1004 to pick up openssl security updates
- 15:00 moritzm: restarting mailman-web on lists1004 to pick up openssl security updates
- 15:00 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
- 14:59 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
- 14:58 Lucas_WMDE: UTC afternoon backport+config window done
- 14:56 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for Enable Wikibase GraphQL on beta wikidata (T415516) (duration: 10m 30s)
- 14:54 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T415786)', diff saved to https://phabricator.wikimedia.org/P88397 and previous config saved to /var/cache/conftool/dbconfig/20260202-145445-marostegui.json
- 14:52 lucaswerkmeister-wmde@deploy2002: jakob, lucaswerkmeister-wmde: Continuing with sync
- 14:51 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
- 14:48 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
- 14:47 lucaswerkmeister-wmde@deploy2002: jakob, lucaswerkmeister-wmde: Backport for Enable Wikibase GraphQL on beta wikidata (T415516) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 14:45 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for Enable Wikibase GraphQL on beta wikidata (T415516)
- 14:44 arnoldokoth: restart vrts-daemon on vrts1003
- 14:39 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
- 14:36 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
- 14:36 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
- 14:35 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
- 14:35 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
- 14:33 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
- 14:33 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
- 14:27 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for zhwiki: Remove extra autoconfirmed limit for Tor user (T415335) (duration: 07m 51s)
- 14:23 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, stang: Continuing with sync
- 14:21 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, stang: Backport for zhwiki: Remove extra autoconfirmed limit for Tor user (T415335) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 14:19 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for zhwiki: Remove extra autoconfirmed limit for Tor user (T415335)
- 14:19 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db1175 (T415786)', diff saved to https://phabricator.wikimedia.org/P88396 and previous config saved to /var/cache/conftool/dbconfig/20260202-141910-marostegui.json
- 14:19 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1175.eqiad.wmnet with reason: Maintenance
- 14:18 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T415786)', diff saved to https://phabricator.wikimedia.org/P88395 and previous config saved to /var/cache/conftool/dbconfig/20260202-141844-marostegui.json
- 14:17 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for Update ext-EventStreamConfig (T415638) (duration: 10m 45s)
- 14:13 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, joal: Continuing with sync
- 14:08 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, joal: Backport for Update ext-EventStreamConfig (T415638) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 14:07 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-wmde: apply
- 14:06 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for Update ext-EventStreamConfig (T415638)
- 14:06 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-wmde: apply
- 14:03 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P88394 and previous config saved to /var/cache/conftool/dbconfig/20260202-140336-marostegui.json
- 14:02 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 14:01 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 13:48 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P88393 and previous config saved to /var/cache/conftool/dbconfig/20260202-134827-marostegui.json
- 13:33 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T415786)', diff saved to https://phabricator.wikimedia.org/P88392 and previous config saved to /var/cache/conftool/dbconfig/20260202-133319-marostegui.json
- 13:27 moritzm: installing PostgreSQL 15 security updates
- 13:17 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/growthbook: apply
- 13:17 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Fix bugs with no reason policy and haproxy actions - oblivian@cumin1003"
- 13:17 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Fix bugs with no reason policy and haproxy actions - oblivian@cumin1003
- 13:16 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/growthbook: apply
- 13:16 oblivian@cumin1003: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Fix bugs with no reason policy and haproxy actions - oblivian@cumin1003
- 13:16 oblivian@cumin1003: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Fix bugs with no reason policy and haproxy actions - oblivian@cumin1003"
- 12:55 moritzm: restarting Postfix on the MXes to pick up OpenSSL security updates
- 12:54 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.newpool (exit_code=0) pool db1193: After schema change
- 12:54 marostegui@cumin1003: START - Cookbook sre.mysql.newpool pool db1193: After schema change
- 12:46 marostegui@cumin1003: END (FAIL) - Cookbook sre.mysql.newpool (exit_code=99) pool db1193: After schema change
- 12:45 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.newpool (exit_code=0) pool db1222: After schema change
- 12:37 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db2177 (T415786)', diff saved to https://phabricator.wikimedia.org/P88389 and previous config saved to /var/cache/conftool/dbconfig/20260202-123726-marostegui.json
- 12:37 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2177.codfw.wmnet with reason: Maintenance
- 12:37 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T415786)', diff saved to https://phabricator.wikimedia.org/P88388 and previous config saved to /var/cache/conftool/dbconfig/20260202-123712-marostegui.json
- 12:37 slyngshede@cumin1003: DONE (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Samuel (WMF) out of all services on: 2487 hosts
- 12:33 moritzm: restarting nginx on puppetdb hosts
- 12:31 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest2006.codfw.wmnet
- 12:30 slyngshede@dns1004: END - running authdns-update
- 12:29 slyngshede@dns1004: START - running authdns-update
- 12:27 jmm@cumin2002: END (PASS) - Cookbook sre.misc-clusters.roll-restart-reboot-docker-registry (exit_code=0) rolling restart_daemons on A:docker-registry
- 12:22 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P88385 and previous config saved to /var/cache/conftool/dbconfig/20260202-122203-marostegui.json
- 12:17 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db1166 (T415786)', diff saved to https://phabricator.wikimedia.org/P88384 and previous config saved to /var/cache/conftool/dbconfig/20260202-121735-marostegui.json
- 12:17 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1166.eqiad.wmnet with reason: Maintenance
- 12:17 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1157 (T415786)', diff saved to https://phabricator.wikimedia.org/P88383 and previous config saved to /var/cache/conftool/dbconfig/20260202-121707-marostegui.json
- 12:08 jmm@cumin2002: START - Cookbook sre.misc-clusters.roll-restart-reboot-docker-registry rolling restart_daemons on A:docker-registry
- 12:06 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P88380 and previous config saved to /var/cache/conftool/dbconfig/20260202-120654-marostegui.json
- 12:02 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1157', diff saved to https://phabricator.wikimedia.org/P88379 and previous config saved to /var/cache/conftool/dbconfig/20260202-120157-marostegui.json
- 12:00 marostegui@cumin1003: START - Cookbook sre.mysql.newpool pool db1193: After schema change
- 12:00 marostegui@cumin1003: START - Cookbook sre.mysql.newpool pool db1222: After schema change
- 11:58 marostegui@cumin1003: END (FAIL) - Cookbook sre.mysql.newpool (exit_code=99) pool db1222: After schema change
- 11:57 marostegui@cumin1003: START - Cookbook sre.mysql.newpool pool db1222: After schema change
- 11:51 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T415786)', diff saved to https://phabricator.wikimedia.org/P88376 and previous config saved to /var/cache/conftool/dbconfig/20260202-115142-marostegui.json
- 11:46 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1157', diff saved to https://phabricator.wikimedia.org/P88375 and previous config saved to /var/cache/conftool/dbconfig/20260202-114648-marostegui.json
- 11:46 slyngshede@cumin1003: DONE (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging AUgolnikova out of all services on: 2487 hosts
- 11:31 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1157 (T415786)', diff saved to https://phabricator.wikimedia.org/P88374 and previous config saved to /var/cache/conftool/dbconfig/20260202-113139-marostegui.json
- 11:16 jmm@cumin2002: END (PASS) - Cookbook sre.misc-clusters.roll-restart-reboot-eventschemas (exit_code=0) rolling restart_daemons on A:schema-eqiad
- 11:14 jmm@cumin2002: START - Cookbook sre.misc-clusters.roll-restart-reboot-eventschemas rolling restart_daemons on A:schema-eqiad
- 11:07 jmm@cumin2002: END (PASS) - Cookbook sre.misc-clusters.roll-restart-reboot-eventschemas (exit_code=0) rolling restart_daemons on A:schema-codfw
- 11:06 jmm@cumin2002: START - Cookbook sre.misc-clusters.roll-restart-reboot-eventschemas rolling restart_daemons on A:schema-codfw
- 10:45 moritzm: restarting Bitu on idm*
- 10:36 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.newpool (exit_code=0) pool db2249: After reimage
- 10:20 dpogorzelski@cumin1003: START - Cookbook sre.k8s.wipe-cluster Wipe the K8s cluster ml-staging-codfw: Kubernetes upgrade
- 10:17 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db1157 (T415786)', diff saved to https://phabricator.wikimedia.org/P88371 and previous config saved to /var/cache/conftool/dbconfig/20260202-101658-marostegui.json
- 10:16 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1157.eqiad.wmnet with reason: Maintenance
- 09:51 marostegui@cumin1003: START - Cookbook sre.mysql.newpool pool db2249: After reimage
- 09:50 dpogorzelski@cumin1003: END (FAIL) - Cookbook sre.k8s.wipe-cluster (exit_code=99) Wipe the K8s cluster ml-staging-codfw: Kubernetes upgrade
- 09:46 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2249.codfw.wmnet with OS trixie
- 09:45 ihurbain@deploy2002: Finished scap sync-world: Backport for Upgrading psy/psysh (v0.12.10 => v0.12.19) (T416050), Bump wikimedia/parsoid to 0.23.0-a13.1 (T415328), Bump wikimedia/parsoid to 0.23.0-a13.1 (T415888 T415328) (duration: 06m 36s)
- 09:40 ihurbain@deploy2002: reedy, cscott, ihurbain: Continuing with sync
- 09:40 ihurbain@deploy2002: reedy, cscott, ihurbain: Backport for Upgrading psy/psysh (v0.12.10 => v0.12.19) (T416050), Bump wikimedia/parsoid to 0.23.0-a13.1 (T415328), Bump wikimedia/parsoid to 0.23.0-a13.1 (T415888 T415328) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 09:39 dpogorzelski@cumin1003: START - Cookbook sre.k8s.wipe-cluster Wipe the K8s cluster ml-staging-codfw: Kubernetes upgrade
- 09:38 ihurbain@deploy2002: Started scap sync-world: Backport for Upgrading psy/psysh (v0.12.10 => v0.12.19) (T416050), Bump wikimedia/parsoid to 0.23.0-a13.1 (T415328), Bump wikimedia/parsoid to 0.23.0-a13.1 (T415888 T415328)
- 09:35 dpogorzelski@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-cluster (exit_code=0) depool all services in codfw/ml-staging-codfw: maintenance
- 09:35 dpogorzelski@cumin1003: START - Cookbook sre.k8s.pool-depool-cluster depool all services in codfw/ml-staging-codfw: maintenance
- 09:34 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db2156 (T415786)', diff saved to https://phabricator.wikimedia.org/P88368 and previous config saved to /var/cache/conftool/dbconfig/20260202-093418-marostegui.json
- 09:34 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2156.codfw.wmnet with reason: Maintenance
- 09:33 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T415786)', diff saved to https://phabricator.wikimedia.org/P88367 and previous config saved to /var/cache/conftool/dbconfig/20260202-093354-marostegui.json
- 09:33 dpogorzelski@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-cluster (exit_code=0) depool all services in codfw/ml-staging-codfw: maintenance
- 09:33 dpogorzelski@cumin1003: START - Cookbook sre.k8s.pool-depool-cluster depool all services in codfw/ml-staging-codfw: maintenance
- 09:27 elukey: clean up nginx-related packages and configs from urldownloader hosts to clean up alerts - T405631
- 09:24 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2249.codfw.wmnet with reason: host reimage
- 09:18 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P88366 and previous config saved to /var/cache/conftool/dbconfig/20260202-091845-marostegui.json
- 09:17 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2249.codfw.wmnet with reason: host reimage
- 09:07 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2249.codfw.wmnet with OS trixie
- 09:04 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2249.codfw.wmnet with reason: Reimage to debian trixie
- 09:03 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P88365 and previous config saved to /var/cache/conftool/dbconfig/20260202-090337-marostegui.json
- 09:03 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db2249 T415358', diff saved to https://phabricator.wikimedia.org/P88364 and previous config saved to /var/cache/conftool/dbconfig/20260202-090328-marostegui.json
- 08:56 kharlan@deploy2002: Finished scap sync-world: Backport for BlockUtils: Log x-provenance and IP reputation fields (T415354) (duration: 10m 05s)
- 08:50 kharlan@deploy2002: kharlan: Continuing with sync
- 08:48 kharlan@deploy2002: kharlan: Backport for BlockUtils: Log x-provenance and IP reputation fields (T415354) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 08:48 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T415786)', diff saved to https://phabricator.wikimedia.org/P88363 and previous config saved to /var/cache/conftool/dbconfig/20260202-084806-marostegui.json
- 08:46 kharlan@deploy2002: Started scap sync-world: Backport for BlockUtils: Log x-provenance and IP reputation fields (T415354)
- 08:45 kharlan@deploy2002: Finished scap sync-world: Backport for Enable watchlist labels everywhere (prod and beta) (T413967) (duration: 41m 47s)
- 08:31 kharlan@deploy2002: kharlan, samwilson: Continuing with sync
- 08:27 kharlan@deploy2002: kharlan, samwilson: Backport for Enable watchlist labels everywhere (prod and beta) (T413967) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 08:12 moritzm: installing OpenSSL security updates
- 08:09 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1150.eqiad.wmnet with reason: Maintenance
- 08:04 kharlan@deploy2002: Started scap sync-world: Backport for Enable watchlist labels everywhere (prod and beta) (T413967)
- 08:02 joal: Restarting druid middle-managers to recover from OOM - T415799
- 06:33 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db2149 (T415786)', diff saved to https://phabricator.wikimedia.org/P88361 and previous config saved to /var/cache/conftool/dbconfig/20260202-063304-marostegui.json
- 06:32 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2149.codfw.wmnet with reason: Maintenance
- 06:27 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1222.eqiad.wmnet with reason: Maintenance
- 06:25 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db1222 T415983', diff saved to https://phabricator.wikimedia.org/P88360 and previous config saved to /var/cache/conftool/dbconfig/20260202-062554-marostegui.json
- 06:25 marostegui@cumin1003: dbctl commit (dc=all): 'Promote db1162 to s2 primary T415983', diff saved to https://phabricator.wikimedia.org/P88359 and previous config saved to /var/cache/conftool/dbconfig/20260202-062522-marostegui.json
- 06:23 marostegui: Starting s2 eqiad failover from db1222 to db1162 - T415983
- 06:22 marostegui@cumin1003: dbctl commit (dc=all): 'Set db1162 with weight 0 T415983', diff saved to https://phabricator.wikimedia.org/P88358 and previous config saved to /var/cache/conftool/dbconfig/20260202-062212-marostegui.json
- 06:21 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 26 hosts with reason: Primary switchover s2 T415983
- 06:14 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2161.codfw.wmnet with reason: long schema change
- 06:13 marostegui@dns1006: END - running authdns-update
- 06:13 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db2161 T415748', diff saved to https://phabricator.wikimedia.org/P88357 and previous config saved to /var/cache/conftool/dbconfig/20260202-061310-marostegui.json
- 06:12 marostegui@dns1006: START - running authdns-update
- 06:12 marostegui@cumin1003: dbctl commit (dc=all): 'Promote db2165 to s8 primary and set section read-write T415748', diff saved to https://phabricator.wikimedia.org/P88356 and previous config saved to /var/cache/conftool/dbconfig/20260202-061217-marostegui.json
- 06:11 marostegui@cumin1003: dbctl commit (dc=all): 'Set s8 codfw as read-only for maintenance - T415748', diff saved to https://phabricator.wikimedia.org/P88355 and previous config saved to /var/cache/conftool/dbconfig/20260202-061150-marostegui.json
- 06:11 marostegui: Starting s8 codfw failover from db2161 to db2165 - T415748
- 06:04 marostegui@cumin1003: dbctl commit (dc=all): 'Set db2165 with weight 0 T415748', diff saved to https://phabricator.wikimedia.org/P88354 and previous config saved to /var/cache/conftool/dbconfig/20260202-060437-marostegui.json
- 06:02 marostegui: Deploy schema change on old s8 eqiad master db1193 T411164 T411163
- 05:59 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db1193.eqiad.wmnet with reason: long schema change
- 05:57 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db1193 T416107', diff saved to https://phabricator.wikimedia.org/P88353 and previous config saved to /var/cache/conftool/dbconfig/20260202-055755-marostegui.json
- 05:57 marostegui@cumin1003: dbctl commit (dc=all): 'Promote db1209 to s8 primary T416107', diff saved to https://phabricator.wikimedia.org/P88352 and previous config saved to /var/cache/conftool/dbconfig/20260202-055717-marostegui.json
- 05:56 marostegui: Starting s8 eqiad failover from db1193 to db1209 - T416107
- 05:53 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 27 hosts with reason: Primary switchover s8 T416107
- 05:53 marostegui@cumin1003: dbctl commit (dc=all): 'Set db1209 with weight 0 T416107', diff saved to https://phabricator.wikimedia.org/P88351 and previous config saved to /var/cache/conftool/dbconfig/20260202-055304-marostegui.json
- 02:14 mwpresync@deploy2002: Finished scap build-images: Publishing wmf/next image (duration: 13m 22s)
- 02:00 mwpresync@deploy2002: Started scap build-images: Publishing wmf/next image
2026-02-01
- 02:01 mwpresync@deploy2002: Finished scap build-images: Publishing wmf/next image (duration: 01m 14s)
- 02:00 mwpresync@deploy2002: Started scap build-images: Publishing wmf/next image