Server Admin Log
2025-06-14
- 04:09 ryankemper: [WDQS] Restarted blazegraph on `wdqs2009`. Probedown had already resolved before the restart, so this might be unnecessary, but restarting just in case
- 00:09 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1186.eqiad.wmnet with OS bullseye
- 00:08 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1185.eqiad.wmnet with OS bullseye
2025-06-13
- 23:58 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1185.eqiad.wmnet with OS bullseye
- 23:42 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1185.eqiad.wmnet with OS bullseye
- 23:31 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1185.eqiad.wmnet with OS bullseye
- 23:28 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/blunderbuss: apply
- 23:27 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply
- 23:22 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1186.eqiad.wmnet with OS bullseye
- 23:16 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1185.eqiad.wmnet with OS bullseye
- 23:15 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1186.eqiad.wmnet with OS bullseye
- 22:26 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1186.eqiad.wmnet with OS bullseye
- 22:25 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1186.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 22:25 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1185.eqiad.wmnet with OS bullseye
- 22:25 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1185.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 22:24 vriley@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1186.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 22:23 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1186.eqiad.wmnet with OS bullseye
- 22:19 vriley@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1185.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 22:18 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1185.eqiad.wmnet with OS bullseye
- 22:14 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/blunderbuss: apply
- 22:14 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply
- 22:12 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1186.eqiad.wmnet with OS bullseye
- 22:11 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1186.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 22:10 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/blunderbuss: apply
- 22:08 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply
- 22:06 vriley@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1186.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 22:04 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1185.eqiad.wmnet with OS bullseye
- 22:03 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 22:03 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt an-worker1186 - vriley@cumin1002"
- 22:03 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt an-worker1186 - vriley@cumin1002"
- 22:03 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1185.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 22:02 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/blunderbuss: apply
- 22:02 vriley@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1185.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 22:00 vriley@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host an-worker1185
- 22:00 vriley@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host an-worker1185
- 22:00 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply
- 22:00 vriley@cumin1002: START - Cookbook sre.dns.netbox
- 21:59 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 21:59 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt an-worker1185 - vriley@cumin1002"
- 21:59 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt an-worker1185 - vriley@cumin1002"
- 21:59 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/blunderbuss: apply
- 21:53 vriley@cumin1002: START - Cookbook sre.dns.netbox
- 21:52 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply
- 21:51 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/blunderbuss: apply
- 21:49 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply
- 21:49 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/blunderbuss: apply
- 21:48 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply
- 21:48 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/blunderbuss: apply
- 21:40 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply
- 21:40 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/blunderbuss: apply
- 21:30 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply
- 21:00 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1019.eqiad.wmnet with OS bullseye
- 20:43 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1019.eqiad.wmnet with reason: host reimage
- 20:40 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1019.eqiad.wmnet with reason: host reimage
- 20:34 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/blunderbuss: apply
- 20:24 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1019.eqiad.wmnet with OS bullseye
- 20:24 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply
- 20:19 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1019.eqiad.wmnet with OS bullseye
- 20:13 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/blunderbuss: apply
- 20:03 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1019.eqiad.wmnet with reason: host reimage
- 20:03 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply
- 20:00 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1019.eqiad.wmnet with reason: host reimage
- 19:44 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1019.eqiad.wmnet with OS bullseye
- 19:41 andrew@cumin1002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudcephosd1019.eqiad.wmnet']
- 19:35 andrew@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1019.eqiad.wmnet']
- 19:35 andrew@cumin1002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts cloudcephosd1019.eqiad.wmnet
- 19:35 andrew@cumin1002: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host cloudcephosd1019.eqiad.wmnet
- 19:24 andrew@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudcephosd1019.eqiad.wmnet
- 19:23 andrew@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cloudcephosd1019.eqiad.wmnet
- 19:21 andrew@cumin1002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts cloudcephosd1019.eqiad.wmnet
- 19:14 andrew@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cloudcephosd1019.eqiad.wmnet
- 17:54 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/blunderbuss: apply
- 17:44 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply
- 17:40 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/blunderbuss: apply
- 17:29 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply
- 17:29 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/blunderbuss: apply
- 17:29 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply
- 17:29 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/blunderbuss: apply
- 17:29 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply
- 17:28 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/blunderbuss: apply
- 17:26 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply
- 17:26 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/blunderbuss: apply
- 17:26 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply
- 17:25 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/blunderbuss: apply
- 17:25 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply
- 17:24 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/blunderbuss: apply
- 17:24 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply
- 17:22 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/blunderbuss: apply
- 17:22 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply
- 17:22 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/blunderbuss: apply
- 17:21 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply
- 17:21 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/blunderbuss: apply
- 17:21 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply
- 17:21 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/blunderbuss: apply
- 17:21 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply
- 17:20 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/blunderbuss: apply
- 17:20 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply
- 17:18 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/blunderbuss: apply
- 17:17 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply
- 17:16 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/blunderbuss: apply
- 17:16 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply
- 17:14 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply
- 17:14 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/blunderbuss: apply
- 17:14 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply
- 17:12 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/blunderbuss: apply
- 17:12 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply
- 16:49 stevemunene@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-worker1157.eqiad.wmnet
- 16:45 stevemunene@cumin1002: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker1159.eqiad.wmnet
- 16:42 stevemunene@cumin1002: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker1159.eqiad.wmnet
- 16:42 stevemunene@cumin1002: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker1158.eqiad.wmnet
- 16:40 stevemunene@cumin1002: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker1158.eqiad.wmnet
- 16:38 stevemunene@cumin1002: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker1157.eqiad.wmnet
- 16:36 stevemunene@cumin1002: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker1157.eqiad.wmnet
- 16:35 dancy@deploy1003: Installation of scap version "4.174.0" completed for 2 hosts
- 16:33 dancy@deploy1003: Installing scap version "4.174.0" for 2 host(s)
- 16:23 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 16:13 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 16:10 brennen@deploy1003: Finished scap sync-world: Backport for Revert "group1 to 1.45.0-wmf.5" (T392175 T396790) (duration: 14m 56s)
- 16:06 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 16:06 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 16:05 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 16:01 brennen@deploy1003: brennen: Continuing with sync
- 16:00 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply
- 15:59 brennen@deploy1003: brennen: Backport for Revert "group1 to 1.45.0-wmf.5" (T392175 T396790) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 15:57 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply
- 15:55 brennen@deploy1003: Started scap sync-world: Backport for Revert "group1 to 1.45.0-wmf.5" (T392175 T396790)
- 15:55 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 15:54 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 15:50 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/blunderbuss: apply
- 15:49 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply
- 14:54 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 14:53 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 14:39 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2198.codfw.wmnet with reason: Maintenance
- 14:39 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195 (T396130)', diff saved to https://phabricator.wikimedia.org/P77952 and previous config saved to /var/cache/conftool/dbconfig/20250613-143859-marostegui.json
- 14:25 mfossati@deploy1003: Finished deploy [airflow-dags/platform_eng@cab8d81]: hotfix-bump SEAL to v0.9.0 (duration: 02m 26s)
- 14:23 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195', diff saved to https://phabricator.wikimedia.org/P77951 and previous config saved to /var/cache/conftool/dbconfig/20250613-142351-marostegui.json
- 14:23 mfossati@deploy1003: Started deploy [airflow-dags/platform_eng@cab8d81]: hotfix-bump SEAL to v0.9.0
- 14:17 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 14:17 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 14:08 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195', diff saved to https://phabricator.wikimedia.org/P77950 and previous config saved to /var/cache/conftool/dbconfig/20250613-140844-marostegui.json
- 13:57 damilare: SmashPig upgraded from 84c0668b to 4eef974d
- 13:53 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195 (T396130)', diff saved to https://phabricator.wikimedia.org/P77949 and previous config saved to /var/cache/conftool/dbconfig/20250613-135336-marostegui.json
- 13:39 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2195 (T396130)', diff saved to https://phabricator.wikimedia.org/P77948 and previous config saved to /var/cache/conftool/dbconfig/20250613-133900-marostegui.json
- 13:38 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2195.codfw.wmnet with reason: Maintenance
- 13:38 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181 (T396130)', diff saved to https://phabricator.wikimedia.org/P77947 and previous config saved to /var/cache/conftool/dbconfig/20250613-133837-marostegui.json
- 13:23 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P77944 and previous config saved to /var/cache/conftool/dbconfig/20250613-132329-marostegui.json
- 13:08 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P77942 and previous config saved to /var/cache/conftool/dbconfig/20250613-130822-marostegui.json
- 13:05 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1018.eqiad.wmnet with OS bullseye
- 12:53 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181 (T396130)', diff saved to https://phabricator.wikimedia.org/P77941 and previous config saved to /var/cache/conftool/dbconfig/20250613-125314-marostegui.json
- 12:48 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1018.eqiad.wmnet with reason: host reimage
- 12:44 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1018.eqiad.wmnet with reason: host reimage
- 12:39 marostegui@cumin1002: dbctl commit (dc=all): 'db1182 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P77940 and previous config saved to /var/cache/conftool/dbconfig/20250613-123955-root.json
- 12:36 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2181 (T396130)', diff saved to https://phabricator.wikimedia.org/P77939 and previous config saved to /var/cache/conftool/dbconfig/20250613-123635-marostegui.json
- 12:36 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2181.codfw.wmnet with reason: Maintenance
- 12:36 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167 (T396130)', diff saved to https://phabricator.wikimedia.org/P77938 and previous config saved to /var/cache/conftool/dbconfig/20250613-123612-marostegui.json
- 12:28 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1018.eqiad.wmnet with OS bullseye
- 12:27 andrew@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1018.eqiad.wmnet with OS bullseye
- 12:24 marostegui@cumin1002: dbctl commit (dc=all): 'db1182 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P77937 and previous config saved to /var/cache/conftool/dbconfig/20250613-122449-root.json
- 12:21 akosiaris: T390251 re-enable puppet on all registries.
- 12:21 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167', diff saved to https://phabricator.wikimedia.org/P77936 and previous config saved to /var/cache/conftool/dbconfig/20250613-122104-marostegui.json
- 12:17 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1018.eqiad.wmnet with OS bullseye
- 12:15 andrew@cumin1002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts cloudcephosd1018.eqiad.wmnet
- 12:15 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcephosd1018.eqiad.wmnet
- 12:09 marostegui@cumin1002: dbctl commit (dc=all): 'db1182 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P77935 and previous config saved to /var/cache/conftool/dbconfig/20250613-120944-root.json
- 12:05 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167', diff saved to https://phabricator.wikimedia.org/P77934 and previous config saved to /var/cache/conftool/dbconfig/20250613-120557-marostegui.json
- 12:05 andrew@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudcephosd1018.eqiad.wmnet
- 12:02 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ncredir7004.magru.wmnet with OS bookworm
- 11:55 andrew@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cloudcephosd1018.eqiad.wmnet
- 11:55 andrew@cumin1002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudcephosd1018.eqiad.wmnet']
- 11:54 andrew@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1018.eqiad.wmnet']
- 11:54 marostegui@cumin1002: dbctl commit (dc=all): 'db1182 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P77933 and previous config saved to /var/cache/conftool/dbconfig/20250613-115438-root.json
- 11:54 andrew@cumin1002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudcephosd1018.eqiad.wmnet']
- 11:50 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167 (T396130)', diff saved to https://phabricator.wikimedia.org/P77932 and previous config saved to /var/cache/conftool/dbconfig/20250613-115049-marostegui.json
- 11:49 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1182.eqiad.wmnet with reason: Maintenance
- 11:49 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1182', diff saved to https://phabricator.wikimedia.org/P77931 and previous config saved to /var/cache/conftool/dbconfig/20250613-114917-marostegui.json
- 11:47 andrew@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1018.eqiad.wmnet']
- 11:46 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ncredir7004.magru.wmnet with reason: host reimage
- 11:45 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 11:45 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
- 11:43 jmm@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on ncredir7004.magru.wmnet with reason: host reimage
- 11:41 akosiaris: T390251 re-enable puppet on registry1004 after merging puppet refactoring changes.
- 11:34 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2167 (T396130)', diff saved to https://phabricator.wikimedia.org/P77930 and previous config saved to /var/cache/conftool/dbconfig/20250613-113402-marostegui.json
- 11:33 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2167.codfw.wmnet with reason: Maintenance
- 11:33 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2166 (T396130)', diff saved to https://phabricator.wikimedia.org/P77929 and previous config saved to /var/cache/conftool/dbconfig/20250613-113339-marostegui.json
- 11:22 marostegui@cumin1002: DONE (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 1:00:00 on db2148.codfw.wmnet with reason: Maintenance
- 11:18 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2166', diff saved to https://phabricator.wikimedia.org/P77928 and previous config saved to /var/cache/conftool/dbconfig/20250613-111832-marostegui.json
- 11:14 jmm@cumin1003: START - Cookbook sre.hosts.reimage for host ncredir7004.magru.wmnet with OS bookworm
- 11:03 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2166', diff saved to https://phabricator.wikimedia.org/P77927 and previous config saved to /var/cache/conftool/dbconfig/20250613-110324-marostegui.json
- 10:48 root@cumin1002: DONE (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for ms-backup1002.eqiad.wmnet: Renew puppet certificate - root@cumin1002
- 10:48 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2166 (T396130)', diff saved to https://phabricator.wikimedia.org/P77926 and previous config saved to /var/cache/conftool/dbconfig/20250613-104816-marostegui.json
- 10:45 root@cumin1002: DONE (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for ms-backup1001.eqiad.wmnet: Renew puppet certificate - root@cumin1002
- 10:31 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2166 (T396130)', diff saved to https://phabricator.wikimedia.org/P77925 and previous config saved to /var/cache/conftool/dbconfig/20250613-103137-marostegui.json
- 10:31 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2166.codfw.wmnet with reason: Maintenance
- 10:31 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2165 (T396130)', diff saved to https://phabricator.wikimedia.org/P77924 and previous config saved to /var/cache/conftool/dbconfig/20250613-103114-marostegui.json
- 10:23 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on db2212.codfw.wmnet with reason: Not powering up
- 10:16 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2165', diff saved to https://phabricator.wikimedia.org/P77923 and previous config saved to /var/cache/conftool/dbconfig/20250613-101607-marostegui.json
- 10:07 marostegui@cumin1002: dbctl commit (dc=all): 'db2148 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P77922 and previous config saved to /var/cache/conftool/dbconfig/20250613-100754-root.json
- 10:05 taavi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 10:05 taavi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add codfw1dev auth v6 VIPs - taavi@cumin1003"
- 10:05 taavi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add codfw1dev auth v6 VIPs - taavi@cumin1003"
- 10:02 taavi@cumin1003: START - Cookbook sre.dns.netbox
- 10:01 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2165', diff saved to https://phabricator.wikimedia.org/P77921 and previous config saved to /var/cache/conftool/dbconfig/20250613-100059-marostegui.json
- 09:52 marostegui@cumin1002: dbctl commit (dc=all): 'db2148 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P77920 and previous config saved to /var/cache/conftool/dbconfig/20250613-095248-root.json
- 09:47 jynus@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on ms-backup1001.eqiad.wmnet with reason: Maintenance and reboot
- 09:45 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2165 (T396130)', diff saved to https://phabricator.wikimedia.org/P77919 and previous config saved to /var/cache/conftool/dbconfig/20250613-094552-marostegui.json
- 09:37 marostegui@cumin1002: dbctl commit (dc=all): 'db2148 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P77918 and previous config saved to /var/cache/conftool/dbconfig/20250613-093742-root.json
- 09:35 jmm@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on install7001.wikimedia.org with reason: being replaced by install7002
- 09:35 jakob@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikidata-query-gui: apply
- 09:35 jakob@deploy1003: helmfile [eqiad] START helmfile.d/services/wikidata-query-gui: apply
- 09:35 jakob@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikidata-query-gui: apply
- 09:34 jakob@deploy1003: helmfile [codfw] START helmfile.d/services/wikidata-query-gui: apply
- 09:34 jakob@deploy1003: helmfile [staging] DONE helmfile.d/services/wikidata-query-gui: apply
- 09:34 jakob@deploy1003: helmfile [staging] START helmfile.d/services/wikidata-query-gui: apply
- 09:29 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2165 (T396130)', diff saved to https://phabricator.wikimedia.org/P77917 and previous config saved to /var/cache/conftool/dbconfig/20250613-092910-marostegui.json
- 09:29 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2165.codfw.wmnet with reason: Maintenance
- 09:28 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2164 (T396130)', diff saved to https://phabricator.wikimedia.org/P77916 and previous config saved to /var/cache/conftool/dbconfig/20250613-092847-marostegui.json
- 09:22 marostegui@cumin1002: dbctl commit (dc=all): 'db2148 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P77915 and previous config saved to /var/cache/conftool/dbconfig/20250613-092236-root.json
- 09:18 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2148.codfw.wmnet with reason: Maintenance
- 09:18 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2148', diff saved to https://phabricator.wikimedia.org/P77914 and previous config saved to /var/cache/conftool/dbconfig/20250613-091800-marostegui.json
- 09:13 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2164', diff saved to https://phabricator.wikimedia.org/P77913 and previous config saved to /var/cache/conftool/dbconfig/20250613-091339-marostegui.json
- 09:12 jynus@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on ms-backup1002.eqiad.wmnet with reason: Maintenance and reboot
- 08:58 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2164', diff saved to https://phabricator.wikimedia.org/P77912 and previous config saved to /var/cache/conftool/dbconfig/20250613-085832-marostegui.json
- 08:56 jayme@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
- 08:54 jayme@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
- 08:53 jayme@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
- 08:49 jayme@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
- 08:49 jmm@cumin1003: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host ncredir7004.magru.wmnet
- 08:49 jmm@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ncredir7004.magru.wmnet with OS bookworm
- 08:43 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2164 (T396130)', diff saved to https://phabricator.wikimedia.org/P77911 and previous config saved to /var/cache/conftool/dbconfig/20250613-084325-marostegui.json
- 08:35 jayme@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
- 08:32 jayme@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
- 08:26 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2164 (T396130)', diff saved to https://phabricator.wikimedia.org/P77910 and previous config saved to /var/cache/conftool/dbconfig/20250613-082656-marostegui.json
- 08:26 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2164.codfw.wmnet with reason: Maintenance
- 08:26 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2163 (T396130)', diff saved to https://phabricator.wikimedia.org/P77909 and previous config saved to /var/cache/conftool/dbconfig/20250613-082633-marostegui.json
- 08:11 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2163', diff saved to https://phabricator.wikimedia.org/P77908 and previous config saved to /var/cache/conftool/dbconfig/20250613-081126-marostegui.json
- 07:56 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2163', diff saved to https://phabricator.wikimedia.org/P77905 and previous config saved to /var/cache/conftool/dbconfig/20250613-075618-marostegui.json
- 07:44 marostegui@cumin1002: dbctl commit (dc=all): 'db2175 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P77904 and previous config saved to /var/cache/conftool/dbconfig/20250613-074450-root.json
- 07:42 jmm@cumin1003: START - Cookbook sre.hosts.reimage for host ncredir7004.magru.wmnet with OS bookworm
- 07:41 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2163 (T396130)', diff saved to https://phabricator.wikimedia.org/P77903 and previous config saved to /var/cache/conftool/dbconfig/20250613-074110-marostegui.json
- 07:35 jmm@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM ncredir7004.magru.wmnet - jmm@cumin1003"
- 07:35 jmm@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM ncredir7004.magru.wmnet - jmm@cumin1003"
- 07:35 jmm@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ncredir7004.magru.wmnet on all recursors
- 07:35 jmm@cumin1003: START - Cookbook sre.dns.wipe-cache ncredir7004.magru.wmnet on all recursors
- 07:34 jmm@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 07:34 jmm@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM ncredir7004.magru.wmnet - jmm@cumin1003"
- 07:33 jmm@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM ncredir7004.magru.wmnet - jmm@cumin1003"
- 07:29 marostegui@cumin1002: dbctl commit (dc=all): 'db2175 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P77901 and previous config saved to /var/cache/conftool/dbconfig/20250613-072944-root.json
- 07:26 jmm@cumin1003: START - Cookbook sre.dns.netbox
- 07:24 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2163 (T396130)', diff saved to https://phabricator.wikimedia.org/P77900 and previous config saved to /var/cache/conftool/dbconfig/20250613-072431-marostegui.json
- 07:24 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2163.codfw.wmnet with reason: Maintenance
- 07:24 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2154 (T396130)', diff saved to https://phabricator.wikimedia.org/P77899 and previous config saved to /var/cache/conftool/dbconfig/20250613-072408-marostegui.json
- 07:18 jmm@cumin1003: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
- 07:16 jmm@cumin1003: START - Cookbook sre.dns.netbox
- 07:16 jmm@cumin1003: START - Cookbook sre.ganeti.makevm for new host ncredir7004.magru.wmnet
- 07:14 marostegui@cumin1002: dbctl commit (dc=all): 'db2175 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P77898 and previous config saved to /var/cache/conftool/dbconfig/20250613-071438-root.json
- 07:09 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2154', diff saved to https://phabricator.wikimedia.org/P77897 and previous config saved to /var/cache/conftool/dbconfig/20250613-070901-marostegui.json
- 06:59 marostegui@cumin1002: dbctl commit (dc=all): 'db2175 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P77896 and previous config saved to /var/cache/conftool/dbconfig/20250613-065933-root.json
- 06:58 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2175.codfw.wmnet with reason: Maintenance
- 06:53 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2154', diff saved to https://phabricator.wikimedia.org/P77895 and previous config saved to /var/cache/conftool/dbconfig/20250613-065353-marostegui.json
- 06:53 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2175.codfw.wmnet with reason: Maintenance
- 06:52 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2175', diff saved to https://phabricator.wikimedia.org/P77894 and previous config saved to /var/cache/conftool/dbconfig/20250613-065239-marostegui.json
- 06:38 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2154 (T396130)', diff saved to https://phabricator.wikimedia.org/P77893 and previous config saved to /var/cache/conftool/dbconfig/20250613-063845-marostegui.json
- 06:34 marostegui@cumin1002: dbctl commit (dc=all): 'db2189 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P77892 and previous config saved to /var/cache/conftool/dbconfig/20250613-063435-root.json
- 06:22 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2154 (T396130)', diff saved to https://phabricator.wikimedia.org/P77891 and previous config saved to /var/cache/conftool/dbconfig/20250613-062203-marostegui.json
- 06:21 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2154.codfw.wmnet with reason: Maintenance
- 06:21 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2152 (T396130)', diff saved to https://phabricator.wikimedia.org/P77890 and previous config saved to /var/cache/conftool/dbconfig/20250613-062140-marostegui.json
- 06:19 marostegui@cumin1002: dbctl commit (dc=all): 'db2189 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P77889 and previous config saved to /var/cache/conftool/dbconfig/20250613-061930-root.json
- 06:06 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2152', diff saved to https://phabricator.wikimedia.org/P77888 and previous config saved to /var/cache/conftool/dbconfig/20250613-060633-marostegui.json
- 06:04 marostegui@cumin1002: dbctl commit (dc=all): 'db2189 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P77887 and previous config saved to /var/cache/conftool/dbconfig/20250613-060424-root.json
- 05:51 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2152', diff saved to https://phabricator.wikimedia.org/P77885 and previous config saved to /var/cache/conftool/dbconfig/20250613-055125-marostegui.json
- 05:49 marostegui@cumin1002: dbctl commit (dc=all): 'db2189 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P77884 and previous config saved to /var/cache/conftool/dbconfig/20250613-054918-root.json
- 05:41 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2189', diff saved to https://phabricator.wikimedia.org/P77883 and previous config saved to /var/cache/conftool/dbconfig/20250613-054156-marostegui.json
- 05:40 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2189.codfw.wmnet with reason: Maintenance
- 05:36 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2152 (T396130)', diff saved to https://phabricator.wikimedia.org/P77882 and previous config saved to /var/cache/conftool/dbconfig/20250613-053617-marostegui.json
- 05:21 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2152 (T396130)', diff saved to https://phabricator.wikimedia.org/P77881 and previous config saved to /var/cache/conftool/dbconfig/20250613-052114-marostegui.json
- 05:21 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2152.codfw.wmnet with reason: Maintenance
- 05:18 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb1013.eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
- 05:14 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1223.eqiad.wmnet with reason: Maintenance
- 03:35 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1017.eqiad.wmnet with OS bullseye
- 03:18 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1017.eqiad.wmnet with reason: host reimage
- 03:15 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1017.eqiad.wmnet with reason: host reimage
- 02:59 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1017.eqiad.wmnet with OS bullseye
- 02:58 andrew@cumin1002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudcephosd1017.eqiad.wmnet']
- 02:57 andrew@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1017.eqiad.wmnet']
- 02:57 andrew@cumin1002: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts ['cloudcephosd1014.eqiad.wmnet']
- 02:57 andrew@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1014.eqiad.wmnet']
- 02:57 andrew@cumin1002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudcephosd1017.eqiad.wmnet']
- 02:41 andrew@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1017.eqiad.wmnet']
- 02:40 andrew@cumin1002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudcephosd1017.eqiad.wmnet']
- 02:33 andrew@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1017.eqiad.wmnet']
- 02:31 eileen: postinstall
- 01:55 eileen: * postinstall
- 01:17 ladsgroup@cumin2002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db1251 gradually with 4 steps - Firmware update done
- 01:14 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on A:cp-ulsfo and A:cp - Fix VSLbs() assert error and upgrade libvmod-wmfuniq to 0.2.0 (T396581)
- 00:31 ladsgroup@cumin2002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db1253 gradually with 4 steps - Firmware updated
- 00:29 ladsgroup@cumin2002: START - Cookbook sre.mysql.pool db1251 gradually with 4 steps - Firmware update done
- 00:13 ladsgroup@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1251.eqiad.wmnet with reason: Firmware update
- 00:13 ladsgroup@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts db1251.eqiad.wmnet
- 00:08 ladsgroup@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host db1251.eqiad.wmnet
2025-06-12
- 23:54 ladsgroup@cumin2002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db1254 gradually with 4 steps - Firmware update done
- 23:53 ladsgroup@cumin2002: START - Cookbook sre.hosts.reboot-single for host db1251.eqiad.wmnet
- 23:53 ladsgroup@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts db1251.eqiad.wmnet
- 23:48 ladsgroup@cumin2002: dbctl commit (dc=all): 'Depool db1251 for firmware update (T396648)', diff saved to https://phabricator.wikimedia.org/P77872 and previous config saved to /var/cache/conftool/dbconfig/20250612-234855-ladsgroup.json
- 23:47 ladsgroup@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1251.eqiad.wmnet with reason: Firmware update
- 23:43 ladsgroup@cumin2002: START - Cookbook sre.mysql.pool db1253 gradually with 4 steps - Firmware updated
- 23:43 ladsgroup@cumin1002: END (FAIL) - Cookbook sre.wikireplicas.update-views (exit_code=99)
- 23:37 ladsgroup@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts db1252.eqiad.wmnet
- 23:36 ladsgroup@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host db1252.eqiad.wmnet
- 23:36 ladsgroup@cumin1002: START - Cookbook sre.wikireplicas.update-views
- 23:21 ladsgroup@cumin2002: START - Cookbook sre.hosts.reboot-single for host db1252.eqiad.wmnet
- 23:20 ladsgroup@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts db1252.eqiad.wmnet
- 23:14 bvibber@deploy1003: Finished scap sync-world: Backport for Specify Lua transform arguments on invocations (T395610), Specify Lua transform arguments on invocations (T395610) (duration: 61m 18s)
- 23:13 ladsgroup@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1252.eqiad.wmnet with reason: Firmware update
- 23:06 ladsgroup@cumin2002: START - Cookbook sre.mysql.pool db1254 gradually with 4 steps - Firmware update done
- 23:01 ladsgroup@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts db1254.eqiad.wmnet
- 23:00 ladsgroup@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host db1254.eqiad.wmnet
- 23:00 bvibber@deploy1003: bvibber: Continuing with sync
- 22:59 bvibber@deploy1003: bvibber: Backport for Specify Lua transform arguments on invocations (T395610), Specify Lua transform arguments on invocations (T395610) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 22:58 eevans@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cassandra-dev2001.codfw.wmnet with OS bullseye
- 22:45 ladsgroup@cumin2002: START - Cookbook sre.hosts.reboot-single for host db1254.eqiad.wmnet
- 22:44 ladsgroup@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts db1254.eqiad.wmnet
- 22:43 ladsgroup@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1254.eqiad.wmnet with reason: Firmware upgrade (T396648)
- 22:43 ladsgroup@cumin1002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts db1254.eqiad.wmnet
- 22:42 ladsgroup@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts db1254.eqiad.wmnet
- 22:39 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1254.eqiad.wmnet with reason: Firmware upgrade (T396648)
- 22:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depool db1254 (T396648)', diff saved to https://phabricator.wikimedia.org/P77867 and previous config saved to /var/cache/conftool/dbconfig/20250612-223834-ladsgroup.json
- 22:13 bvibber@deploy1003: Started scap sync-world: Backport for Specify Lua transform arguments on invocations (T395610), Specify Lua transform arguments on invocations (T395610)
- 22:07 maryum: Deploy security fix for T394863
- 22:00 maryum: Deployed security fix for T396413
- 21:54 maryum: Deploy security fix for T396524
- 21:51 cstone: civicrm upgraded from 870eed23 to f2f33db5
- 21:43 eevans@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cassandra-dev2001.codfw.wmnet with reason: host reimage
- 21:39 eevans@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cassandra-dev2001.codfw.wmnet with reason: host reimage
- 21:24 eevans@cumin1003: START - Cookbook sre.hosts.reimage for host cassandra-dev2001.codfw.wmnet with OS bullseye
- 21:18 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1016.eqiad.wmnet with OS bullseye
- 21:15 cstone: SmashPig upgraded from 042d5a5b to 84c0668b
- 21:01 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1016.eqiad.wmnet with reason: host reimage
- 20:58 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1016.eqiad.wmnet with reason: host reimage
- 20:48 bvibber@deploy1003: Finished scap sync-world: Backport for Fix for multiple charts on same page using mix of transforms (T396512), Fix for multiple charts on same page using mix of transforms (T396512) (duration: 09m 50s)
- 20:43 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1016.eqiad.wmnet with OS bullseye
- 20:41 andrew@cumin1002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudcephosd1016.eqiad.wmnet']
- 20:41 bvibber@deploy1003: bvibber: Continuing with sync
- 20:40 bvibber@deploy1003: bvibber: Backport for Fix for multiple charts on same page using mix of transforms (T396512), Fix for multiple charts on same page using mix of transforms (T396512) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 20:38 bvibber@deploy1003: Started scap sync-world: Backport for Fix for multiple charts on same page using mix of transforms (T396512), Fix for multiple charts on same page using mix of transforms (T396512)
- 20:36 andrew@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1016.eqiad.wmnet']
- 20:35 andrew@cumin1002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudcephosd1016.eqiad.wmnet']
- 20:33 dancy@deploy1003: Finished scap sync-world: Backport for enwiki: temporary lift of IP cap for event on 16 June 2025 (T396128) (duration: 09m 54s)
- 20:26 dancy@deploy1003: dancy, anzx: Continuing with sync
- 20:25 dancy@deploy1003: dancy, anzx: Backport for enwiki: temporary lift of IP cap for event on 16 June 2025 (T396128) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 20:23 dancy@deploy1003: Started scap sync-world: Backport for enwiki: temporary lift of IP cap for event on 16 June 2025 (T396128)
- 20:18 andrew@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1016.eqiad.wmnet']
- 20:15 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1015.eqiad.wmnet with OS bullseye
- 19:51 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1015.eqiad.wmnet with reason: host reimage
- 19:47 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1015.eqiad.wmnet with reason: host reimage
- 19:31 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1015.eqiad.wmnet with OS bullseye
- 19:27 andrew@cumin1002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudcephosd1015.eqiad.wmnet']
- 19:19 andrew@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1015.eqiad.wmnet']
- 19:18 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1186.eqiad.wmnet with OS bullseye
- 19:18 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1185.eqiad.wmnet with OS bullseye
- 19:18 andrew@cumin1002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudcephosd1015.eqiad.wmnet']
- 19:18 andrew@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1015.eqiad.wmnet']
- 19:02 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1185.eqiad.wmnet with OS bullseye
- 19:02 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1186.eqiad.wmnet with OS bullseye
- 18:58 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1185.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 18:57 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1186.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 18:54 vriley@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1185.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 18:53 vriley@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1186.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 18:41 jmm@cumin1003: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1048.eqiad.wmnet
- 18:40 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2212 (T395241)', diff saved to https://phabricator.wikimedia.org/P77866 and previous config saved to /var/cache/conftool/dbconfig/20250612-184028-fceratto.json
- 18:40 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2212.codfw.wmnet with reason: Maintenance
- 18:37 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2188 (T395241)', diff saved to https://phabricator.wikimedia.org/P77865 and previous config saved to /var/cache/conftool/dbconfig/20250612-183749-fceratto.json
- 18:37 brennen@deploy1003: rebuilt and synchronized wikiversions files: group1 to 1.45.0-wmf.5 refs T392175
- 18:25 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1185.eqiad.wmnet with OS bullseye
- 18:24 brennen@deploy1003: rebuilt and synchronized wikiversions files: group2 to 1.45.0-wmf.5 refs T392175
- 18:22 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2188', diff saved to https://phabricator.wikimedia.org/P77864 and previous config saved to /var/cache/conftool/dbconfig/20250612-182241-fceratto.json
- 18:10 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1185.eqiad.wmnet with OS bullseye
- 18:10 brennen@deploy1003: Finished scap sync-world: Backport for ParserOutput::collectMetadata: Cast array keys to string (T396656) (duration: 10m 51s)
- 18:07 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2188', diff saved to https://phabricator.wikimedia.org/P77863 and previous config saved to /var/cache/conftool/dbconfig/20250612-180733-fceratto.json
- 18:06 jasmine@dns1004: END - running authdns-update
- 18:05 jasmine@dns1004: START - running authdns-update
- 18:04 jasmine@dns1004: START - running authdns-update
- 18:03 brennen@deploy1003: brennen: Continuing with sync
- 18:01 brennen@deploy1003: brennen: Backport for ParserOutput::collectMetadata: Cast array keys to string (T396656) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 17:59 brennen@deploy1003: Started scap sync-world: Backport for ParserOutput::collectMetadata: Cast array keys to string (T396656)
- 17:52 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2188 (T395241)', diff saved to https://phabricator.wikimedia.org/P77862 and previous config saved to /var/cache/conftool/dbconfig/20250612-175226-fceratto.json
- 17:50 jasmine@deploy1003: Finished scap sync-world: Deploying apache2 configuration change for T393803 (duration: 20m 58s)
- 17:44 jasmine@deploy1003: jasmine: Continuing with sync
- 17:39 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2188 (T395241)', diff saved to https://phabricator.wikimedia.org/P77861 and previous config saved to /var/cache/conftool/dbconfig/20250612-173909-fceratto.json
- 17:39 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2188.codfw.wmnet with reason: Maintenance
- 17:38 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T395241)', diff saved to https://phabricator.wikimedia.org/P77860 and previous config saved to /var/cache/conftool/dbconfig/20250612-173843-fceratto.json
- 17:36 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply
- 17:36 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply
- 17:36 jasmine@deploy1003: jasmine: Deploying apache2 configuration change for T393803 synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 17:35 jasmine@deploy1003: Started scap sync-world: Deploying apache2 configuration change for T393803
- 17:35 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on A:cp-ulsfo and A:cp - Fix VSLbs() assert error and upgrade libvmod-wmfuniq to 0.2.0 (T396581)
- 17:25 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host sretest2006
- 17:25 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host sretest2006
- 17:25 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 17:25 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding sretest2006 to codfw - jhancock@cumin2002"
- 17:25 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding sretest2006 to codfw - jhancock@cumin2002"
- 17:24 bd808@deploy1003: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
- 17:23 bd808@deploy1003: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
- 17:23 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P77859 and previous config saved to /var/cache/conftool/dbconfig/20250612-172335-fceratto.json
- 17:22 bd808@deploy1003: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
- 17:22 bd808@deploy1003: helmfile [codfw] START helmfile.d/services/developer-portal: apply
- 17:22 bd808@deploy1003: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
- 17:21 bd808@deploy1003: helmfile [staging] START helmfile.d/services/developer-portal: apply
- 17:21 jhancock@cumin2002: START - Cookbook sre.dns.netbox
- 17:18 cmooney@cumin1003: END (PASS) - Cookbook sre.hosts.dhcp (exit_code=0) for host cloudcephosd1015.eqiad.wmnet
- 17:13 cmooney@cumin1003: START - Cookbook sre.hosts.dhcp for host cloudcephosd1015.eqiad.wmnet
- 17:08 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P77857 and previous config saved to /var/cache/conftool/dbconfig/20250612-170828-fceratto.json
- 16:56 robh@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1015.eqiad.wmnet with OS bookworm
- 16:53 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T395241)', diff saved to https://phabricator.wikimedia.org/P77856 and previous config saved to /var/cache/conftool/dbconfig/20250612-165320-fceratto.json
- 16:49 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 16:49 eevans@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cassandra-dev2001.codfw.wmnet with OS bullseye
- 16:48 robh@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcephosd1015.eqiad.wmnet with OS bookworm
- 16:43 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 16:35 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2176 (T395241)', diff saved to https://phabricator.wikimedia.org/P77854 and previous config saved to /var/cache/conftool/dbconfig/20250612-163536-fceratto.json
- 16:35 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2176.codfw.wmnet with reason: Maintenance
- 16:35 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T395241)', diff saved to https://phabricator.wikimedia.org/P77853 and previous config saved to /var/cache/conftool/dbconfig/20250612-163509-fceratto.json
- 16:35 volans@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) db2241.mgmt.codfw.wmnet db2242.mgmt.codfw.wmnet on all recursors
- 16:35 volans@cumin1003: START - Cookbook sre.dns.wipe-cache db2241.mgmt.codfw.wmnet db2242.mgmt.codfw.wmnet on all recursors
- 16:31 eevans@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cassandra-dev2001.codfw.wmnet with reason: host reimage
- 16:31 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 16:30 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 16:30 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
- 16:30 volans@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 16:30 volans@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Invert db2241 and db2242 DNS T379757#10908710 - volans@cumin1003"
- 16:30 volans@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Invert db2241 and db2242 DNS T379757#10908710 - volans@cumin1003"
- 16:29 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 16:26 volans@cumin1003: START - Cookbook sre.dns.netbox
- 16:26 andrew@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1015.eqiad.wmnet with OS bullseye
- 16:25 eevans@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cassandra-dev2001.codfw.wmnet with reason: host reimage
- 16:23 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 16:20 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P77852 and previous config saved to /var/cache/conftool/dbconfig/20250612-162002-fceratto.json
- 16:19 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1015.eqiad.wmnet with OS bullseye
- 16:18 andrew@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcephosd1015.eqiad.wmnet with OS bullseye
- 16:14 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1015.eqiad.wmnet with OS bullseye
- 16:10 andrew@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1015.eqiad.wmnet with OS bullseye
- 16:10 eevans@cumin1003: START - Cookbook sre.hosts.reimage for host cassandra-dev2001.codfw.wmnet with OS bullseye
- 16:05 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1015.eqiad.wmnet with OS bullseye
- 16:04 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P77851 and previous config saved to /var/cache/conftool/dbconfig/20250612-160454-fceratto.json
- 15:58 andrew@cumin1002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudcephosd1015.eqiad.wmnet']
- 15:49 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T395241)', diff saved to https://phabricator.wikimedia.org/P77850 and previous config saved to /var/cache/conftool/dbconfig/20250612-154947-fceratto.json
- 15:49 robh@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts db1253.eqiad.wmnet
- 15:49 robh@cumin2002: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host db1253.eqiad.wmnet
- 15:45 swfrench-wmf: removed python3-conftool-dbctl package from puppetmaster[12]001 - T395696
- 15:44 logmsgbot: lucaswerkmeister-wmde Deployed security patch for T396685
- 15:40 andrew@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1015.eqiad.wmnet']
- 15:40 andrew@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1015.eqiad.wmnet with OS bullseye
- 15:37 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 15:35 logmsgbot: lucaswerkmeister-wmde Deployed security patch for T396685
- 15:34 robh@cumin2002: START - Cookbook sre.hosts.reboot-single for host db1253.eqiad.wmnet
- 15:32 robh@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts db1253.eqiad.wmnet
- 15:32 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1015.eqiad.wmnet with OS bullseye
- 15:31 andrew@cumin1002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudcephosd1015.eqiad.wmnet']
- 15:30 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2174 (T395241)', diff saved to https://phabricator.wikimedia.org/P77848 and previous config saved to /var/cache/conftool/dbconfig/20250612-153008-fceratto.json
- 15:30 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2174.codfw.wmnet with reason: Maintenance
- 15:29 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T395241)', diff saved to https://phabricator.wikimedia.org/P77847 and previous config saved to /var/cache/conftool/dbconfig/20250612-152942-fceratto.json
- 15:29 robh@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts db1253.eqiad.wmnet
- 15:29 robh@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts db1253.eqiad.wmnet
- 15:29 robh@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts db1253.eqiad.wmnet
- 15:28 robh@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts db1253.eqiad.wmnet
- 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 15:25 robh@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts db1253.eqiad.wmnet
- 15:25 robh@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts db1253.eqiad.wmnet
- 15:25 jhancock@cumin2002: START - Cookbook sre.dns.netbox
- 15:25 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp2057
- 15:25 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cp2057
- 15:24 andrew@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1015.eqiad.wmnet']
- 15:24 andrew@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcephosd1015.eqiad.wmnet with OS bullseye
- 15:23 urbanecm@deploy1003: Finished scap sync-world: Backport for LinkRecommendationStore: Query templatelinks on the main DB (T396680) (duration: 18m 06s)
- 15:23 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp2057
- 15:19 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1015.eqiad.wmnet with OS bullseye
- 15:19 andrew@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1015.eqiad.wmnet with OS bullseye
- 15:16 urbanecm@deploy1003: urbanecm: Continuing with sync
- 15:15 robh@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts db1253.eqiad.wmnet
- 15:14 robh@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts db1253.eqiad.wmnet
- 15:14 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P77846 and previous config saved to /var/cache/conftool/dbconfig/20250612-151434-fceratto.json
- 15:13 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cp2057
- 15:13 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp2056
- 15:13 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cp2056
- 15:13 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp2055
- 15:13 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cp2055
- 15:13 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp2054
- 15:13 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cp2054
- 15:13 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp2053
- 15:13 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cp2053
- 15:13 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp2052
- 15:13 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cp2052
- 15:12 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp2051
- 15:12 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cp2051
- 15:12 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp2050
- 15:12 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1015.eqiad.wmnet with OS bullseye
- 15:12 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1048.eqiad.wmnet
- 15:12 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cp2050
- 15:12 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp2049
- 15:12 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cp2049
- 15:12 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp2048
- 15:12 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cp2048
- 15:12 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp2047
- 15:11 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cp2047
- 15:11 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp2046
- 15:11 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cp2046
- 15:11 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp2045
- 15:11 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cp2045
- 15:11 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 15:11 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding cp2045-57 to codfw - jhancock@cumin2002"
- 15:11 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding cp2045-57 to codfw - jhancock@cumin2002"
- 15:09 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply
- 15:09 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-video: apply
- 15:09 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-timeline: apply
- 15:09 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-timeline: apply
- 15:09 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
- 15:09 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-syntaxhighlight: apply
- 15:09 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply
- 15:09 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply
- 15:09 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) config_reloading P{lvs7002.magru.wmnet} and A:liberica
- 15:09 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: apply
- 15:09 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply
- 15:08 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox: apply
- 15:08 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox: apply
- 15:08 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.admin config_reloading P{lvs7002.magru.wmnet} and A:liberica
- 15:08 vgutierrez: re-pooling lvs7002 using katran - T396561
- 15:08 urbanecm@deploy1003: urbanecm: Backport for LinkRecommendationStore: Query templatelinks on the main DB (T396680) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 15:07 jhancock@cumin2002: START - Cookbook sre.dns.netbox
- 15:05 urbanecm@deploy1003: Started scap sync-world: Backport for LinkRecommendationStore: Query templatelinks on the main DB (T396680)
- 15:03 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-video: apply
- 15:03 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-video: apply
- 15:03 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-timeline: apply
- 15:03 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-timeline: apply
- 15:03 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
- 15:03 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply
- 15:03 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply
- 15:03 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-media: apply
- 15:03 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: apply
- 15:03 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-constraints: apply
- 15:03 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox: apply
- 15:03 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox: apply
- 15:01 andrew@cumin1002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudcephosd1015.eqiad.wmnet']
- 14:59 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P77845 and previous config saved to /var/cache/conftool/dbconfig/20250612-145927-fceratto.json
- 14:58 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 14:58 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update for codfw - jhancock@cumin2002"
- 14:58 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-video: apply
- 14:58 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-video: apply
- 14:58 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-timeline: apply
- 14:58 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-timeline: apply
- 14:57 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
- 14:57 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply
- 14:57 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply
- 14:57 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-media: apply
- 14:57 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-constraints: apply
- 14:57 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-constraints: apply
- 14:57 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox: apply
- 14:57 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox: apply
- 14:56 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update for codfw - jhancock@cumin2002"
- 14:54 andrew@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1015.eqiad.wmnet']
- 14:52 jhancock@cumin2002: START - Cookbook sre.dns.netbox
- 14:49 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1014.eqiad.wmnet with OS bullseye
- 14:48 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs7002.magru.wmnet
- 14:48 vgutierrez@cumin1002: START - Cookbook sre.hosts.remove-downtime for lvs7002.magru.wmnet
- 14:44 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T395241)', diff saved to https://phabricator.wikimedia.org/P77843 and previous config saved to /var/cache/conftool/dbconfig/20250612-144419-fceratto.json
- 14:38 aqu@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
- 14:37 aqu@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
- 14:37 aqu@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
- 14:37 aqu@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
- 14:37 aqu@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
- 14:36 aqu@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
- 14:36 aqu@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
- 14:35 aqu@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
- 14:30 mfossati@deploy1003: Finished deploy [airflow-dags/platform_eng@cb6b18b]: hotfix-bump SEAL to v0.8.0 (duration: 02m 24s)
- 14:28 mfossati@deploy1003: Started deploy [airflow-dags/platform_eng@cb6b18b]: hotfix-bump SEAL to v0.8.0
- 14:28 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2033.codfw.wmnet
- 14:28 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2033.codfw.wmnet
- 14:27 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2173 (T395241)', diff saved to https://phabricator.wikimedia.org/P77842 and previous config saved to /var/cache/conftool/dbconfig/20250612-142738-fceratto.json
- 14:27 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2173.codfw.wmnet with reason: Maintenance
- 14:27 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2170 (T395241)', diff saved to https://phabricator.wikimedia.org/P77841 and previous config saved to /var/cache/conftool/dbconfig/20250612-142712-fceratto.json
- 14:24 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1014.eqiad.wmnet with reason: host reimage
- 14:21 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1014.eqiad.wmnet with reason: host reimage
- 14:20 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ganeti2033.codfw.wmnet
- 14:12 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2033.codfw.wmnet
- 14:12 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2170', diff saved to https://phabricator.wikimedia.org/P77840 and previous config saved to /var/cache/conftool/dbconfig/20250612-141205-fceratto.json
- 14:04 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1014.eqiad.wmnet with OS bullseye
- 14:02 andrew@cumin1002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudcephosd1014.eqiad.wmnet']
- 13:59 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifeeds: apply
- 13:58 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/wikifeeds: apply
- 13:57 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifeeds: apply
- 13:57 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifeeds: apply
- 13:57 vgutierrez: upload liberica 0.18 to apt.wm.o (bookworm-wikimedia) - T396751
- 13:57 jgiannelos@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifeeds: apply
- 13:57 jgiannelos@deploy1003: helmfile [staging] START helmfile.d/services/wikifeeds: apply
- 13:56 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2170', diff saved to https://phabricator.wikimedia.org/P77839 and previous config saved to /var/cache/conftool/dbconfig/20250612-135657-fceratto.json
- 13:56 andrew@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1014.eqiad.wmnet']
- 13:55 andrew@cumin1002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudcephosd1014.eqiad.wmnet']
- 13:46 moritzm: installing mariadb security updates (as shipped in Debian, not the wmf-mariadb packages we use for the main mariadb clusters)
- 13:46 gkyziridis@deploy1003: Finished scap sync-world: Backport for ores-extension: enable ores extension UI for second batch of wikis (T395823) (duration: 11m 00s)
- 13:45 andrew@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1014.eqiad.wmnet']
- 13:45 andrew@cumin1002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudcephosd1014.eqiad.wmnet']
- 13:45 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2034.codfw.wmnet
- 13:44 andrew@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1014.eqiad.wmnet']
- 13:44 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2034.codfw.wmnet
- 13:44 andrew@cumin1002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudcephosd1014.eqiad.wmnet']
- 13:44 andrew@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1014.eqiad.wmnet']
- 13:44 andrew@cumin1002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudcephosd1014.eqiad.wmnet']
- 13:43 andrew@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1014.eqiad.wmnet']
- 13:41 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2170 (T395241)', diff saved to https://phabricator.wikimedia.org/P77838 and previous config saved to /var/cache/conftool/dbconfig/20250612-134149-fceratto.json
- 13:41 andrew@cumin1002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudcephosd1014.eqiad.wmnet']
- 13:41 andrew@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1014.eqiad.wmnet']
- 13:41 andrew@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcephosd1014.eqiad.wmnet with OS bullseye
- 13:39 gkyziridis@deploy1003: gkyziridis, isaranto: Continuing with sync
- 13:37 moritzm: failover Ganeti master in eqiad to ganeti1046
- 13:37 gkyziridis@deploy1003: gkyziridis, isaranto: Backport for ores-extension: enable ores extension UI for second batch of wikis (T395823) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 13:36 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ganeti2034.codfw.wmnet
- 13:36 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1014.eqiad.wmnet with OS bullseye
- 13:36 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2034.codfw.wmnet
- 13:35 jmm@cumin1003: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti2034.codfw.wmnet
- 13:35 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2034.codfw.wmnet
- 13:35 andrew@cumin1002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['cloudcephosd1014.eqiad.wmnet']
- 13:35 gkyziridis@deploy1003: Started scap sync-world: Backport for ores-extension: enable ores extension UI for second batch of wikis (T395823)
- 13:30 gkyziridis@deploy1003: Finished scap sync-world: Backport for Revert "ores-extension: enable oresUI for the second batch of wikis" (duration: 10m 01s)
- 13:28 andrew@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1014.eqiad.wmnet']
- 13:27 andrew@cumin1002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudcephosd1014.eqiad.wmnet']
- 13:26 sukhe@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-ats (exit_code=0) Rolling upgrade of ATS on P{cp7002*} and A:cp - 9.2.10 upgrade (T390912)
- 13:23 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2170 (T395241)', diff saved to https://phabricator.wikimedia.org/P77837 and previous config saved to /var/cache/conftool/dbconfig/20250612-132356-fceratto.json
- 13:23 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2170.codfw.wmnet with reason: Maintenance
- 13:23 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T395241)', diff saved to https://phabricator.wikimedia.org/P77836 and previous config saved to /var/cache/conftool/dbconfig/20250612-132329-fceratto.json
- 13:23 gkyziridis@deploy1003: gkyziridis: Continuing with sync
- 13:22 jgiannelos@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifeeds: apply
- 13:22 gkyziridis@deploy1003: gkyziridis: Backport for Revert "ores-extension: enable oresUI for the second batch of wikis" synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 13:22 jgiannelos@deploy1003: helmfile [staging] START helmfile.d/services/wikifeeds: apply
- 13:21 sukhe@cumin1002: START - Cookbook sre.cdn.roll-upgrade-ats Rolling upgrade of ATS on P{cp7002*} and A:cp - 9.2.10 upgrade (T390912)
- 13:21 sukhe@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-ats (exit_code=0) Rolling upgrade of ATS on P{cp3081*} and A:cp - 9.2.10 upgrade (T390912)
- 13:20 gehel: depooling wdqs1022, it seems to not be updated - T396577
- 13:20 gehel: depooling wdqs1022, it seems to not be updated
- 13:20 gkyziridis@deploy1003: Started scap sync-world: Backport for Revert "ores-extension: enable oresUI for the second batch of wikis"
- 13:18 gkyziridis@deploy1003: Sync cancelled.
- 13:16 sukhe@cumin1002: START - Cookbook sre.cdn.roll-upgrade-ats Rolling upgrade of ATS on P{cp3081*} and A:cp - 9.2.10 upgrade (T390912)
- 13:10 andrew@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1014.eqiad.wmnet']
- 13:08 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P77835 and previous config saved to /var/cache/conftool/dbconfig/20250612-130822-fceratto.json
- 13:06 andrew@cumin1002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudcephosd1014.eqiad.wmnet']
- 13:06 andrew@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1014.eqiad.wmnet']
- 13:06 jmm@cumin1003: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti2034.codfw.wmnet
- 13:06 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2034.codfw.wmnet
- 13:06 gkyziridis@deploy1003: gkyziridis: Backport for ores-extension: enable oresUI for the second batch of wikis (T395823 T395668) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 13:05 andrew@cumin1002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudcephosd1014.eqiad.wmnet']
- 13:05 andrew@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1014.eqiad.wmnet']
- 13:05 andrew@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcephosd1014.eqiad.wmnet with OS bullseye
- 13:04 jmm@cumin1003: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti2030.codfw.wmnet
- 13:04 gkyziridis@deploy1003: Started scap sync-world: Backport for ores-extension: enable oresUI for the second batch of wikis (T395823 T395668)
- 13:03 jelto@cumin1002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab2002.wikimedia.org with reason: Upgrade GitLab to 17.10
- 13:01 vgutierrez@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on lvs7002.magru.wmnet with reason: switching to katran
- 13:01 jmm@cumin1003: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=97) for new host ncredir7004.magru.wmnet
- 13:01 jmm@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ncredir7004.magru.wmnet with OS bookworm
- 13:00 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1014.eqiad.wmnet with OS bullseye
- 12:59 andrew@cumin1002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudcephosd1014.eqiad.wmnet']
- 12:59 jmm@cumin1003: START - Cookbook sre.hosts.reimage for host ncredir7004.magru.wmnet with OS bookworm
- 12:53 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P77834 and previous config saved to /var/cache/conftool/dbconfig/20250612-125314-fceratto.json
- 12:53 jmm@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM ncredir7004.magru.wmnet - jmm@cumin1003"
- 12:52 jmm@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM ncredir7004.magru.wmnet - jmm@cumin1003"
- 12:52 jmm@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ncredir7004.magru.wmnet on all recursors
- 12:52 jmm@cumin1003: START - Cookbook sre.dns.wipe-cache ncredir7004.magru.wmnet on all recursors
- 12:52 jmm@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 12:51 vgutierrez@cumin1003: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) depooling P{lvs7002.magru.wmnet} and A:liberica (T396561)
- 12:51 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2030.codfw.wmnet
- 12:51 vgutierrez@cumin1003: START - Cookbook sre.loadbalancer.admin depooling P{lvs7002.magru.wmnet} and A:liberica (T396561)
- 12:51 andrew@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1014.eqiad.wmnet']
- 12:51 andrew@cumin1002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudcephosd1014.eqiad.wmnet']
- 12:50 andrew@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1014.eqiad.wmnet']
- 12:50 andrew@cumin1002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudcephosd1014.eqiad.wmnet']
- 12:50 andrew@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1014.eqiad.wmnet']
- 12:49 jmm@cumin1003: START - Cookbook sre.dns.netbox
- 12:49 jmm@cumin1003: START - Cookbook sre.ganeti.makevm for new host ncredir7004.magru.wmnet
- 12:49 vgutierrez: depooling lvs7002 before migrating to katran - T396561
- 12:48 andrew@cumin1002: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts cloudcephosd1015.eqiad.wmnet
- 12:48 andrew@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cloudcephosd1015.eqiad.wmnet
- 12:38 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T395241)', diff saved to https://phabricator.wikimedia.org/P77833 and previous config saved to /var/cache/conftool/dbconfig/20250612-123806-fceratto.json
- 12:37 andrew@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcephosd1014.eqiad.wmnet with OS bullseye
- 12:27 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1031.eqiad.wmnet
- 12:27 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1031.eqiad.wmnet
- 12:20 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ganeti1031.eqiad.wmnet
- 12:11 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2153 (T395241)', diff saved to https://phabricator.wikimedia.org/P77831 and previous config saved to /var/cache/conftool/dbconfig/20250612-121141-fceratto.json
- 12:11 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2153.codfw.wmnet with reason: Maintenance
- 12:11 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T395241)', diff saved to https://phabricator.wikimedia.org/P77830 and previous config saved to /var/cache/conftool/dbconfig/20250612-121125-fceratto.json
- 12:10 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1031.eqiad.wmnet
- 12:03 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti2046.codfw.wmnet to cluster codfw and group A
- 12:01 jmm@cumin1003: START - Cookbook sre.ganeti.addnode for new host ganeti2046.codfw.wmnet to cluster codfw and group A
- 12:00 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti2045.codfw.wmnet to cluster codfw and group A
- 11:58 jmm@cumin1003: START - Cookbook sre.ganeti.addnode for new host ganeti2045.codfw.wmnet to cluster codfw and group A
- 11:56 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2046.codfw.wmnet
- 11:56 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P77829 and previous config saved to /var/cache/conftool/dbconfig/20250612-115618-fceratto.json
- 11:55 jmm@cumin1003: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host ncredir7004.magru.wmnet
- 11:55 jmm@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ncredir7004.magru.wmnet with OS bookworm
- 11:49 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ganeti2046.codfw.wmnet
- 11:41 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P77828 and previous config saved to /var/cache/conftool/dbconfig/20250612-114110-fceratto.json
- 11:40 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1052.eqiad.wmnet
- 11:40 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1052.eqiad.wmnet
- 11:35 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1014.eqiad.wmnet with OS bullseye
- 11:35 andrew@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcephosd1014.eqiad.wmnet with OS bullseye
- 11:34 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ganeti1052.eqiad.wmnet
- 11:31 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2045.codfw.wmnet
- 11:30 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1052.eqiad.wmnet
- 11:26 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T395241)', diff saved to https://phabricator.wikimedia.org/P77826 and previous config saved to /var/cache/conftool/dbconfig/20250612-112602-fceratto.json
- 11:24 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ganeti2045.codfw.wmnet
- 11:23 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
- 11:17 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1240.eqiad.wmnet with reason: Maintenance
- 11:17 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1212 (T396130)', diff saved to https://phabricator.wikimedia.org/P77825 and previous config saved to /var/cache/conftool/dbconfig/20250612-111722-marostegui.json
- 11:14 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2146 (T395241)', diff saved to https://phabricator.wikimedia.org/P77824 and previous config saved to /var/cache/conftool/dbconfig/20250612-111423-fceratto.json
- 11:14 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2146.codfw.wmnet with reason: Maintenance
- 11:13 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T395241)', diff saved to https://phabricator.wikimedia.org/P77823 and previous config saved to /var/cache/conftool/dbconfig/20250612-111357-fceratto.json
- 11:10 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1051.eqiad.wmnet
- 11:10 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1051.eqiad.wmnet
- 11:10 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 11:07 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 11:07 vgutierrez: use Google Trust Services (GTS) unified TLS certificate on drmrs - T395131
- 11:07 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1014.eqiad.wmnet with OS bullseye
- 11:05 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ganeti1051.eqiad.wmnet
- 11:05 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 11:02 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1212', diff saved to https://phabricator.wikimedia.org/P77822 and previous config saved to /var/cache/conftool/dbconfig/20250612-110213-marostegui.json
- 11:01 jmm@cumin1003: START - Cookbook sre.hosts.reimage for host ncredir7004.magru.wmnet with OS bookworm
- 11:00 jmm@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM ncredir7004.magru.wmnet - jmm@cumin1003"
- 11:00 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 11:00 jmm@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM ncredir7004.magru.wmnet - jmm@cumin1003"
- 11:00 jmm@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ncredir7004.magru.wmnet on all recursors
- 10:59 jmm@cumin1003: START - Cookbook sre.dns.wipe-cache ncredir7004.magru.wmnet on all recursors
- 10:59 jmm@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 10:59 jmm@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM ncredir7004.magru.wmnet - jmm@cumin1003"
- 10:58 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P77821 and previous config saved to /var/cache/conftool/dbconfig/20250612-105848-fceratto.json
- 10:57 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1051.eqiad.wmnet
- 10:56 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1050.eqiad.wmnet
- 10:56 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1050.eqiad.wmnet
- 10:50 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ganeti1050.eqiad.wmnet
- 10:50 jmm@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM ncredir7004.magru.wmnet - jmm@cumin1003"
- 10:50 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 10:47 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1050.eqiad.wmnet
- 10:47 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1212', diff saved to https://phabricator.wikimedia.org/P77820 and previous config saved to /var/cache/conftool/dbconfig/20250612-104706-marostegui.json
- 10:44 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1049.eqiad.wmnet
- 10:43 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1049.eqiad.wmnet
- 10:43 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P77819 and previous config saved to /var/cache/conftool/dbconfig/20250612-104341-fceratto.json
- 10:43 jmm@cumin1003: START - Cookbook sre.dns.netbox
- 10:43 jmm@cumin1003: START - Cookbook sre.ganeti.makevm for new host ncredir7004.magru.wmnet
- 10:42 jmm@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ganeti2050.codfw.wmnet with OS bookworm
- 10:38 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ganeti1049.eqiad.wmnet
- 10:36 cgoubert@deploy1003: Finished scap sync-world: 1156288: mediawiki: Add job history limit control - T395885 (duration: 02m 48s)
- 10:33 cgoubert@deploy1003: Started scap sync-world: 1156288: mediawiki: Add job history limit control - T395885
- 10:32 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1049.eqiad.wmnet
- 10:32 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1212 (T396130)', diff saved to https://phabricator.wikimedia.org/P77818 and previous config saved to /var/cache/conftool/dbconfig/20250612-103159-marostegui.json
- 10:28 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T395241)', diff saved to https://phabricator.wikimedia.org/P77817 and previous config saved to /var/cache/conftool/dbconfig/20250612-102834-fceratto.json
- 10:27 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1212 (T396130)', diff saved to https://phabricator.wikimedia.org/P77816 and previous config saved to /var/cache/conftool/dbconfig/20250612-102700-marostegui.json
- 10:26 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1013,1017].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
- 10:26 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1212.eqiad.wmnet with reason: Maintenance
- 10:26 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1198 (T396130)', diff saved to https://phabricator.wikimedia.org/P77815 and previous config saved to /var/cache/conftool/dbconfig/20250612-102630-marostegui.json
- 10:25 jmm@cumin1003: START - Cookbook sre.hosts.reimage for host ganeti2050.codfw.wmnet with OS bookworm
- 10:24 jmm@cumin1003: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host ncredir7004.magru.wmnet
- 10:23 jmm@cumin1003: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
- 10:23 jmm@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ganeti2050.codfw.wmnet with OS bookworm
- 10:16 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2145 (T395241)', diff saved to https://phabricator.wikimedia.org/P77814 and previous config saved to /var/cache/conftool/dbconfig/20250612-101655-fceratto.json
- 10:16 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2145.codfw.wmnet with reason: Maintenance
- 10:14 moritzm: installing Kerberos security updates
- 10:11 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1198', diff saved to https://phabricator.wikimedia.org/P77813 and previous config saved to /var/cache/conftool/dbconfig/20250612-101123-marostegui.json
- 10:11 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2141.codfw.wmnet with reason: Maintenance
- 10:07 jmm@cumin1003: START - Cookbook sre.dns.netbox
- 10:07 jmm@cumin1003: START - Cookbook sre.ganeti.makevm for new host ncredir7004.magru.wmnet
- 10:06 jmm@cumin1003: START - Cookbook sre.hosts.reimage for host ganeti2050.codfw.wmnet with OS bookworm
- 09:53 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2049.codfw.wmnet with OS bookworm
- 09:50 esanders@deploy1003: Finished scap sync-world: Backport for Support placeholders mangled by MF's HtmlFormatter (T396695) (duration: 10m 37s)
- 09:46 cmooney@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1047.eqiad.wmnet
- 09:43 esanders@deploy1003: esanders: Continuing with sync
- 09:42 jmm@cumin1003: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host ncredir7004.magru.wmnet
- 09:42 jmm@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ncredir7004.magru.wmnet on all recursors
- 09:42 jmm@cumin1003: START - Cookbook sre.dns.wipe-cache ncredir7004.magru.wmnet on all recursors
- 09:42 jmm@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 09:42 jmm@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM ncredir7004.magru.wmnet - jmm@cumin1003"
- 09:41 esanders@deploy1003: esanders: Backport for Support placeholders mangled by MF's HtmlFormatter (T396695) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 09:41 jmm@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on ganeti1047.eqiad.wmnet with reason: hw check
- 09:41 cmooney@cumin1003: START - Cookbook sre.hosts.reboot-single for host ganeti1047.eqiad.wmnet
- 09:41 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1198 (T396130)', diff saved to https://phabricator.wikimedia.org/P77811 and previous config saved to /var/cache/conftool/dbconfig/20250612-094109-marostegui.json
- 09:39 tappof@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host titan2002.codfw.wmnet
- 09:39 esanders@deploy1003: Started scap sync-world: Backport for Support placeholders mangled by MF's HtmlFormatter (T396695)
- 09:39 jmm@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM ncredir7004.magru.wmnet - jmm@cumin1003"
- 09:37 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti2049.codfw.wmnet with reason: host reimage
- 09:36 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1198 (T396130)', diff saved to https://phabricator.wikimedia.org/P77809 and previous config saved to /var/cache/conftool/dbconfig/20250612-093631-marostegui.json
- 09:36 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1198.eqiad.wmnet with reason: Maintenance
- 09:36 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1189 (T396130)', diff saved to https://phabricator.wikimedia.org/P77808 and previous config saved to /var/cache/conftool/dbconfig/20250612-093609-marostegui.json
- 09:34 jmm@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti2049.codfw.wmnet with reason: host reimage
- 09:32 tappof@cumin1002: START - Cookbook sre.hosts.reboot-single for host titan2002.codfw.wmnet
- 09:29 jmm@cumin1003: START - Cookbook sre.dns.netbox
- 09:28 jmm@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ncredir7004.magru.wmnet on all recursors
- 09:28 jmm@cumin1003: START - Cookbook sre.dns.wipe-cache ncredir7004.magru.wmnet on all recursors
- 09:28 jmm@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 09:26 jmm@cumin1003: START - Cookbook sre.dns.netbox
- 09:26 jmm@cumin1003: START - Cookbook sre.ganeti.makevm for new host ncredir7004.magru.wmnet
- 09:24 tappof@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host titan1002.eqiad.wmnet
- 09:23 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 09:21 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P77806 and previous config saved to /var/cache/conftool/dbconfig/20250612-092103-marostegui.json
- 09:20 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 09:19 jmm@cumin1003: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host ncredir7004.magru.wmnet
- 09:19 jmm@cumin1003: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
- 09:15 tappof@cumin1002: START - Cookbook sre.hosts.reboot-single for host titan1002.eqiad.wmnet
- 09:11 tappof@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host grafana1002.eqiad.wmnet
- 09:08 jelto@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: Upgrade GitLab to 17.10
- 09:07 tappof@cumin1002: START - Cookbook sre.hosts.reboot-single for host grafana1002.eqiad.wmnet
- 09:07 jelto@cumin1002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1004.wikimedia.org with reason: Upgrade Replica to GitLab 17.10
- 09:05 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P77805 and previous config saved to /var/cache/conftool/dbconfig/20250612-090555-marostegui.json
- 09:05 jmm@cumin1003: START - Cookbook sre.hosts.reimage for host ganeti2049.codfw.wmnet with OS bookworm
- 09:04 tappof@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host centrallog1002.eqiad.wmnet
- 09:04 jmm@cumin1003: START - Cookbook sre.dns.netbox
- 09:04 jmm@cumin1003: START - Cookbook sre.ganeti.makevm for new host ncredir7004.magru.wmnet
- 09:04 jmm@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on install7001.wikimedia.org with reason: migration to install7002
- 08:57 tappof@cumin1002: START - Cookbook sre.hosts.reboot-single for host centrallog1002.eqiad.wmnet
- 08:57 jelto@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: Upgrade Replica to GitLab 17.10
- 08:56 jmm@cumin1003: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=97) for new host ncredir7004.magru.wmnet
- 08:56 jmm@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ncredir7004.magru.wmnet with OS bookworm
- 08:56 jelto@cumin1002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1003.wikimedia.org with reason: Upgrade Replica to GitLab 17.10
- 08:54 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2048.codfw.wmnet with OS bookworm
- 08:54 marostegui@cumin1002: dbctl commit (dc=all): 'es1046 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P77804 and previous config saved to /var/cache/conftool/dbconfig/20250612-085359-root.json
- 08:50 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1189 (T396130)', diff saved to https://phabricator.wikimedia.org/P77803 and previous config saved to /var/cache/conftool/dbconfig/20250612-085048-marostegui.json
- 08:50 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 08:46 jelto@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: Upgrade Replica to GitLab 17.10
- 08:46 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1189 (T396130)', diff saved to https://phabricator.wikimedia.org/P77802 and previous config saved to /var/cache/conftool/dbconfig/20250612-084611-marostegui.json
- 08:46 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1189.eqiad.wmnet with reason: Maintenance
- 08:46 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T396130)', diff saved to https://phabricator.wikimedia.org/P77801 and previous config saved to /var/cache/conftool/dbconfig/20250612-084600-marostegui.json
- 08:38 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti2048.codfw.wmnet with reason: host reimage
- 08:38 marostegui@cumin1002: dbctl commit (dc=all): 'es1046 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P77800 and previous config saved to /var/cache/conftool/dbconfig/20250612-083854-root.json
- 08:35 jmm@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti2048.codfw.wmnet with reason: host reimage
- 08:30 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P77799 and previous config saved to /var/cache/conftool/dbconfig/20250612-083053-marostegui.json
- 08:29 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 08:26 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 08:23 marostegui@cumin1002: dbctl commit (dc=all): 'es1046 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P77798 and previous config saved to /var/cache/conftool/dbconfig/20250612-082348-root.json
- 08:23 jmm@cumin1003: START - Cookbook sre.hosts.reimage for host ncredir7004.magru.wmnet with OS bookworm
- 08:22 marostegui@cumin1002: dbctl commit (dc=all): 'db2225 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P77797 and previous config saved to /var/cache/conftool/dbconfig/20250612-082223-root.json
- 08:22 jmm@cumin1003: START - Cookbook sre.hosts.reimage for host ganeti2048.codfw.wmnet with OS bookworm
- 08:19 jmm@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM ncredir7004.magru.wmnet - jmm@cumin1003"
- 08:19 jmm@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM ncredir7004.magru.wmnet - jmm@cumin1003"
- 08:19 jmm@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ncredir7004.magru.wmnet on all recursors
- 08:19 jmm@cumin1003: START - Cookbook sre.dns.wipe-cache ncredir7004.magru.wmnet on all recursors
- 08:19 jmm@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 08:19 jmm@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM ncredir7004.magru.wmnet - jmm@cumin1003"
- 08:19 jmm@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM ncredir7004.magru.wmnet - jmm@cumin1003"
- 08:15 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P77796 and previous config saved to /var/cache/conftool/dbconfig/20250612-081546-marostegui.json
- 08:15 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 08:12 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 08:12 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 08:12 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 08:11 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 08:11 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 08:10 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 08:08 marostegui@cumin1002: dbctl commit (dc=all): 'es1046 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P77795 and previous config saved to /var/cache/conftool/dbconfig/20250612-080843-root.json
- 08:07 marostegui@cumin1002: dbctl commit (dc=all): 'db2225 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P77794 and previous config saved to /var/cache/conftool/dbconfig/20250612-080717-root.json
- 08:01 jmm@cumin1003: START - Cookbook sre.dns.netbox
- 08:01 jmm@cumin1003: START - Cookbook sre.ganeti.makevm for new host ncredir7004.magru.wmnet
- 08:00 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T396130)', diff saved to https://phabricator.wikimedia.org/P77793 and previous config saved to /var/cache/conftool/dbconfig/20250612-080039-marostegui.json
- 07:59 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ncredir7001.magru.wmnet
- 07:59 jmm@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 07:59 jmm@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ncredir7001.magru.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin1003"
- 07:58 jmm@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ncredir7001.magru.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin1003"
- 07:58 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2047.codfw.wmnet with OS bookworm
- 07:55 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1175 (T396130)', diff saved to https://phabricator.wikimedia.org/P77791 and previous config saved to /var/cache/conftool/dbconfig/20250612-075501-marostegui.json
- 07:54 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1175.eqiad.wmnet with reason: Maintenance
- 07:54 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T396130)', diff saved to https://phabricator.wikimedia.org/P77790 and previous config saved to /var/cache/conftool/dbconfig/20250612-075437-marostegui.json
- 07:53 marostegui@cumin1002: dbctl commit (dc=all): 'es1046 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P77789 and previous config saved to /var/cache/conftool/dbconfig/20250612-075338-root.json
- 07:52 marostegui@cumin1002: dbctl commit (dc=all): 'db2225 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P77788 and previous config saved to /var/cache/conftool/dbconfig/20250612-075211-root.json
- 07:50 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 07:49 jmm@cumin1003: START - Cookbook sre.dns.netbox
- 07:46 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on es1046.eqiad.wmnet with reason: Maintenance
- 07:46 marostegui@cumin1002: dbctl commit (dc=all): 'Depool es1046', diff saved to https://phabricator.wikimedia.org/P77787 and previous config saved to /var/cache/conftool/dbconfig/20250612-074624-marostegui.json
- 07:45 jmm@cumin1003: START - Cookbook sre.hosts.decommission for hosts ncredir7001.magru.wmnet
- 07:44 tappof@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus6002.drmrs.wmnet
- 07:44 tappof@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus5002.eqsin.wmnet
- 07:43 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti2047.codfw.wmnet with reason: host reimage
- 07:40 jmm@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti2047.codfw.wmnet with reason: host reimage
- 07:39 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P77786 and previous config saved to /var/cache/conftool/dbconfig/20250612-073930-marostegui.json
- 07:38 tappof@cumin1002: START - Cookbook sre.hosts.reboot-single for host prometheus6002.drmrs.wmnet
- 07:38 tappof@cumin1002: START - Cookbook sre.hosts.reboot-single for host prometheus5002.eqsin.wmnet
- 07:37 marostegui@cumin1002: dbctl commit (dc=all): 'db2225 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P77785 and previous config saved to /var/cache/conftool/dbconfig/20250612-073705-root.json
- 07:34 tappof@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus4002.ulsfo.wmnet
- 07:30 tappof@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus3003.esams.wmnet
- 07:28 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2225.codfw.wmnet with reason: Maintenance
- 07:28 tappof@cumin1002: START - Cookbook sre.hosts.reboot-single for host prometheus4002.ulsfo.wmnet
- 07:28 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2225 T396549', diff saved to https://phabricator.wikimedia.org/P77784 and previous config saved to /var/cache/conftool/dbconfig/20250612-072827-marostegui.json
- 07:28 tappof@cumin1002: START - Cookbook sre.hosts.reboot-single for host prometheus3003.esams.wmnet
- 07:24 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P77783 and previous config saved to /var/cache/conftool/dbconfig/20250612-072422-marostegui.json
- 07:23 tappof@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus2008.codfw.wmnet
- 07:23 tappof@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus1008.eqiad.wmnet
- 07:15 tappof@cumin1002: START - Cookbook sre.hosts.reboot-single for host prometheus2008.codfw.wmnet
- 07:15 tappof@cumin1002: START - Cookbook sre.hosts.reboot-single for host prometheus1008.eqiad.wmnet
- 07:09 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T396130)', diff saved to https://phabricator.wikimedia.org/P77782 and previous config saved to /var/cache/conftool/dbconfig/20250612-070914-marostegui.json
- 07:08 tappof@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus2006.codfw.wmnet
- 07:07 tappof@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus1006.eqiad.wmnet
- 07:04 jmm@cumin1003: START - Cookbook sre.hosts.reimage for host ganeti2047.codfw.wmnet with OS bookworm
- 07:04 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1166 (T396130)', diff saved to https://phabricator.wikimedia.org/P77781 and previous config saved to /var/cache/conftool/dbconfig/20250612-070427-marostegui.json
- 07:04 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1166.eqiad.wmnet with reason: Maintenance
- 07:04 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1157 (T396130)', diff saved to https://phabricator.wikimedia.org/P77780 and previous config saved to /var/cache/conftool/dbconfig/20250612-070405-marostegui.json
- 06:58 tappof@cumin1002: START - Cookbook sre.hosts.reboot-single for host prometheus2006.codfw.wmnet
- 06:57 tappof@cumin1002: START - Cookbook sre.hosts.reboot-single for host prometheus1006.eqiad.wmnet
- 06:50 marostegui@cumin1002: dbctl commit (dc=all): 'es2026 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P77779 and previous config saved to /var/cache/conftool/dbconfig/20250612-065028-root.json
- 06:48 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1157', diff saved to https://phabricator.wikimedia.org/P77778 and previous config saved to /var/cache/conftool/dbconfig/20250612-064858-marostegui.json
- 06:42 jynus@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1225.eqiad.wmnet,dbprov1004.eqiad.wmnet with reason: Downtime hosts for MariaDB 10.11 upgrade
- 06:37 marostegui@cumin1002: dbctl commit (dc=all): 'db1184 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P77777 and previous config saved to /var/cache/conftool/dbconfig/20250612-063755-root.json
- 06:35 marostegui@cumin1002: dbctl commit (dc=all): 'es2026 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P77776 and previous config saved to /var/cache/conftool/dbconfig/20250612-063522-root.json
- 06:33 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1157', diff saved to https://phabricator.wikimedia.org/P77775 and previous config saved to /var/cache/conftool/dbconfig/20250612-063350-marostegui.json
- 06:25 marostegui@cumin1002: dbctl commit (dc=all): 'db2226 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P77774 and previous config saved to /var/cache/conftool/dbconfig/20250612-062546-root.json
- 06:22 marostegui@cumin1002: dbctl commit (dc=all): 'db1184 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P77773 and previous config saved to /var/cache/conftool/dbconfig/20250612-062248-root.json
- 06:20 marostegui@cumin1002: dbctl commit (dc=all): 'es2026 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P77772 and previous config saved to /var/cache/conftool/dbconfig/20250612-062016-root.json
- 06:18 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1157 (T396130)', diff saved to https://phabricator.wikimedia.org/P77771 and previous config saved to /var/cache/conftool/dbconfig/20250612-061843-marostegui.json
- 06:17 marostegui@cumin1002: dbctl commit (dc=all): 'Remove db1207 from dbctl T396697', diff saved to https://phabricator.wikimedia.org/P77770 and previous config saved to /var/cache/conftool/dbconfig/20250612-061700-marostegui.json
- 06:14 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1157 (T396130)', diff saved to https://phabricator.wikimedia.org/P77769 and previous config saved to /var/cache/conftool/dbconfig/20250612-061405-marostegui.json
- 06:13 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1157.eqiad.wmnet with reason: Maintenance
- 06:10 marostegui@cumin1002: dbctl commit (dc=all): 'db2226 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P77767 and previous config saved to /var/cache/conftool/dbconfig/20250612-061041-root.json
- 06:10 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1217.eqiad.wmnet with reason: Maintenance
- 06:07 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1150.eqiad.wmnet with reason: Maintenance
- 06:07 marostegui@cumin1002: dbctl commit (dc=all): 'db1184 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P77766 and previous config saved to /var/cache/conftool/dbconfig/20250612-060743-root.json
- 06:05 marostegui@cumin1002: dbctl commit (dc=all): 'es2026 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P77765 and previous config saved to /var/cache/conftool/dbconfig/20250612-060510-root.json
- 05:55 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db1207.eqiad.wmnet with reason: Maintenance
- 05:55 marostegui@cumin1002: dbctl commit (dc=all): 'db2226 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P77764 and previous config saved to /var/cache/conftool/dbconfig/20250612-055535-root.json
- 05:54 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1207 T396697', diff saved to https://phabricator.wikimedia.org/P77763 and previous config saved to /var/cache/conftool/dbconfig/20250612-055439-marostegui.json
- 05:53 marostegui@cumin1002: dbctl commit (dc=all): 'Pool db1206', diff saved to https://phabricator.wikimedia.org/P77762 and previous config saved to /var/cache/conftool/dbconfig/20250612-055339-marostegui.json
- 05:53 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1206 T396697', diff saved to https://phabricator.wikimedia.org/P77761 and previous config saved to /var/cache/conftool/dbconfig/20250612-055318-marostegui.json
- 05:52 marostegui@cumin1002: dbctl commit (dc=all): 'db1184 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P77760 and previous config saved to /var/cache/conftool/dbconfig/20250612-055237-root.json
- 05:51 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1184 T396697', diff saved to https://phabricator.wikimedia.org/P77759 and previous config saved to /var/cache/conftool/dbconfig/20250612-055136-marostegui.json
- 05:50 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2209.codfw.wmnet with reason: Maintenance
- 05:50 marostegui@cumin1002: dbctl commit (dc=all): 'es2026 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P77758 and previous config saved to /var/cache/conftool/dbconfig/20250612-055005-root.json
- 05:43 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on es2026.codfw.wmnet with reason: Maintenance
- 05:43 marostegui@cumin1002: dbctl commit (dc=all): 'Depool es2026', diff saved to https://phabricator.wikimedia.org/P77757 and previous config saved to /var/cache/conftool/dbconfig/20250612-054315-marostegui.json
- 05:40 marostegui@cumin1002: dbctl commit (dc=all): 'db2226 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P77756 and previous config saved to /var/cache/conftool/dbconfig/20250612-054030-root.json
- 05:35 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2226.codfw.wmnet with reason: Maintenance
- 05:34 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2226', diff saved to https://phabricator.wikimedia.org/P77755 and previous config saved to /var/cache/conftool/dbconfig/20250612-053450-marostegui.json
- 05:33 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2239.codfw.wmnet with reason: Maintenance
- 04:27 TimStarling: ran cleanupBlocks.php on all wikis for T373847 and T389301
- 03:52 eileen: config revision changed from 724b1679 to df8bc7dd
2025-06-11
- 22:53 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db1253.eqiad.wmnet with reason: Firmware upgrade (T396648)
- 22:49 ladsgroup@cumin1002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts db1253.eqiad.wmnet
- 22:48 ladsgroup@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts db1253.eqiad.wmnet
- 22:48 ladsgroup@cumin1002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts db1253.eqiad.wmnet
- 22:47 ladsgroup@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts db1253.eqiad.wmnet
- 22:43 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1253.eqiad.wmnet with reason: Firmware upgrade (T396648)
- 22:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depool db1253 (T396648)', diff saved to https://phabricator.wikimedia.org/P77754 and previous config saved to /var/cache/conftool/dbconfig/20250611-224035-ladsgroup.json
- 21:48 cdobbins@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on P{cp7001.magru.wmnet} and A:cp - Fix VSLbs() assert error and upgrade libvmod-wmfuniq to 0.2.0 (T396581)
- 21:43 cdobbins@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on P{cp7001.magru.wmnet} and A:cp - Fix VSLbs() assert error and upgrade libvmod-wmfuniq to 0.2.0 (T396581)
- 21:28 apine@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
- 21:27 apine@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
- 21:27 apine@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
- 21:26 apine@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
- 21:24 cdobbins@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on P{cp7001.magru.wmnet} and A:cp - Fix VSLbs() assert error and upgrade libvmod-wmfuniq to 0.2.0 (T396581)
- 21:21 jforrester@deploy1003: Finished scap sync-world: Backport for WikiLambda: Set repo-only config only in repo mode, WikiLambda: Enable orchestrator cache updates on edit (T390746) (duration: 09m 45s)
- 21:18 cdobbins@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on P{cp7001.magru.wmnet} and A:cp - Fix VSLbs() assert error and upgrade libvmod-wmfuniq to 0.2.0 (T396581)
- 21:14 jforrester@deploy1003: jforrester: Continuing with sync
- 21:14 jforrester@deploy1003: jforrester: Backport for WikiLambda: Set repo-only config only in repo mode, WikiLambda: Enable orchestrator cache updates on edit (T390746) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 21:12 jforrester@deploy1003: Started scap sync-world: Backport for WikiLambda: Set repo-only config only in repo mode, WikiLambda: Enable orchestrator cache updates on edit (T390746)
- 20:31 dwisehaupt@dns1004: END - running authdns-update
- 20:30 dwisehaupt@dns1004: START - running authdns-update
- 20:27 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1185.eqiad.wmnet with OS bullseye
- 20:24 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1186.eqiad.wmnet with OS bullseye
- 20:16 cjming@deploy1003: Finished scap sync-world: Backport for Change OutputPage::wrapWikiTextAsInterface() to soft-deprecation (T396618) (duration: 10m 00s)
- 20:09 cjming@deploy1003: matmarex, cjming: Continuing with sync
- 20:08 cjming@deploy1003: matmarex, cjming: Backport for Change OutputPage::wrapWikiTextAsInterface() to soft-deprecation (T396618) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 20:07 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1185.eqiad.wmnet with OS bullseye
- 20:07 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1186.eqiad.wmnet with OS bullseye
- 20:07 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1185.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 20:06 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1186.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 20:06 cjming@deploy1003: Started scap sync-world: Backport for Change OutputPage::wrapWikiTextAsInterface() to soft-deprecation (T396618)
- 20:06 vriley@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1185.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 20:05 vriley@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1186.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 20:02 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1186.eqiad.wmnet with OS bullseye
- 20:02 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1185.eqiad.wmnet with OS bullseye
- 19:46 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1185.eqiad.wmnet with OS bullseye
- 19:45 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1186.eqiad.wmnet with OS bullseye
- 19:44 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1186.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 19:38 aokoth@dns1004: END - running authdns-update
- 19:38 aokoth@dns1004: START - running authdns-update
- 19:34 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1185.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 19:32 vriley@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1186.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 19:31 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1186.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 19:31 vriley@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1186.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 19:30 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1186.eqiad.wmnet with OS bullseye
- 19:17 vriley@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1185.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 19:16 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1185.eqiad.wmnet with OS bullseye
- 19:13 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1186.eqiad.wmnet with OS bullseye
- 19:10 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1186.eqiad.wmnet with OS bullseye
- 19:02 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1185.eqiad.wmnet with OS bullseye
- 18:57 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1251 (T395241)', diff saved to https://phabricator.wikimedia.org/P77751 and previous config saved to /var/cache/conftool/dbconfig/20250611-185748-fceratto.json
- 18:56 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1186.eqiad.wmnet with OS bullseye
- 18:52 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1186.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 18:51 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-timeline: apply
- 18:51 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-timeline: apply
- 18:51 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply
- 18:50 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply
- 18:50 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: apply
- 18:50 moritzm: remove ganeti1047 from Ganeti cluster in eqiad for hardware diagnosis
- 18:50 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply
- 18:50 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox: apply
- 18:49 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox: apply
- 18:49 vriley@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1186.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 18:49 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1185.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 18:44 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 18:43 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1047.eqiad.wmnet
- 18:42 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1251', diff saved to https://phabricator.wikimedia.org/P77750 and previous config saved to /var/cache/conftool/dbconfig/20250611-184242-fceratto.json
- 18:42 vriley@cumin1002: START - Cookbook sre.dns.netbox
- 18:37 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 18:37 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt an-worker1186 - vriley@cumin1002"
- 18:37 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt an-worker1186 - vriley@cumin1002"
- 18:37 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-timeline: apply
- 18:36 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-timeline: apply
- 18:36 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply
- 18:36 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-media: apply
- 18:36 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: apply
- 18:35 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-constraints: apply
- 18:35 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox: apply
- 18:34 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox: apply
- 18:31 urandom: truncating restbase mobile-sections table - T395845
- 18:30 vriley@cumin1002: START - Cookbook sre.dns.netbox
- 18:27 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1251', diff saved to https://phabricator.wikimedia.org/P77749 and previous config saved to /var/cache/conftool/dbconfig/20250611-182735-fceratto.json
- 18:26 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 18:26 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt an-worker1186 - vriley@cumin1002"
- 18:26 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt an-worker1186 - vriley@cumin1002"
- 18:23 vriley@cumin1002: START - Cookbook sre.dns.netbox
- 18:21 brennen@deploy1003: rebuilt and synchronized wikiversions files: group1 to 1.45.0-wmf.5 refs T392175
- 18:16 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dse-k8s-worker1013.eqiad.wmnet with OS bookworm
- 18:16 btullis@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - btullis@cumin1002"
- 18:13 btullis@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - btullis@cumin1002"
- 18:12 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1251 (T395241)', diff saved to https://phabricator.wikimedia.org/P77748 and previous config saved to /var/cache/conftool/dbconfig/20250611-181228-fceratto.json
- 18:12 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dse-k8s-worker1012.eqiad.wmnet with OS bookworm
- 18:12 btullis@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - btullis@cumin1003"
- 18:10 sukhe: sudo cumin 'A:lvs-low-traffic-eqiad or A:lvs-low-traffic-codfw' 'run-puppet-agent': T143553
- 18:09 brennen: 1.45.0-wmf.5 train status (T392175): no current blockers, logs reasonably clean, rolling to group1
- 18:08 sukhe: sudo cumin 'A:lvs-secondary-eqiad or A:lvs-secondary-codfw' 'run-puppet-agent': T143553
- 18:06 sukhe@dns1004: END - running authdns-update
- 18:06 btullis@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - btullis@cumin1003"
- 18:05 sukhe@dns1004: START - running authdns-update
- 18:04 sukhe@dns1004: END - running authdns-update
- 18:03 sukhe@dns1004: START - running authdns-update
- 18:03 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1251 (T395241)', diff saved to https://phabricator.wikimedia.org/P77747 and previous config saved to /var/cache/conftool/dbconfig/20250611-180309-fceratto.json
- 18:03 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1251.eqiad.wmnet with reason: Maintenance
- 18:02 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1235 (T395241)', diff saved to https://phabricator.wikimedia.org/P77746 and previous config saved to /var/cache/conftool/dbconfig/20250611-180254-fceratto.json
- 17:57 ryankemper: T143553 Pooled `dns-disc=search-(omega|psi)` per plan in https://phabricator.wikimedia.org/T143553#10861215
- 17:56 ryankemper@cumin2002: conftool action : set/pooled=true; selector: dnsdisc=search-omega
- 17:56 ryankemper@cumin2002: conftool action : set/pooled=true; selector: dnsdisc=search-psi
- 17:55 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dse-k8s-worker1013.eqiad.wmnet with reason: host reimage
- 17:51 btullis@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on dse-k8s-worker1013.eqiad.wmnet with reason: host reimage
- 17:50 sukhe: running agent on A:dnsbox T143553
- 17:48 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dse-k8s-worker1012.eqiad.wmnet with reason: host reimage
- 17:48 ryankemper: T143553 Merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/1151300 to add dnsdisc entries for omega/psi clusters (second patch in plan https://phabricator.wikimedia.org/T143553#10861215)
- 17:47 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1235', diff saved to https://phabricator.wikimedia.org/P77745 and previous config saved to /var/cache/conftool/dbconfig/20250611-174747-fceratto.json
- 17:45 btullis@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on dse-k8s-worker1012.eqiad.wmnet with reason: host reimage
- 17:37 ryankemper: T143553 Merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/1151308 (first patch in plan https://phabricator.wikimedia.org/T143553#10861215)
- 17:35 btullis@cumin1002: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1013.eqiad.wmnet with OS bookworm
- 17:35 btullis@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dse-k8s-worker1013.eqiad.wmnet with OS bookworm
- 17:32 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1235', diff saved to https://phabricator.wikimedia.org/P77744 and previous config saved to /var/cache/conftool/dbconfig/20250611-173240-fceratto.json
- 17:29 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1012.eqiad.wmnet with OS bookworm
- 17:19 btullis@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dse-k8s-worker1012.eqiad.wmnet with OS bookworm
- 17:18 btullis@cumin1002: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1013.eqiad.wmnet with OS bookworm
- 17:17 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1235 (T395241)', diff saved to https://phabricator.wikimedia.org/P77743 and previous config saved to /var/cache/conftool/dbconfig/20250611-171733-fceratto.json
- 17:16 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1012.eqiad.wmnet with OS bookworm
- 17:13 btullis@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dse-k8s-worker1012.eqiad.wmnet with OS bookworm
- 17:12 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dse-k8s-worker1013.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 17:09 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1235 (T395241)', diff saved to https://phabricator.wikimedia.org/P77742 and previous config saved to /var/cache/conftool/dbconfig/20250611-170922-fceratto.json
- 17:09 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1235.eqiad.wmnet with reason: Maintenance
- 17:08 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1234 (T395241)', diff saved to https://phabricator.wikimedia.org/P77741 and previous config saved to /var/cache/conftool/dbconfig/20250611-170857-fceratto.json
- 17:01 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1012.eqiad.wmnet with OS bookworm
- 17:00 btullis@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dse-k8s-worker1012.eqiad.wmnet with OS bookworm
- 16:53 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1234', diff saved to https://phabricator.wikimedia.org/P77740 and previous config saved to /var/cache/conftool/dbconfig/20250611-165350-fceratto.json
- 16:38 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1234', diff saved to https://phabricator.wikimedia.org/P77739 and previous config saved to /var/cache/conftool/dbconfig/20250611-163842-fceratto.json
- 16:34 btullis@cumin1002: START - Cookbook sre.hosts.provision for host dse-k8s-worker1013.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 16:33 btullis@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dse-k8s-worker1013
- 16:32 btullis@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host dse-k8s-worker1013
- 16:31 btullis@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 16:31 btullis@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: After moving dse-k8s-worker1013 vlan - btullis@cumin1002"
- 16:31 btullis@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: After moving dse-k8s-worker1013 vlan - btullis@cumin1002"
- 16:27 btullis@cumin1002: START - Cookbook sre.dns.netbox
- 16:25 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts dse-k8s-worker1013.eqiad.wmnet
- 16:25 btullis@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 16:25 btullis@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: dse-k8s-worker1013.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - btullis@cumin1002"
- 16:25 btullis@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: dse-k8s-worker1013.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - btullis@cumin1002"
- 16:23 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1234 (T395241)', diff saved to https://phabricator.wikimedia.org/P77738 and previous config saved to /var/cache/conftool/dbconfig/20250611-162335-fceratto.json
- 16:18 btullis@cumin1002: START - Cookbook sre.dns.netbox
- 16:15 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1234 (T395241)', diff saved to https://phabricator.wikimedia.org/P77737 and previous config saved to /var/cache/conftool/dbconfig/20250611-161509-fceratto.json
- 16:15 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1234.eqiad.wmnet with reason: Maintenance
- 16:14 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1232 (T395241)', diff saved to https://phabricator.wikimedia.org/P77736 and previous config saved to /var/cache/conftool/dbconfig/20250611-161444-fceratto.json
- 16:10 xcollazo@deploy1003: Finished deploy [airflow-dags/analytics@b0517a4]: Deploy to pickup T385112#10905490. (duration: 02m 14s)
- 16:10 xcollazo@deploy1003: Started deploy [airflow-dags/analytics@b0517a4]: Deploy to pickup T385112#10905490.
- 16:09 btullis@cumin1002: START - Cookbook sre.hosts.decommission for hosts dse-k8s-worker1013.eqiad.wmnet
- 16:02 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1012.eqiad.wmnet with OS bookworm
- 16:01 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 16:00 btullis@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dse-k8s-worker1012.eqiad.wmnet with OS bookworm
- 15:59 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1232', diff saved to https://phabricator.wikimedia.org/P77735 and previous config saved to /var/cache/conftool/dbconfig/20250611-155937-fceratto.json
- 15:59 dancy@deploy1003: Installation of scap version "4.173.0" completed for 2 hosts
- 15:57 dancy@deploy1003: Installing scap version "4.173.0" for 2 host(s)
- 15:56 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-timeline: apply
- 15:56 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-timeline: apply
- 15:56 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply
- 15:55 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-media: apply
- 15:55 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-constraints: apply
- 15:55 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-constraints: apply
- 15:55 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox: apply
- 15:55 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox: apply
- 15:52 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 15:47 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1012.eqiad.wmnet with OS bookworm
- 15:46 btullis@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dse-k8s-worker1012
- 15:44 btullis@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host dse-k8s-worker1012
- 15:44 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1232', diff saved to https://phabricator.wikimedia.org/P77734 and previous config saved to /var/cache/conftool/dbconfig/20250611-154430-fceratto.json
- 15:43 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 15:42 btullis@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 15:42 btullis@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: After moving dse-k8s-worker1012 vlan - btullis@cumin1002"
- 15:42 btullis@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: After moving dse-k8s-worker1012 vlan - btullis@cumin1002"
- 15:39 btullis@cumin1002: START - Cookbook sre.dns.netbox
- 15:38 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 15:38 cmooney@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 264525
- 15:38 cmooney@cumin1003: START - Cookbook sre.network.peering with action 'configure' for AS: 264525
- 15:36 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 15:34 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 15:30 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 15:29 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1232 (T395241)', diff saved to https://phabricator.wikimedia.org/P77733 and previous config saved to /var/cache/conftool/dbconfig/20250611-152923-fceratto.json
- 15:29 vgutierrez: use Google Trust Services (GTS) unified TLS certificate on eqsin - T395131
- 15:26 cjming@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply
- 15:26 cjming@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply
- 15:24 bking@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts relforge[1003-1004].eqiad.wmnet
- 15:24 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 15:24 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: relforge[1003-1004].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - bking@cumin2002"
- 15:24 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: relforge[1003-1004].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - bking@cumin2002"
- 15:24 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts dse-k8s-worker1012.eqiad.wmnet
- 15:24 btullis@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 15:24 btullis@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: dse-k8s-worker1012.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - btullis@cumin1002"
- 15:23 btullis@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: dse-k8s-worker1012.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - btullis@cumin1002"
- 15:22 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1232 (T395241)', diff saved to https://phabricator.wikimedia.org/P77732 and previous config saved to /var/cache/conftool/dbconfig/20250611-152220-fceratto.json
- 15:22 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1232.eqiad.wmnet with reason: Maintenance
- 15:21 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1219 (T395241)', diff saved to https://phabricator.wikimedia.org/P77731 and previous config saved to /var/cache/conftool/dbconfig/20250611-152155-fceratto.json
- 15:21 sukhe@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp4037.ulsfo.wmnet [reason: repooling after testing 9.2.10 upgrade: T390912]
- 15:21 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 15:20 btullis@cumin1002: START - Cookbook sre.dns.netbox
- 15:19 sukhe@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-ats (exit_code=0) Rolling upgrade of ATS on P{cp4037*} and A:cp - 9.2.10 upgrade (T390912)
- 15:15 tappof@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host titan1001.eqiad.wmnet
- 15:15 sukhe@cumin1002: START - Cookbook sre.cdn.roll-upgrade-ats Rolling upgrade of ATS on P{cp4037*} and A:cp - 9.2.10 upgrade (T390912)
- 15:15 bking@cumin2002: START - Cookbook sre.dns.netbox
- 15:14 btullis@cumin1002: START - Cookbook sre.hosts.decommission for hosts dse-k8s-worker1012.eqiad.wmnet
- 15:14 sukhe: depool cp4037 to test ATS 9.2.10 upgrade: T390912
- 15:13 sukhe@puppetserver1001: conftool action : set/pooled=no; selector: name=cp4037.ulsfo.wmnet [reason: testing 9.2.10 upgrade]
- 15:10 sukhe: reprepro -C main include bullseye-wikimedia trafficserver_9.2.10-1wm2_amd64.changes: T390912
- 15:09 apine@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
- 15:09 apine@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply
- 15:09 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-conf1006.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 15:08 elukey@cumin1003: START - Cookbook sre.hosts.provision for host an-conf1006.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 15:08 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-conf1006.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 15:06 bking@cumin2002: START - Cookbook sre.hosts.decommission for hosts relforge[1003-1004].eqiad.wmnet
- 15:06 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1219', diff saved to https://phabricator.wikimedia.org/P77729 and previous config saved to /var/cache/conftool/dbconfig/20250611-150647-fceratto.json
- 15:06 tappof@cumin1002: START - Cookbook sre.hosts.reboot-single for host titan1001.eqiad.wmnet
- 15:03 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 15:03 elukey@cumin1003: START - Cookbook sre.hosts.provision for host an-conf1006.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 15:03 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
- 15:02 bking@deploy1003: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 15:01 bking@deploy1003: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
- 14:57 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1047.eqiad.wmnet
- 14:51 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1219', diff saved to https://phabricator.wikimedia.org/P77727 and previous config saved to /var/cache/conftool/dbconfig/20250611-145140-fceratto.json
- 14:46 tappof@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host titan2001.codfw.wmnet
- 14:45 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99) for hosts relforge[1003-1004].eqiad.wmnet
- 14:44 bking@cumin2002: START - Cookbook sre.hosts.decommission for hosts relforge[1003-1004].eqiad.wmnet
- 14:40 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1047.eqiad.wmnet
- 14:40 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1047.eqiad.wmnet
- 14:39 tappof@cumin1002: START - Cookbook sre.hosts.reboot-single for host titan2001.codfw.wmnet
- 14:36 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1219 (T395241)', diff saved to https://phabricator.wikimedia.org/P77726 and previous config saved to /var/cache/conftool/dbconfig/20250611-143633-fceratto.json
- 14:34 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ganeti1047.eqiad.wmnet
- 14:34 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1047.eqiad.wmnet
- 14:31 apine@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
- 14:31 apine@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply
- 14:28 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1219 (T395241)', diff saved to https://phabricator.wikimedia.org/P77724 and previous config saved to /var/cache/conftool/dbconfig/20250611-142816-fceratto.json
- 14:28 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1219.eqiad.wmnet with reason: Maintenance
- 14:27 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1218 (T395241)', diff saved to https://phabricator.wikimedia.org/P77723 and previous config saved to /var/cache/conftool/dbconfig/20250611-142750-fceratto.json
- 14:26 apine@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
- 14:23 apine@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply
- 14:22 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 14:21 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 14:19 apine@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
- 14:18 apine@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
- 14:18 apine@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
- 14:18 apine@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
- 14:17 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-conf1006.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 14:16 apine@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
- 14:16 apine@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply
- 14:15 elukey@cumin1003: START - Cookbook sre.hosts.provision for host an-conf1006.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 14:15 andrew-wmde@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikidata-query-gui: apply
- 14:15 andrew-wmde@deploy1003: helmfile [codfw] START helmfile.d/services/wikidata-query-gui: apply
- 14:13 apine@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
- 14:13 apine@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
- 14:12 apine@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
- 14:12 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1218', diff saved to https://phabricator.wikimedia.org/P77722 and previous config saved to /var/cache/conftool/dbconfig/20250611-141243-fceratto.json
- 14:12 andrew-wmde@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikidata-query-gui: apply
- 14:11 apine@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
- 14:11 Lucas_WMDE: UTC afternoon backport+config window done
- 14:10 andrew-wmde@deploy1003: helmfile [eqiad] START helmfile.d/services/wikidata-query-gui: apply
- 14:10 apine@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
- 14:10 apine@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply
- 14:08 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for Set $wgPHPSessionHandling to 'disable' on testwiki and beta cluster (T362324), Stop logging $wgPHPSessionHandling warnings for now (T393963) (duration: 11m 14s)
- 14:06 andrew-wmde@deploy1003: helmfile [staging] DONE helmfile.d/services/wikidata-query-gui: apply
- 14:04 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 10310
- 14:04 andrew-wmde@deploy1003: helmfile [staging] START helmfile.d/services/wikidata-query-gui: apply
- 14:01 lucaswerkmeister-wmde@deploy1003: matmarex, lucaswerkmeister-wmde: Continuing with sync
- 14:00 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'configure' for AS: 10310
- 14:00 ayounsi@cumin1002: END (FAIL) - Cookbook sre.network.peering (exit_code=99) with action 'configure' for AS: 10310
- 13:59 lucaswerkmeister-wmde@deploy1003: matmarex, lucaswerkmeister-wmde: Backport for Set $wgPHPSessionHandling to 'disable' on testwiki and beta cluster (T362324), Stop logging $wgPHPSessionHandling warnings for now (T393963) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 13:57 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1218', diff saved to https://phabricator.wikimedia.org/P77721 and previous config saved to /var/cache/conftool/dbconfig/20250611-135736-fceratto.json
- 13:57 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'configure' for AS: 10310
- 13:57 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for Set $wgPHPSessionHandling to 'disable' on testwiki and beta cluster (T362324), Stop logging $wgPHPSessionHandling warnings for now (T393963)
- 13:53 esanders@deploy1003: Finished scap sync-world: Backport for Enable DiscussionTools visual enhancements everywhere except 12 wikis (T392121) (duration: 12m 36s)
- 13:50 vgutierrez: upload varnish 7.1.1-2~bpo11+wmf2 to apt.wm.o (bullseye-wikimedia) - T396581
- 13:48 kart_: Updated Recommendation-API to 2025-06-10-203235-production (T374695)
- 13:47 kartik@deploy1003: helmfile [ml-serve-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
- 13:46 esanders@deploy1003: esanders: Continuing with sync
- 13:45 hnowlan: migrating reading lists out of restbase for all wikis
- 13:43 esanders@deploy1003: esanders: Backport for Enable DiscussionTools visual enhancements everywhere except 12 wikis (T392121) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 13:43 kartik@deploy1003: helmfile [ml-serve-eqiad] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
- 13:42 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1218 (T395241)', diff saved to https://phabricator.wikimedia.org/P77720 and previous config saved to /var/cache/conftool/dbconfig/20250611-134230-fceratto.json
- 13:41 esanders@deploy1003: Started scap sync-world: Backport for Enable DiscussionTools visual enhancements everywhere except 12 wikis (T392121)
- 13:39 kartik@deploy1003: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
- 13:38 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for Update searchsuggest message key (T396219) (duration: 09m 57s)
- 13:34 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1218 (T395241)', diff saved to https://phabricator.wikimedia.org/P77719 and previous config saved to /var/cache/conftool/dbconfig/20250611-133420-fceratto.json
- 13:34 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1218.eqiad.wmnet with reason: Maintenance
- 13:33 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1207 (T395241)', diff saved to https://phabricator.wikimedia.org/P77718 and previous config saved to /var/cache/conftool/dbconfig/20250611-133355-fceratto.json
- 13:31 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde: Continuing with sync
- 13:31 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde: Backport for Update searchsuggest message key (T396219) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 13:28 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for Update searchsuggest message key (T396219)
- 13:18 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1207', diff saved to https://phabricator.wikimedia.org/P77717 and previous config saved to /var/cache/conftool/dbconfig/20250611-131848-fceratto.json
- 13:14 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for SUL3: Enable client hints data on the auth shared domain (T395185) (duration: 11m 09s)
- 13:11 jmm@cumin1003: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1047.eqiad.wmnet
- 13:07 lucaswerkmeister-wmde@deploy1003: d3r1ck01, lucaswerkmeister-wmde: Continuing with sync
- 13:05 lucaswerkmeister-wmde@deploy1003: d3r1ck01, lucaswerkmeister-wmde: Backport for SUL3: Enable client hints data on the auth shared domain (T395185) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 13:03 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1207', diff saved to https://phabricator.wikimedia.org/P77716 and previous config saved to /var/cache/conftool/dbconfig/20250611-130341-fceratto.json
- 13:03 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1047.eqiad.wmnet
- 13:03 akosiaris: T393557 block requests to /api/rest_v1/page/data-parsoid
- 13:03 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for SUL3: Enable client hints data on the auth shared domain (T395185)
- 13:00 XioNoX: disable lvs6002 secondary link switch port - T367731
- 12:58 jmm@cumin1003: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1047.eqiad.wmnet
- 12:54 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1047.eqiad.wmnet
- 12:54 XioNoX: disable lvs3008 secondary link switch port - T367731
- 12:54 jmm@cumin1003: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1047.eqiad.wmnet
- 12:51 XioNoX: disable lvs3009 secondary link switch port - T367731
- 12:49 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 12:48 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1207 (T395241)', diff saved to https://phabricator.wikimedia.org/P77715 and previous config saved to /var/cache/conftool/dbconfig/20250611-124834-fceratto.json
- 12:48 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 12:47 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1047.eqiad.wmnet
- 12:47 jmm@cumin1003: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1047.eqiad.wmnet
- 12:43 XioNoX: disable lvs7001 secondary link switch port - T367731
- 12:41 XioNoX: disable lvs7002 secondary link switch port - T367731
- 12:37 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227 (T396130)', diff saved to https://phabricator.wikimedia.org/P77714 and previous config saved to /var/cache/conftool/dbconfig/20250611-123753-marostegui.json
- 12:37 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1207 (T395241)', diff saved to https://phabricator.wikimedia.org/P77713 and previous config saved to /var/cache/conftool/dbconfig/20250611-123727-fceratto.json
- 12:37 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1207.eqiad.wmnet with reason: Maintenance
- 12:37 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206 (T395241)', diff saved to https://phabricator.wikimedia.org/P77712 and previous config saved to /var/cache/conftool/dbconfig/20250611-123702-fceratto.json
- 12:22 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P77711 and previous config saved to /var/cache/conftool/dbconfig/20250611-122246-marostegui.json
- 12:21 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206', diff saved to https://phabricator.wikimedia.org/P77710 and previous config saved to /var/cache/conftool/dbconfig/20250611-122155-fceratto.json
- 12:21 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 12:21 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 12:07 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P77709 and previous config saved to /var/cache/conftool/dbconfig/20250611-120740-marostegui.json
- 12:07 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1047.eqiad.wmnet
- 12:06 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206', diff saved to https://phabricator.wikimedia.org/P77708 and previous config saved to /var/cache/conftool/dbconfig/20250611-120648-fceratto.json
- 12:01 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1046.eqiad.wmnet
- 12:01 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1046.eqiad.wmnet
- 11:56 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ganeti1046.eqiad.wmnet
- 11:52 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227 (T396130)', diff saved to https://phabricator.wikimedia.org/P77707 and previous config saved to /var/cache/conftool/dbconfig/20250611-115231-marostegui.json
- 11:51 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1046.eqiad.wmnet
- 11:51 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206 (T395241)', diff saved to https://phabricator.wikimedia.org/P77706 and previous config saved to /var/cache/conftool/dbconfig/20250611-115140-fceratto.json
- 11:46 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1002.eqiad.wmnet
- 11:44 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1206 (T395241)', diff saved to https://phabricator.wikimedia.org/P77704 and previous config saved to /var/cache/conftool/dbconfig/20250611-114447-fceratto.json
- 11:44 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1206.eqiad.wmnet with reason: Maintenance
- 11:44 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1196 (T395241)', diff saved to https://phabricator.wikimedia.org/P77703 and previous config saved to /var/cache/conftool/dbconfig/20250611-114422-fceratto.json
- 11:43 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1045.eqiad.wmnet
- 11:43 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1045.eqiad.wmnet
- 11:42 XioNoX: disable lvs3010 secondary link switch port - T367731
- 11:41 klausman@cumin2002: START - Cookbook sre.hosts.reboot-single for host ml-serve1002.eqiad.wmnet
- 11:39 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 11:39 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1001.eqiad.wmnet
- 11:38 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 11:38 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ganeti1045.eqiad.wmnet
- 11:35 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1045.eqiad.wmnet
- 11:35 jmm@dns1004: END - running authdns-update
- 11:34 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1044.eqiad.wmnet
- 11:34 jmm@dns1004: START - running authdns-update
- 11:34 klausman@cumin2002: START - Cookbook sre.hosts.reboot-single for host ml-serve1001.eqiad.wmnet
- 11:34 klausman@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host ml-serve1001.eqiad.wmnet
- 11:34 klausman@cumin2002: START - Cookbook sre.hosts.reboot-single for host ml-serve1001.eqiad.wmnet
- 11:34 XioNoX: disable lvs7003 secondary link switch port - T367731
- 11:33 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1044.eqiad.wmnet
- 11:33 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2227 (T396130)', diff saved to https://phabricator.wikimedia.org/P77702 and previous config saved to /var/cache/conftool/dbconfig/20250611-113336-marostegui.json
- 11:33 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2227.codfw.wmnet with reason: Maintenance
- 11:33 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2205 (T396130)', diff saved to https://phabricator.wikimedia.org/P77701 and previous config saved to /var/cache/conftool/dbconfig/20250611-113312-marostegui.json
- 11:32 jmm@puppetserver1001: conftool action : set/pooled=yes; selector: name=ncredir7003.magru.wmnet
- 11:32 jmm@puppetserver1001: conftool action : set/weight=1; selector: name=ncredir7003.magru.wmnet
- 11:29 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1196', diff saved to https://phabricator.wikimedia.org/P77700 and previous config saved to /var/cache/conftool/dbconfig/20250611-112914-fceratto.json
- 11:28 klausman@cumin2002: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on A:ml-serve-worker-codfw
- 11:28 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ganeti1044.eqiad.wmnet
- 11:28 Ammar: Ran fixStuckGlobalRename.php for T396545
- 11:24 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1044.eqiad.wmnet
- 11:22 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1043.eqiad.wmnet
- 11:22 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1043.eqiad.wmnet
- 11:18 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2205', diff saved to https://phabricator.wikimedia.org/P77699 and previous config saved to /var/cache/conftool/dbconfig/20250611-111805-marostegui.json
- 11:17 moritzm: installing librabbitmq security updates
- 11:16 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ganeti1043.eqiad.wmnet
- 11:15 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 11:14 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1196', diff saved to https://phabricator.wikimedia.org/P77698 and previous config saved to /var/cache/conftool/dbconfig/20250611-111407-fceratto.json
- 11:13 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1043.eqiad.wmnet
- 11:06 mvolz@deploy1003: helmfile [eqiad] DONE helmfile.d/services/citoid: apply
- 11:06 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1042.eqiad.wmnet
- 11:06 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1042.eqiad.wmnet
- 11:06 mvolz@deploy1003: helmfile [eqiad] START helmfile.d/services/citoid: apply
- 11:05 mvolz@deploy1003: helmfile [codfw] DONE helmfile.d/services/citoid: apply
- 11:05 mvolz@deploy1003: helmfile [codfw] START helmfile.d/services/citoid: apply
- 11:03 mvolz@deploy1003: helmfile [staging] DONE helmfile.d/services/citoid: apply
- 11:02 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2205', diff saved to https://phabricator.wikimedia.org/P77697 and previous config saved to /var/cache/conftool/dbconfig/20250611-110257-marostegui.json
- 11:02 mvolz@deploy1003: helmfile [staging] START helmfile.d/services/citoid: apply
- 11:02 brouberol@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
- 11:02 brouberol@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'.
- 11:01 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ganeti1042.eqiad.wmnet
- 10:59 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1196 (T395241)', diff saved to https://phabricator.wikimedia.org/P77696 and previous config saved to /var/cache/conftool/dbconfig/20250611-105900-fceratto.json
- 10:57 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1042.eqiad.wmnet
- 10:55 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1041.eqiad.wmnet
- 10:55 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1041.eqiad.wmnet
- 10:50 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ganeti1041.eqiad.wmnet
- 10:48 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1196 (T395241)', diff saved to https://phabricator.wikimedia.org/P77695 and previous config saved to /var/cache/conftool/dbconfig/20250611-104825-fceratto.json
- 10:48 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1013,1017].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
- 10:48 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1196.eqiad.wmnet with reason: Maintenance
- 10:47 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2205 (T396130)', diff saved to https://phabricator.wikimedia.org/P77694 and previous config saved to /var/cache/conftool/dbconfig/20250611-104750-marostegui.json
- 10:47 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1195 (T395241)', diff saved to https://phabricator.wikimedia.org/P77693 and previous config saved to /var/cache/conftool/dbconfig/20250611-104741-fceratto.json
- 10:46 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1041.eqiad.wmnet
- 10:45 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1040.eqiad.wmnet
- 10:45 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1040.eqiad.wmnet
- 10:40 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ganeti1040.eqiad.wmnet
- 10:32 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1195', diff saved to https://phabricator.wikimedia.org/P77692 and previous config saved to /var/cache/conftool/dbconfig/20250611-103234-fceratto.json
- 10:32 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1040.eqiad.wmnet
- 10:32 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 10:29 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1039.eqiad.wmnet
- 10:29 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1039.eqiad.wmnet
- 10:29 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2205 (T396130)', diff saved to https://phabricator.wikimedia.org/P77691 and previous config saved to /var/cache/conftool/dbconfig/20250611-102902-marostegui.json
- 10:28 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2205.codfw.wmnet with reason: Maintenance
- 10:28 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194 (T396130)', diff saved to https://phabricator.wikimedia.org/P77690 and previous config saved to /var/cache/conftool/dbconfig/20250611-102839-marostegui.json
- 10:23 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ganeti1039.eqiad.wmnet
- 10:20 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1039.eqiad.wmnet
- 10:17 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1195', diff saved to https://phabricator.wikimedia.org/P77689 and previous config saved to /var/cache/conftool/dbconfig/20250611-101727-fceratto.json
- 10:15 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1038.eqiad.wmnet
- 10:15 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1038.eqiad.wmnet
- 10:13 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P77688 and previous config saved to /var/cache/conftool/dbconfig/20250611-101332-marostegui.json
- 10:09 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 10:07 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ganeti1038.eqiad.wmnet
- 10:06 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1038.eqiad.wmnet
- 10:02 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1195 (T395241)', diff saved to https://phabricator.wikimedia.org/P77687 and previous config saved to /var/cache/conftool/dbconfig/20250611-100220-fceratto.json
- 10:00 brouberol@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dse-k8s-worker1013.eqiad.wmnet with OS bookworm
- 09:58 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P77686 and previous config saved to /var/cache/conftool/dbconfig/20250611-095825-marostegui.json
- 09:56 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 09:56 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1037.eqiad.wmnet
- 09:55 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1037.eqiad.wmnet
- 09:53 vgutierrez: restarting varnish on cp5018 to clear VarnishChildRestarted alert
- 09:51 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1195 (T395241)', diff saved to https://phabricator.wikimedia.org/P77684 and previous config saved to /var/cache/conftool/dbconfig/20250611-095139-fceratto.json
- 09:51 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1195.eqiad.wmnet with reason: Maintenance
- 09:51 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1186 (T395241)', diff saved to https://phabricator.wikimedia.org/P77683 and previous config saved to /var/cache/conftool/dbconfig/20250611-095113-fceratto.json
- 09:48 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ganeti1037.eqiad.wmnet
- 09:44 brouberol@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dse-k8s-worker1013.eqiad.wmnet with reason: host reimage
- 09:43 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194 (T396130)', diff saved to https://phabricator.wikimedia.org/P77682 and previous config saved to /var/cache/conftool/dbconfig/20250611-094319-marostegui.json
- 09:40 brouberol@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dse-k8s-worker1013.eqiad.wmnet with reason: host reimage
- 09:38 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1037.eqiad.wmnet
- 09:37 vgutierrez: use Google Trust Services (GTS) unified TLS certificate on magru - T395131
- 09:36 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1186', diff saved to https://phabricator.wikimedia.org/P77681 and previous config saved to /var/cache/conftool/dbconfig/20250611-093606-fceratto.json
- 09:34 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 09:25 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2194 (T396130)', diff saved to https://phabricator.wikimedia.org/P77680 and previous config saved to /var/cache/conftool/dbconfig/20250611-092518-marostegui.json
- 09:25 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2194.codfw.wmnet with reason: Maintenance
- 09:24 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2190 (T396130)', diff saved to https://phabricator.wikimedia.org/P77679 and previous config saved to /var/cache/conftool/dbconfig/20250611-092457-marostegui.json
- 09:24 brouberol@cumin2002: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1013.eqiad.wmnet with OS bookworm
- 09:23 brouberol@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from an-db1002 to dse-k8s-worker1013
- 09:22 brouberol@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dse-k8s-worker1013
- 09:21 brouberol@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host dse-k8s-worker1013
- 09:21 brouberol@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) dse-k8s-worker1013 on all recursors
- 09:21 brouberol@cumin2002: START - Cookbook sre.dns.wipe-cache dse-k8s-worker1013 on all recursors
- 09:21 brouberol@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 09:21 brouberol@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming an-db1002 to dse-k8s-worker1013 - brouberol@cumin2002"
- 09:20 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1186', diff saved to https://phabricator.wikimedia.org/P77678 and previous config saved to /var/cache/conftool/dbconfig/20250611-092059-fceratto.json
- 09:20 brouberol@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming an-db1002 to dse-k8s-worker1013 - brouberol@cumin2002"
- 09:19 elukey: repool eqiad for inference.discovery.wmnet - was left depooled after a long maintenance for k8s infra changes a week ago
- 09:18 elukey@puppetserver1001: conftool action : set/pooled=true; selector: dnsdisc=inference,name=eqiad
- 09:17 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1036.eqiad.wmnet
- 09:17 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1036.eqiad.wmnet
- 09:15 brouberol@cumin2002: START - Cookbook sre.dns.netbox
- 09:14 brouberol@cumin2002: START - Cookbook sre.hosts.rename from an-db1002 to dse-k8s-worker1013
- 09:12 moritzm: installing libfile-find-rule-perl security updates
- 09:11 brouberol@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dse-k8s-worker1012.eqiad.wmnet with OS bookworm
- 09:11 klausman@cumin2002: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:ml-serve-worker-codfw
- 09:10 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 09:09 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2190', diff saved to https://phabricator.wikimedia.org/P77674 and previous config saved to /var/cache/conftool/dbconfig/20250611-090949-marostegui.json
- 09:09 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ganeti1036.eqiad.wmnet
- 09:05 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1186 (T395241)', diff saved to https://phabricator.wikimedia.org/P77673 and previous config saved to /var/cache/conftool/dbconfig/20250611-090552-fceratto.json
- 09:00 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 09:00 klausman@cumin2002: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on A:ml-staging-worker
- 08:59 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1036.eqiad.wmnet
- 08:56 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1186 (T395241)', diff saved to https://phabricator.wikimedia.org/P77672 and previous config saved to /var/cache/conftool/dbconfig/20250611-085615-fceratto.json
- 08:56 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1186.eqiad.wmnet with reason: Maintenance
- 08:55 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T395241)', diff saved to https://phabricator.wikimedia.org/P77671 and previous config saved to /var/cache/conftool/dbconfig/20250611-085552-fceratto.json
- 08:54 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2190', diff saved to https://phabricator.wikimedia.org/P77670 and previous config saved to /var/cache/conftool/dbconfig/20250611-085442-marostegui.json
- 08:54 brouberol@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dse-k8s-worker1012.eqiad.wmnet with reason: host reimage
- 08:53 ayounsi@cumin1003: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host netflow1003.eqiad.wmnet
- 08:53 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host netflow1003.eqiad.wmnet with OS bookworm
- 08:51 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1035.eqiad.wmnet
- 08:51 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1035.eqiad.wmnet
- 08:51 brouberol@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dse-k8s-worker1012.eqiad.wmnet with reason: host reimage
- 08:44 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ganeti1035.eqiad.wmnet
- 08:40 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P77669 and previous config saved to /var/cache/conftool/dbconfig/20250611-084045-fceratto.json
- 08:39 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1035.eqiad.wmnet
- 08:39 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2190 (T396130)', diff saved to https://phabricator.wikimedia.org/P77668 and previous config saved to /var/cache/conftool/dbconfig/20250611-083935-marostegui.json
- 08:37 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1034.eqiad.wmnet
- 08:37 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1034.eqiad.wmnet
- 08:35 brouberol@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host dse-k8s-worker1012
- 08:35 brouberol@cumin2002: START - Cookbook sre.hosts.move-vlan for host dse-k8s-worker1012
- 08:35 brouberol@cumin2002: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1012.eqiad.wmnet with OS bookworm
- 08:33 tappof: T395240 May 2025 Bookworm reboots: alert2002.wikimedia.org
- 08:32 brouberol@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from an-db1001 to dse-k8s-worker1012
- 08:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on netflow1003.eqiad.wmnet with reason: host reimage
- 08:32 brouberol@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dse-k8s-worker1012
- 08:32 tappof@cumin1002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host alert2002.wikimedia.org
- 08:32 tappof@cumin1002: START - Cookbook sre.hosts.reboot-single for host alert2002.wikimedia.org
- 08:30 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ganeti1034.eqiad.wmnet
- 08:30 brouberol@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host dse-k8s-worker1012
- 08:30 brouberol@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) dse-k8s-worker1012 on all recursors
- 08:30 brouberol@cumin2002: START - Cookbook sre.dns.wipe-cache dse-k8s-worker1012 on all recursors
- 08:30 brouberol@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 08:30 brouberol@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming an-db1001 to dse-k8s-worker1012 - brouberol@cumin2002"
- 08:30 brouberol@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming an-db1001 to dse-k8s-worker1012 - brouberol@cumin2002"
- 08:27 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on netflow1003.eqiad.wmnet with reason: host reimage
- 08:27 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1034.eqiad.wmnet
- 08:26 brouberol@cumin2002: START - Cookbook sre.dns.netbox
- 08:26 brouberol@cumin2002: START - Cookbook sre.hosts.rename from an-db1001 to dse-k8s-worker1012
- 08:26 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1033.eqiad.wmnet
- 08:25 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1033.eqiad.wmnet
- 08:25 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P77667 and previous config saved to /var/cache/conftool/dbconfig/20250611-082538-fceratto.json
- 08:22 stevemunene@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host stat1008.eqiad.wmnet
- 08:20 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2190 (T396130)', diff saved to https://phabricator.wikimedia.org/P77666 and previous config saved to /var/cache/conftool/dbconfig/20250611-082039-marostegui.json
- 08:20 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2190.codfw.wmnet with reason: Maintenance
- 08:20 klausman@cumin2002: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:ml-staging-worker
- 08:20 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T396130)', diff saved to https://phabricator.wikimedia.org/P77665 and previous config saved to /var/cache/conftool/dbconfig/20250611-082018-marostegui.json
- 08:19 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ganeti1033.eqiad.wmnet
- 08:18 klausman@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-serve-ctrl1002.eqiad.wmnet
- 08:17 jmm@cumin2002: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies (exit_code=0) rolling restart_daemons on A:thanos-fe-eqiad
- 08:15 jmm@cumin2002: START - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies rolling restart_daemons on A:thanos-fe-eqiad
- 08:15 klausman@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ml-serve-ctrl1002.eqiad.wmnet
- 08:14 klausman@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-serve-ctrl1001.eqiad.wmnet
- 08:13 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host netflow1003.eqiad.wmnet with OS bookworm
- 08:12 stevemunene@cumin1002: START - Cookbook sre.hosts.reboot-single for host stat1008.eqiad.wmnet
- 08:11 klausman@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ml-serve-ctrl1001.eqiad.wmnet
- 08:10 klausman@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-staging-ctrl2002.codfw.wmnet
- 08:10 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM netflow1003.eqiad.wmnet - ayounsi@cumin1003"
- 08:10 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T395241)', diff saved to https://phabricator.wikimedia.org/P77664 and previous config saved to /var/cache/conftool/dbconfig/20250611-081031-fceratto.json
- 08:10 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM netflow1003.eqiad.wmnet - ayounsi@cumin1003"
- 08:10 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) netflow1003.eqiad.wmnet on all recursors
- 08:10 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache netflow1003.eqiad.wmnet on all recursors
- 08:09 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 08:09 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM netflow1003.eqiad.wmnet - ayounsi@cumin1003"
- 08:09 jmm@cumin2002: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies (exit_code=0) rolling restart_daemons on A:thanos-fe-codfw
- 08:09 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM netflow1003.eqiad.wmnet - ayounsi@cumin1003"
- 08:07 jmm@cumin2002: START - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies rolling restart_daemons on A:thanos-fe-codfw
- 08:07 klausman@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ml-staging-ctrl2002.codfw.wmnet
- 08:07 klausman@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-staging-ctrl2001.codfw.wmnet
- 08:05 ayounsi@cumin1003: START - Cookbook sre.dns.netbox
- 08:05 ayounsi@cumin1003: START - Cookbook sre.ganeti.makevm for new host netflow1003.eqiad.wmnet
- 08:05 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P77662 and previous config saved to /var/cache/conftool/dbconfig/20250611-080511-marostegui.json
- 08:04 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1033.eqiad.wmnet
- 08:03 klausman@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ml-staging-ctrl2001.codfw.wmnet
- 08:03 klausman@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-serve-ctrl2002.codfw.wmnet
- 08:01 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1184 (T395241)', diff saved to https://phabricator.wikimedia.org/P77661 and previous config saved to /var/cache/conftool/dbconfig/20250611-080101-fceratto.json
- 08:00 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1184.eqiad.wmnet with reason: Maintenance
- 07:59 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1032.eqiad.wmnet
- 07:59 klausman@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ml-serve-ctrl2002.codfw.wmnet
- 07:59 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1032.eqiad.wmnet
- 07:57 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 07:56 klausman@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-serve-ctrl2001.codfw.wmnet
- 07:53 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 07:53 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ganeti1032.eqiad.wmnet
- 07:52 klausman@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ml-serve-ctrl2001.codfw.wmnet
- 07:52 marostegui@cumin1002: dbctl commit (dc=all): 'es2027 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P77660 and previous config saved to /var/cache/conftool/dbconfig/20250611-075240-root.json
- 07:50 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P77659 and previous config saved to /var/cache/conftool/dbconfig/20250611-075004-marostegui.json
- 07:37 marostegui@cumin1002: dbctl commit (dc=all): 'es2027 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P77658 and previous config saved to /var/cache/conftool/dbconfig/20250611-073733-root.json
- 07:35 marostegui@cumin1002: dbctl commit (dc=all): 'es2028 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P77657 and previous config saved to /var/cache/conftool/dbconfig/20250611-073530-root.json
- 07:34 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T396130)', diff saved to https://phabricator.wikimedia.org/P77656 and previous config saved to /var/cache/conftool/dbconfig/20250611-073457-marostegui.json
- 07:33 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1032.eqiad.wmnet
- 07:33 jmm@cumin1003: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1031.eqiad.wmnet
- 07:31 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1031.eqiad.wmnet
- 07:31 jmm@cumin1003: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1031.eqiad.wmnet
- 07:27 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1031.eqiad.wmnet
- 07:24 slyngshede@dns1004: END - running authdns-update
- 07:24 slyngshede@dns1004: START - running authdns-update
- 07:22 marostegui@cumin1002: dbctl commit (dc=all): 'es2027 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P77655 and previous config saved to /var/cache/conftool/dbconfig/20250611-072227-root.json
- 07:22 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 07:20 marostegui@cumin1002: dbctl commit (dc=all): 'es2028 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P77654 and previous config saved to /var/cache/conftool/dbconfig/20250611-072024-root.json
- 07:16 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2177 (T396130)', diff saved to https://phabricator.wikimedia.org/P77653 and previous config saved to /var/cache/conftool/dbconfig/20250611-071612-marostegui.json
- 07:16 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2177.codfw.wmnet with reason: Maintenance
- 07:15 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T396130)', diff saved to https://phabricator.wikimedia.org/P77652 and previous config saved to /var/cache/conftool/dbconfig/20250611-071549-marostegui.json
- 07:10 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1030.eqiad.wmnet
- 07:09 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1030.eqiad.wmnet
- 07:07 marostegui@cumin1002: dbctl commit (dc=all): 'es2027 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P77651 and previous config saved to /var/cache/conftool/dbconfig/20250611-070722-root.json
- 07:05 marostegui@cumin1002: dbctl commit (dc=all): 'es2028 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P77650 and previous config saved to /var/cache/conftool/dbconfig/20250611-070519-root.json
- 07:03 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ganeti1030.eqiad.wmnet
- 07:01 marostegui@cumin1002: dbctl commit (dc=all): 'db1233 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P77649 and previous config saved to /var/cache/conftool/dbconfig/20250611-070117-root.json
- 07:00 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1030.eqiad.wmnet
- 07:00 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P77648 and previous config saved to /var/cache/conftool/dbconfig/20250611-070042-marostegui.json
- 06:52 marostegui@cumin1002: dbctl commit (dc=all): 'es2027 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P77647 and previous config saved to /var/cache/conftool/dbconfig/20250611-065217-root.json
- 06:50 jmm@cumin2002: END (PASS) - Cookbook sre.wdqs.restart-nginx-envoy (exit_code=0) rolling restart_daemons on A:wcqs-public
- 06:50 marostegui@cumin1002: dbctl commit (dc=all): 'es2028 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P77646 and previous config saved to /var/cache/conftool/dbconfig/20250611-065013-root.json
- 06:49 jmm@cumin2002: START - Cookbook sre.wdqs.restart-nginx-envoy rolling restart_daemons on A:wcqs-public
- 06:49 jmm@cumin2002: END (FAIL) - Cookbook sre.wdqs.restart-nginx-envoy (exit_code=1) rolling restart_daemons on A:wdqs-all
- 06:48 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1029.eqiad.wmnet
- 06:48 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1029.eqiad.wmnet
- 06:46 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on es2027.codfw.wmnet with reason: Maintenance
- 06:46 marostegui@cumin1002: dbctl commit (dc=all): 'db1233 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P77645 and previous config saved to /var/cache/conftool/dbconfig/20250611-064611-root.json
- 06:46 marostegui@cumin1002: dbctl commit (dc=all): 'Depool es2027 T395241', diff saved to https://phabricator.wikimedia.org/P77644 and previous config saved to /var/cache/conftool/dbconfig/20250611-064606-marostegui.json
- 06:45 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P77643 and previous config saved to /var/cache/conftool/dbconfig/20250611-064535-marostegui.json
- 06:43 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on es2028.codfw.wmnet with reason: Maintenance
- 06:43 marostegui@cumin1002: dbctl commit (dc=all): 'Depool es2028 T395241', diff saved to https://phabricator.wikimedia.org/P77642 and previous config saved to /var/cache/conftool/dbconfig/20250611-064314-marostegui.json
- 06:42 marostegui@cumin1002: dbctl commit (dc=all): 'db2229 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P77641 and previous config saved to /var/cache/conftool/dbconfig/20250611-064246-root.json
- 06:42 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ganeti1029.eqiad.wmnet
- 06:38 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1029.eqiad.wmnet
- 06:35 marostegui@cumin1002: dbctl commit (dc=all): 'db2238 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P77640 and previous config saved to /var/cache/conftool/dbconfig/20250611-063549-root.json
- 06:32 jmm@cumin2002: START - Cookbook sre.wdqs.restart-nginx-envoy rolling restart_daemons on A:wdqs-all
- 06:32 jmm@cumin2002: END (PASS) - Cookbook sre.wdqs.restart-nginx-envoy (exit_code=0) rolling restart_daemons on A:wcqs-public
- 06:31 marostegui@cumin1002: dbctl commit (dc=all): 'db1233 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P77639 and previous config saved to /var/cache/conftool/dbconfig/20250611-063059-root.json
- 06:30 marostegui@deploy1003: Finished scap sync-world: Backport for Revert "db-production.php: Disable writes in es7" (duration: 10m 06s)
- 06:30 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T396130)', diff saved to https://phabricator.wikimedia.org/P77638 and previous config saved to /var/cache/conftool/dbconfig/20250611-063027-marostegui.json
- 06:30 jmm@cumin2002: START - Cookbook sre.wdqs.restart-nginx-envoy rolling restart_daemons on A:wcqs-public
- 06:27 marostegui@cumin1002: dbctl commit (dc=all): 'db2229 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P77637 and previous config saved to /var/cache/conftool/dbconfig/20250611-062741-root.json
- 06:25 moritzm: installing libxml2 security updates
- 06:23 marostegui@deploy1003: marostegui: Continuing with sync
- 06:22 marostegui@deploy1003: marostegui: Backport for Revert "db-production.php: Disable writes in es7" synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 06:20 marostegui@cumin1002: dbctl commit (dc=all): 'db2238 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P77636 and previous config saved to /var/cache/conftool/dbconfig/20250611-062044-root.json
- 06:20 marostegui@deploy1003: Started scap sync-world: Backport for Revert "db-production.php: Disable writes in es7"
- 06:19 marostegui: Starting es7 eqiad failover from es1039 to es1035 - T396550
- 06:19 marostegui@cumin1002: dbctl commit (dc=all): 'Pool es1039', diff saved to https://phabricator.wikimedia.org/P77635 and previous config saved to /var/cache/conftool/dbconfig/20250611-061901-marostegui.json
- 06:18 marostegui@dns1006: END - running authdns-update
- 06:17 marostegui@dns1006: START - running authdns-update
- 06:16 marostegui@cumin1002: dbctl commit (dc=all): 'Promote es1035 to es7 primary T396550', diff saved to https://phabricator.wikimedia.org/P77634 and previous config saved to /var/cache/conftool/dbconfig/20250611-061644-root.json
- 06:15 marostegui@cumin1002: dbctl commit (dc=all): 'db1233 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P77633 and previous config saved to /var/cache/conftool/dbconfig/20250611-061553-root.json
- 06:15 marostegui@cumin1002: dbctl commit (dc=all): 'Set es1035 with weight 0 T396550', diff saved to https://phabricator.wikimedia.org/P77632 and previous config saved to /var/cache/conftool/dbconfig/20250611-061501-root.json
- 06:14 marostegui@deploy1003: Finished scap sync-world: Backport for db-production.php: Disable writes in es7 (T396550) (duration: 10m 03s)
- 06:12 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2156 (T396130)', diff saved to https://phabricator.wikimedia.org/P77631 and previous config saved to /var/cache/conftool/dbconfig/20250611-061242-marostegui.json
- 06:12 marostegui@cumin1002: dbctl commit (dc=all): 'db2229 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P77630 and previous config saved to /var/cache/conftool/dbconfig/20250611-061236-root.json
- 06:12 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2156.codfw.wmnet with reason: Maintenance
- 06:12 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T396130)', diff saved to https://phabricator.wikimedia.org/P77629 and previous config saved to /var/cache/conftool/dbconfig/20250611-061219-marostegui.json
- 06:07 marostegui@deploy1003: marostegui: Continuing with sync
- 06:06 marostegui@deploy1003: marostegui: Backport for db-production.php: Disable writes in es7 (T396550) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 06:05 marostegui@cumin1002: dbctl commit (dc=all): 'Repool es1040', diff saved to https://phabricator.wikimedia.org/P77628 and previous config saved to /var/cache/conftool/dbconfig/20250611-060552-marostegui.json
- 06:05 marostegui@cumin1002: dbctl commit (dc=all): 'db2238 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P77627 and previous config saved to /var/cache/conftool/dbconfig/20250611-060538-root.json
- 06:04 marostegui@deploy1003: Started scap sync-world: Backport for db-production.php: Disable writes in es7 (T396550)
- 06:04 marostegui@cumin1002: dbctl commit (dc=all): 'Repool es1040', diff saved to https://phabricator.wikimedia.org/P77626 and previous config saved to /var/cache/conftool/dbconfig/20250611-060413-marostegui.json
- 06:03 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 7 hosts with reason: Primary switchover es7 T396550
- 06:02 marostegui@cumin1002: dbctl commit (dc=all): 'Repool es1040', diff saved to https://phabricator.wikimedia.org/P77625 and previous config saved to /var/cache/conftool/dbconfig/20250611-060227-marostegui.json
- 06:00 marostegui@cumin1002: dbctl commit (dc=all): 'db1233 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P77624 and previous config saved to /var/cache/conftool/dbconfig/20250611-060048-root.json
- 05:57 marostegui@cumin1002: dbctl commit (dc=all): 'db2229 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P77623 and previous config saved to /var/cache/conftool/dbconfig/20250611-055730-root.json
- 05:57 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P77622 and previous config saved to /var/cache/conftool/dbconfig/20250611-055711-marostegui.json
- 05:57 marostegui@cumin1002: dbctl commit (dc=all): 'es1040 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P77621 and previous config saved to /var/cache/conftool/dbconfig/20250611-055705-root.json
- 05:52 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1233.eqiad.wmnet with reason: Maintenance
- 05:52 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1233 T396549', diff saved to https://phabricator.wikimedia.org/P77620 and previous config saved to /var/cache/conftool/dbconfig/20250611-055222-marostegui.json
- 05:50 marostegui@cumin1002: dbctl commit (dc=all): 'db2238 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P77619 and previous config saved to /var/cache/conftool/dbconfig/20250611-055033-root.json
- 05:48 marostegui@cumin1002: dbctl commit (dc=all): 'es1040 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P77618 and previous config saved to /var/cache/conftool/dbconfig/20250611-054835-root.json
- 05:42 marostegui@cumin1002: dbctl commit (dc=all): 'db2229 (re)pooling @ 20%: Repooling', diff saved to https://phabricator.wikimedia.org/P77617 and previous config saved to /var/cache/conftool/dbconfig/20250611-054224-root.json
- 05:42 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P77616 and previous config saved to /var/cache/conftool/dbconfig/20250611-054204-marostegui.json
- 05:39 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on es1040.eqiad.wmnet with reason: Maintenance
- 05:39 marostegui@cumin1002: dbctl commit (dc=all): 'Depool es1040', diff saved to https://phabricator.wikimedia.org/P77615 and previous config saved to /var/cache/conftool/dbconfig/20250611-053903-marostegui.json
- 05:35 marostegui@cumin1002: dbctl commit (dc=all): 'db2238 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P77614 and previous config saved to /var/cache/conftool/dbconfig/20250611-053527-root.json
- 05:29 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2238.codfw.wmnet with reason: Maintenance
- 05:29 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2238 T396549', diff saved to https://phabricator.wikimedia.org/P77613 and previous config saved to /var/cache/conftool/dbconfig/20250611-052907-marostegui.json
- 05:27 marostegui@cumin1002: dbctl commit (dc=all): 'db2229 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P77612 and previous config saved to /var/cache/conftool/dbconfig/20250611-052719-root.json
- 05:26 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T396130)', diff saved to https://phabricator.wikimedia.org/P77611 and previous config saved to /var/cache/conftool/dbconfig/20250611-052657-marostegui.json
- 05:16 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2229.codfw.wmnet with reason: Maintenance
- 05:16 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2229 T396509', diff saved to https://phabricator.wikimedia.org/P77610 and previous config saved to /var/cache/conftool/dbconfig/20250611-051612-marostegui.json
- 05:15 marostegui@cumin1002: dbctl commit (dc=all): 'Promote db2214 to s6 primary T396509', diff saved to https://phabricator.wikimedia.org/P77609 and previous config saved to /var/cache/conftool/dbconfig/20250611-051525-marostegui.json
- 05:15 marostegui: Starting s6 codfw failover from db2229 to db2214 - T396509
- 05:11 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 23 hosts with reason: Primary switchover s6 T396509
- 05:10 marostegui@cumin1002: dbctl commit (dc=all): 'Set db2214 with weight 0 T396509', diff saved to https://phabricator.wikimedia.org/P77608 and previous config saved to /var/cache/conftool/dbconfig/20250611-051056-root.json
- 05:09 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2149 (T396130)', diff saved to https://phabricator.wikimedia.org/P77607 and previous config saved to /var/cache/conftool/dbconfig/20250611-050911-marostegui.json
- 05:09 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2149.codfw.wmnet with reason: Maintenance
- 05:05 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1162.eqiad.wmnet with reason: Maintenance
- 04:37 oblivian@deploy1003: Finished scap sync-world: Backport for robots.txt: add crawl-delay directive for semrushbot (duration: 11m 43s)
- 04:30 oblivian@deploy1003: oblivian: Continuing with sync
- 04:28 oblivian@deploy1003: oblivian: Backport for robots.txt: add crawl-delay directive for semrushbot synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 04:25 oblivian@deploy1003: Started scap sync-world: Backport for robots.txt: add crawl-delay directive for semrushbot
- 02:07 dani@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
- 02:06 dani@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply
- 02:06 dani@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
- 02:06 dani@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply
- 02:06 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply
- 02:06 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply
- 00:20 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
- 00:19 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1254 (T396130)', diff saved to https://phabricator.wikimedia.org/P77606 and previous config saved to /var/cache/conftool/dbconfig/20250611-001949-marostegui.json
- 00:04 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1254', diff saved to https://phabricator.wikimedia.org/P77605 and previous config saved to /var/cache/conftool/dbconfig/20250611-000441-marostegui.json
2025-06-10
- 23:49 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1254', diff saved to https://phabricator.wikimedia.org/P77604 and previous config saved to /var/cache/conftool/dbconfig/20250610-234934-marostegui.json
- 23:34 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1254 (T396130)', diff saved to https://phabricator.wikimedia.org/P77603 and previous config saved to /var/cache/conftool/dbconfig/20250610-233427-marostegui.json
- 23:25 krinkle@deploy1003: Finished scap sync-world: Backport for multiversion: Document how it all works (T289318) (duration: 12m 56s)
- 23:22 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1254 (T396130)', diff saved to https://phabricator.wikimedia.org/P77602 and previous config saved to /var/cache/conftool/dbconfig/20250610-232206-marostegui.json
- 23:21 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1254.eqiad.wmnet with reason: Maintenance
- 23:18 krinkle@deploy1003: krinkle: Continuing with sync
- 23:14 krinkle@deploy1003: krinkle: Backport for multiversion: Document how it all works (T289318) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 23:12 krinkle@deploy1003: Started scap sync-world: Backport for multiversion: Document how it all works (T289318)
- 23:11 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1239.eqiad.wmnet with reason: Maintenance
- 23:10 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1233 (T396130)', diff saved to https://phabricator.wikimedia.org/P77600 and previous config saved to /var/cache/conftool/dbconfig/20250610-231053-marostegui.json
- 22:55 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to https://phabricator.wikimedia.org/P77599 and previous config saved to /var/cache/conftool/dbconfig/20250610-225546-marostegui.json
- 22:40 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to https://phabricator.wikimedia.org/P77598 and previous config saved to /var/cache/conftool/dbconfig/20250610-224039-marostegui.json
- 22:25 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1233 (T396130)', diff saved to https://phabricator.wikimedia.org/P77597 and previous config saved to /var/cache/conftool/dbconfig/20250610-222532-marostegui.json
- 22:13 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1233 (T396130)', diff saved to https://phabricator.wikimedia.org/P77596 and previous config saved to /var/cache/conftool/dbconfig/20250610-221311-marostegui.json
- 22:13 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1233.eqiad.wmnet with reason: Maintenance
- 22:12 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1229 (T396130)', diff saved to https://phabricator.wikimedia.org/P77595 and previous config saved to /var/cache/conftool/dbconfig/20250610-221248-marostegui.json
- 21:57 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P77594 and previous config saved to /var/cache/conftool/dbconfig/20250610-215741-marostegui.json
- 21:51 catrope@deploy1003: Finished scap sync-world: Backport for Fixes TypeError: undefined is not an object (evaluating 'sources.map') (T396370), Fixes TypeError: undefined is not an object (evaluating 'sources.map') (T396370) (duration: 11m 20s)
- 21:44 catrope@deploy1003: catrope: Continuing with sync
- 21:42 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P77593 and previous config saved to /var/cache/conftool/dbconfig/20250610-214234-marostegui.json
- 21:42 catrope@deploy1003: catrope: Backport for Fixes TypeError: undefined is not an object (evaluating 'sources.map') (T396370), Fixes TypeError: undefined is not an object (evaluating 'sources.map') (T396370) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 21:39 catrope@deploy1003: Started scap sync-world: Backport for Fixes TypeError: undefined is not an object (evaluating 'sources.map') (T396370), Fixes TypeError: undefined is not an object (evaluating 'sources.map') (T396370)
- 21:27 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1229 (T396130)', diff saved to https://phabricator.wikimedia.org/P77592 and previous config saved to /var/cache/conftool/dbconfig/20250610-212727-marostegui.json
- 21:23 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1229 (T396130)', diff saved to https://phabricator.wikimedia.org/P77591 and previous config saved to /var/cache/conftool/dbconfig/20250610-212332-marostegui.json
- 21:23 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1229.eqiad.wmnet with reason: Maintenance
- 21:12 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1225.eqiad.wmnet with reason: Maintenance
- 21:12 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1222 (T396130)', diff saved to https://phabricator.wikimedia.org/P77590 and previous config saved to /var/cache/conftool/dbconfig/20250610-211234-marostegui.json
- 20:57 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1222', diff saved to https://phabricator.wikimedia.org/P77588 and previous config saved to /var/cache/conftool/dbconfig/20250610-205727-marostegui.json
- 20:56 bking@deploy1003: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 20:55 cjming@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply
- 20:55 bking@deploy1003: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
- 20:55 bking@deploy1003: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 20:55 bking@deploy1003: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
- 20:55 cjming@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply
- 20:42 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1222', diff saved to https://phabricator.wikimedia.org/P77587 and previous config saved to /var/cache/conftool/dbconfig/20250610-204220-marostegui.json
- 20:32 cjming@deploy1003: Finished scap sync-world: Backport for Replace deprecated wgCirrusSearchWMFExtraFeatures with wgCirrusSearchWeightedTags (T393872) (duration: 10m 18s)
- 20:27 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1222 (T396130)', diff saved to https://phabricator.wikimedia.org/P77586 and previous config saved to /var/cache/conftool/dbconfig/20250610-202713-marostegui.json
- 20:26 cjming@deploy1003: cjming, sd: Continuing with sync
- 20:25 vriley@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1185.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 20:24 cjming@deploy1003: cjming, sd: Backport for Replace deprecated wgCirrusSearchWMFExtraFeatures with wgCirrusSearchWeightedTags (T393872) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 20:24 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 20:24 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt an-worker1185 - vriley@cumin1002"
- 20:24 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt an-worker1185 - vriley@cumin1002"
- 20:23 tchin@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: apply
- 20:22 tchin@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: apply
- 20:22 cjming@deploy1003: Started scap sync-world: Backport for Replace deprecated wgCirrusSearchWMFExtraFeatures with wgCirrusSearchWeightedTags (T393872)
- 20:20 vriley@cumin1002: START - Cookbook sre.dns.netbox
- 20:19 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 20:19 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt an-worker1185 - vriley@cumin1002"
- 20:19 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt an-worker1185 - vriley@cumin1002"
- 20:19 tchin@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: apply
- 20:18 tchin@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-analytics-external: apply
- 20:16 toyofuku@deploy1003: Finished scap sync-world: Backport for Enable empty search recommendations for Vector on all wikipedias, and for Minerva on group1 wikis and wikivoyage (T395344 T395339) (duration: 13m 01s)
- 20:16 vriley@cumin1002: START - Cookbook sre.dns.netbox
- 20:15 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-analytics-external: apply
- 20:15 tchin@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-analytics-external: apply
- 20:14 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1222 (T396130)', diff saved to https://phabricator.wikimedia.org/P77585 and previous config saved to /var/cache/conftool/dbconfig/20250610-201441-marostegui.json
- 20:14 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1222.eqiad.wmnet with reason: Maintenance
- 20:14 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1197 (T396130)', diff saved to https://phabricator.wikimedia.org/P77584 and previous config saved to /var/cache/conftool/dbconfig/20250610-201418-marostegui.json
- 20:11 vriley@cumin1002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host an-worker1185
- 20:11 vriley@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host an-worker1185
- 20:11 vriley@cumin1002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host an-worker1185
- 20:10 vriley@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host an-worker1185
- 20:09 toyofuku@deploy1003: bwang, toyofuku: Continuing with sync
- 20:06 toyofuku@deploy1003: bwang, toyofuku: Backport for Enable empty search recommendations for Vector on all wikipedias, and for Minerva on group1 wikis and wikivoyage (T395344 T395339) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 20:03 toyofuku@deploy1003: Started scap sync-world: Backport for Enable empty search recommendations for Vector on all wikipedias, and for Minerva on group1 wikis and wikivoyage (T395344 T395339)
- 19:59 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P77583 and previous config saved to /var/cache/conftool/dbconfig/20250610-195910-marostegui.json
- 19:44 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P77582 and previous config saved to /var/cache/conftool/dbconfig/20250610-194403-marostegui.json
- 19:41 dwisehaupt@dns1004: END - running authdns-update
- 19:40 dwisehaupt@dns1004: START - running authdns-update
- 19:28 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1197 (T396130)', diff saved to https://phabricator.wikimedia.org/P77581 and previous config saved to /var/cache/conftool/dbconfig/20250610-192856-marostegui.json
- 19:25 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1197 (T396130)', diff saved to https://phabricator.wikimedia.org/P77580 and previous config saved to /var/cache/conftool/dbconfig/20250610-192503-marostegui.json
- 19:24 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1197.eqiad.wmnet with reason: Maintenance
- 19:24 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1188 (T396130)', diff saved to https://phabricator.wikimedia.org/P77579 and previous config saved to /var/cache/conftool/dbconfig/20250610-192441-marostegui.json
- 19:09 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P77578 and previous config saved to /var/cache/conftool/dbconfig/20250610-190934-marostegui.json
- 18:54 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P77577 and previous config saved to /var/cache/conftool/dbconfig/20250610-185426-marostegui.json
- 18:39 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1188 (T396130)', diff saved to https://phabricator.wikimedia.org/P77576 and previous config saved to /var/cache/conftool/dbconfig/20250610-183919-marostegui.json
- 18:35 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1188 (T396130)', diff saved to https://phabricator.wikimedia.org/P77575 and previous config saved to /var/cache/conftool/dbconfig/20250610-183528-marostegui.json
- 18:35 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1188.eqiad.wmnet with reason: Maintenance
- 18:35 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T396130)', diff saved to https://phabricator.wikimedia.org/P77574 and previous config saved to /var/cache/conftool/dbconfig/20250610-183505-marostegui.json
- 18:19 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P77573 and previous config saved to /var/cache/conftool/dbconfig/20250610-181958-marostegui.json
- 18:17 brennen@deploy1003: rebuilt and synchronized wikiversions files: group0 to 1.45.0-wmf.5 refs T392175
- 18:04 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P77572 and previous config saved to /var/cache/conftool/dbconfig/20250610-180451-marostegui.json
- 18:03 marostegui@cumin1002: dbctl commit (dc=all): 'db1201 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P77571 and previous config saved to /var/cache/conftool/dbconfig/20250610-180333-root.json
- 17:49 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T396130)', diff saved to https://phabricator.wikimedia.org/P77570 and previous config saved to /var/cache/conftool/dbconfig/20250610-174944-marostegui.json
- 17:48 marostegui@cumin1002: dbctl commit (dc=all): 'db1201 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P77569 and previous config saved to /var/cache/conftool/dbconfig/20250610-174828-root.json
- 17:35 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1182 (T396130)', diff saved to https://phabricator.wikimedia.org/P77568 and previous config saved to /var/cache/conftool/dbconfig/20250610-173514-marostegui.json
- 17:35 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1182.eqiad.wmnet with reason: Maintenance
- 17:34 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T396130)', diff saved to https://phabricator.wikimedia.org/P77567 and previous config saved to /var/cache/conftool/dbconfig/20250610-173450-marostegui.json
- 17:33 marostegui@cumin1002: dbctl commit (dc=all): 'db1201 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P77566 and previous config saved to /var/cache/conftool/dbconfig/20250610-173322-root.json
- 17:24 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 17:19 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P77565 and previous config saved to /var/cache/conftool/dbconfig/20250610-171943-marostegui.json
- 17:18 marostegui@cumin1002: dbctl commit (dc=all): 'db1201 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P77564 and previous config saved to /var/cache/conftool/dbconfig/20250610-171817-root.json
- 17:14 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 17:10 mszabo@deploy1003: Finished scap sync-world: Backport for Set ORESDeveloperSetup to false by default (T364705), ores: Disable AbuseFilter integration by default (T364705), tests: Run only defered updates on LinkRecommendationUpdaterTest (duration: 15m 06s)
- 17:09 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 17:05 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1253 (T395241)', diff saved to https://phabricator.wikimedia.org/P77563 and previous config saved to /var/cache/conftool/dbconfig/20250610-170543-fceratto.json
- 17:04 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 17:04 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P77562 and previous config saved to /var/cache/conftool/dbconfig/20250610-170437-marostegui.json
- 17:03 marostegui@cumin1002: dbctl commit (dc=all): 'db1201 (re)pooling @ 20%: Repooling', diff saved to https://phabricator.wikimedia.org/P77561 and previous config saved to /var/cache/conftool/dbconfig/20250610-170312-root.json
- 17:01 mszabo@deploy1003: mszabo: Continuing with sync
- 16:59 mszabo@deploy1003: mszabo: Backport for Set ORESDeveloperSetup to false by default (T364705), ores: Disable AbuseFilter integration by default (T364705), tests: Run only defered updates on LinkRecommendationUpdaterTest synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 16:55 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 16:55 mszabo@deploy1003: Started scap sync-world: Backport for Set ORESDeveloperSetup to false by default (T364705), ores: Disable AbuseFilter integration by default (T364705), tests: Run only defered updates on LinkRecommendationUpdaterTest
- 16:54 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 16:52 mszabo@deploy1003: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.45.0-wmf.4,1.45.0-wmf.5 --multiversion-image-name docker-registry.discovery.wmnet/restricted/mediawiki-multiversion --multiversion-debug-image-name docker-registry.discovery.w
- 16:51 mszabo@deploy1003: Started scap sync-world: Backport for Set ORESDeveloperSetup to false by default (T364705), ores: Disable AbuseFilter integration by default (T364705), tests: Run only defered updates on LinkRecommendationUpdaterTest
- 16:50 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1253', diff saved to https://phabricator.wikimedia.org/P77560 and previous config saved to /var/cache/conftool/dbconfig/20250610-165036-fceratto.json
- 16:49 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T396130)', diff saved to https://phabricator.wikimedia.org/P77559 and previous config saved to /var/cache/conftool/dbconfig/20250610-164930-marostegui.json
- 16:49 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Banning hosts: relforge1003*,relforge1004* for testtesttest - bking@cumin2002 - T390565
- 16:49 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Banning hosts: relforge1003*,relforge1004* for testtesttest - bking@cumin2002 - T390565
- 16:48 bking@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.ban (exit_code=99) Banning hosts: relforge1003*relforg1004* for testtesttest - bking@cumin2002 - T390565
- 16:48 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Banning hosts: relforge1003*relforg1004* for testtesttest - bking@cumin2002 - T390565
- 16:48 marostegui@cumin1002: dbctl commit (dc=all): 'db1201 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P77558 and previous config saved to /var/cache/conftool/dbconfig/20250610-164806-root.json
- 16:42 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 16:40 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 16:39 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
- 16:38 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1201.eqiad.wmnet with reason: Maintenance
- 16:37 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1201 T395989', diff saved to https://phabricator.wikimedia.org/P77557 and previous config saved to /var/cache/conftool/dbconfig/20250610-163742-marostegui.json
- 16:35 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1253', diff saved to https://phabricator.wikimedia.org/P77556 and previous config saved to /var/cache/conftool/dbconfig/20250610-163529-fceratto.json
- 16:35 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1156 (T396130)', diff saved to https://phabricator.wikimedia.org/P77555 and previous config saved to /var/cache/conftool/dbconfig/20250610-163458-marostegui.json
- 16:34 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1014,1018].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
- 16:34 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1156.eqiad.wmnet with reason: Maintenance
- 16:34 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 16:32 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 16:25 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 16:24 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 16:23 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 16:23 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2207.codfw.wmnet with reason: Maintenance
- 16:21 dancy@deploy1003: Finished scap sync-world: testwikis to 1.45.0-wmf.5 refs T392175 (duration: 44m 02s)
- 16:20 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1253 (T395241)', diff saved to https://phabricator.wikimedia.org/P77554 and previous config saved to /var/cache/conftool/dbconfig/20250610-162022-fceratto.json
- 16:18 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 16:13 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1253 (T395241)', diff saved to https://phabricator.wikimedia.org/P77553 and previous config saved to /var/cache/conftool/dbconfig/20250610-161323-fceratto.json
- 16:13 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1253.eqiad.wmnet with reason: Maintenance
- 16:12 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1227 (T395241)', diff saved to https://phabricator.wikimedia.org/P77552 and previous config saved to /var/cache/conftool/dbconfig/20250610-161258-fceratto.json
- 16:12 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2197.codfw.wmnet with reason: Maintenance
- 16:08 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2238 (T396130)', diff saved to https://phabricator.wikimedia.org/P77551 and previous config saved to /var/cache/conftool/dbconfig/20250610-160804-marostegui.json
- 16:03 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host install7002.wikimedia.org with OS bookworm
- 15:57 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1227', diff saved to https://phabricator.wikimedia.org/P77550 and previous config saved to /var/cache/conftool/dbconfig/20250610-155752-fceratto.json
- 15:52 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2238', diff saved to https://phabricator.wikimedia.org/P77549 and previous config saved to /var/cache/conftool/dbconfig/20250610-155257-marostegui.json
- 15:47 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on install7002.wikimedia.org with reason: host reimage
- 15:43 jmm@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on install7002.wikimedia.org with reason: host reimage
- 15:42 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1227', diff saved to https://phabricator.wikimedia.org/P77548 and previous config saved to /var/cache/conftool/dbconfig/20250610-154245-fceratto.json
- 15:37 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2238', diff saved to https://phabricator.wikimedia.org/P77547 and previous config saved to /var/cache/conftool/dbconfig/20250610-153750-marostegui.json
- 15:37 dancy@deploy1003: Started scap sync-world: testwikis to 1.45.0-wmf.5 refs T392175
- 15:27 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1227 (T395241)', diff saved to https://phabricator.wikimedia.org/P77546 and previous config saved to /var/cache/conftool/dbconfig/20250610-152738-fceratto.json
- 15:22 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2238 (T396130)', diff saved to https://phabricator.wikimedia.org/P77545 and previous config saved to /var/cache/conftool/dbconfig/20250610-152243-marostegui.json
- 15:14 klausman@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-lab1002.eqiad.wmnet
- 15:09 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2238 (T396130)', diff saved to https://phabricator.wikimedia.org/P77544 and previous config saved to /var/cache/conftool/dbconfig/20250610-150954-marostegui.json
- 15:09 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2238.codfw.wmnet with reason: Maintenance
- 15:09 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2226 (T396130)', diff saved to https://phabricator.wikimedia.org/P77543 and previous config saved to /var/cache/conftool/dbconfig/20250610-150931-marostegui.json
- 15:08 brennen@deploy1003: Finished deploy [phabricator/deployment@f8d7b38]: deploy phab1004 for T396490 (duration: 00m 39s)
- 15:08 brennen@deploy1003: Started deploy [phabricator/deployment@f8d7b38]: deploy phab1004 for T396490
- 15:08 klausman@cumin1003: START - Cookbook sre.hosts.reboot-single for host ml-lab1002.eqiad.wmnet
- 15:07 brennen@deploy1003: Finished deploy [phabricator/deployment@f8d7b38]: test deploy phab2002 for T396490 (duration: 00m 40s)
- 15:07 klausman@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-lab1001.eqiad.wmnet
- 15:07 brennen@deploy1003: Started deploy [phabricator/deployment@f8d7b38]: test deploy phab2002 for T396490
- 15:07 aokoth@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on phab2002.codfw.wmnet with reason: Phorge Update
- 15:05 aokoth@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on phab1004.eqiad.wmnet with reason: Phorge Update
- 15:02 jmm@cumin1003: START - Cookbook sre.hosts.reimage for host install7002.wikimedia.org with OS bookworm
- 15:02 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1028.eqiad.wmnet
- 15:02 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1028.eqiad.wmnet
- 15:01 klausman@cumin1003: START - Cookbook sre.hosts.reboot-single for host ml-lab1001.eqiad.wmnet
- 15:01 taavi@dns1004: END - running authdns-update
- 15:00 taavi@dns1004: START - running authdns-update
- 14:58 taavi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 14:58 taavi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add wiki replica cloudlb v6 addresses - taavi@cumin1003"
- 14:58 taavi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add wiki replica cloudlb v6 addresses - taavi@cumin1003"
- 14:56 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ganeti1028.eqiad.wmnet
- 14:55 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1028.eqiad.wmnet
- 14:54 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2226', diff saved to https://phabricator.wikimedia.org/P77542 and previous config saved to /var/cache/conftool/dbconfig/20250610-145424-marostegui.json
- 14:54 taavi@cumin1003: START - Cookbook sre.dns.netbox
- 14:53 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 14:53 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 14:51 marostegui@cumin1002: dbctl commit (dc=all): 'Repool pc1 T378715', diff saved to https://phabricator.wikimedia.org/P77541 and previous config saved to /var/cache/conftool/dbconfig/20250610-145137-marostegui.json
- 14:49 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts cirrussearch1063.eqiad.wmnet
- 14:49 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 14:49 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cirrussearch1063.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - bking@cumin2002"
- 14:49 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cirrussearch1063.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - bking@cumin2002"
- 14:39 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2226', diff saved to https://phabricator.wikimedia.org/P77539 and previous config saved to /var/cache/conftool/dbconfig/20250610-143917-marostegui.json
- 14:36 bking@cumin2002: START - Cookbook sre.dns.netbox
- 14:36 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 14:36 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1227 (T395241)', diff saved to https://phabricator.wikimedia.org/P77538 and previous config saved to /var/cache/conftool/dbconfig/20250610-143623-fceratto.json
- 14:36 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1227.eqiad.wmnet with reason: Maintenance
- 14:36 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 14:35 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1202 (T395241)', diff saved to https://phabricator.wikimedia.org/P77537 and previous config saved to /var/cache/conftool/dbconfig/20250610-143558-fceratto.json
- 14:29 jmm@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host install7002.wikimedia.org with OS bullseye
- 14:28 bking@cumin2002: START - Cookbook sre.hosts.decommission for hosts cirrussearch1063.eqiad.wmnet
- 14:24 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2226 (T396130)', diff saved to https://phabricator.wikimedia.org/P77536 and previous config saved to /var/cache/conftool/dbconfig/20250610-142410-marostegui.json
- 14:20 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1202', diff saved to https://phabricator.wikimedia.org/P77535 and previous config saved to /var/cache/conftool/dbconfig/20250610-142051-fceratto.json
- 14:20 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2226 (T396130)', diff saved to https://phabricator.wikimedia.org/P77534 and previous config saved to /var/cache/conftool/dbconfig/20250610-142009-marostegui.json
- 14:20 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2226.codfw.wmnet with reason: Maintenance
- 14:19 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2225 (T396130)', diff saved to https://phabricator.wikimedia.org/P77533 and previous config saved to /var/cache/conftool/dbconfig/20250610-141946-marostegui.json
- 14:19 jmm@cumin1003: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1028.eqiad.wmnet
- 14:13 fabfur@dns1004: END - running authdns-update
- 14:13 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1028.eqiad.wmnet
- 14:13 jmm@cumin1003: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1028.eqiad.wmnet
- 14:12 fabfur@dns1004: START - running authdns-update
- 14:05 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1202', diff saved to https://phabricator.wikimedia.org/P77532 and previous config saved to /var/cache/conftool/dbconfig/20250610-140544-fceratto.json
- 14:04 taavi@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudlb1001.eqiad.wmnet
- 14:04 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2225', diff saved to https://phabricator.wikimedia.org/P77531 and previous config saved to /var/cache/conftool/dbconfig/20250610-140439-marostegui.json
- 13:57 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1028.eqiad.wmnet
- 13:56 taavi@cumin1003: START - Cookbook sre.hosts.reboot-single for host cloudlb1001.eqiad.wmnet
- 13:55 taavi@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudlb1002.eqiad.wmnet
- 13:51 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
- 13:51 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
- 13:51 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
- 13:50 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
- 13:50 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1027.eqiad.wmnet
- 13:50 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1027.eqiad.wmnet
- 13:50 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1202 (T395241)', diff saved to https://phabricator.wikimedia.org/P77529 and previous config saved to /var/cache/conftool/dbconfig/20250610-135037-fceratto.json
- 13:50 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
- 13:50 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply
- 13:49 fabfur@dns1004: END - running authdns-update
- 13:49 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2225', diff saved to https://phabricator.wikimedia.org/P77528 and previous config saved to /var/cache/conftool/dbconfig/20250610-134931-marostegui.json
- 13:48 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
- 13:48 fabfur@dns1004: START - running authdns-update
- 13:48 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply
- 13:47 taavi@cumin1003: START - Cookbook sre.hosts.reboot-single for host cloudlb1002.eqiad.wmnet
- 13:44 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ganeti1027.eqiad.wmnet
- 13:42 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1202 (T395241)', diff saved to https://phabricator.wikimedia.org/P77527 and previous config saved to /var/cache/conftool/dbconfig/20250610-134227-fceratto.json
- 13:42 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1202.eqiad.wmnet with reason: Maintenance
- 13:42 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1194 (T395241)', diff saved to https://phabricator.wikimedia.org/P77526 and previous config saved to /var/cache/conftool/dbconfig/20250610-134202-fceratto.json
- 13:39 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1027.eqiad.wmnet
- 13:39 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1026.eqiad.wmnet
- 13:38 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1026.eqiad.wmnet
- 13:38 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 13:37 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 13:36 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
- 13:34 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2225 (T396130)', diff saved to https://phabricator.wikimedia.org/P77525 and previous config saved to /var/cache/conftool/dbconfig/20250610-133424-marostegui.json
- 13:34 sgimeno@deploy1003: Finished scap sync-world: Backport for Enable electionclerk user group on enwiki (T378287), core-Permissions:Restrict editing on cawikimedia to autoconfirmed only (T396178) (duration: 11m 22s)
- 13:33 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
- 13:32 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ganeti1026.eqiad.wmnet
- 13:32 marostegui@cumin1002: dbctl commit (dc=all): 'Depool pc1 T378715', diff saved to https://phabricator.wikimedia.org/P77524 and previous config saved to /var/cache/conftool/dbconfig/20250610-133207-marostegui.json
- 13:31 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on pc2011.codfw.wmnet,pc1011.eqiad.wmnet with reason: Maintenance
- 13:30 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1026.eqiad.wmnet
- 13:27 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1025.eqiad.wmnet
- 13:27 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1025.eqiad.wmnet
- 13:27 sgimeno@deploy1003: bunnypranav, dreamrimmer, sgimeno: Continuing with sync
- 13:26 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1194', diff saved to https://phabricator.wikimedia.org/P77523 and previous config saved to /var/cache/conftool/dbconfig/20250610-132655-fceratto.json
- 13:25 sgimeno@deploy1003: bunnypranav, dreamrimmer, sgimeno: Backport for Enable electionclerk user group on enwiki (T378287), core-Permissions:Restrict editing on cawikimedia to autoconfirmed only (T396178) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 13:22 sgimeno@deploy1003: Started scap sync-world: Backport for Enable electionclerk user group on enwiki (T378287), core-Permissions:Restrict editing on cawikimedia to autoconfirmed only (T396178)
- 13:21 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2225 (T396130)', diff saved to https://phabricator.wikimedia.org/P77522 and previous config saved to /var/cache/conftool/dbconfig/20250610-132124-marostegui.json
- 13:21 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ganeti1025.eqiad.wmnet
- 13:21 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2225.codfw.wmnet with reason: Maintenance
- 13:21 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2204 (T396130)', diff saved to https://phabricator.wikimedia.org/P77521 and previous config saved to /var/cache/conftool/dbconfig/20250610-132102-marostegui.json
- 13:18 jmm@cumin1003: START - Cookbook sre.hosts.reimage for host install7002.wikimedia.org with OS bullseye
- 13:18 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1025.eqiad.wmnet
- 13:17 jmm@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host install7002.wikimedia.org with OS bullseye
- 13:15 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1024.eqiad.wmnet
- 13:15 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1024.eqiad.wmnet
- 13:11 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1194', diff saved to https://phabricator.wikimedia.org/P77520 and previous config saved to /var/cache/conftool/dbconfig/20250610-131148-fceratto.json
- 13:09 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ganeti1024.eqiad.wmnet
- 13:06 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1024.eqiad.wmnet
- 13:06 jmm@cumin1003: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1024.eqiad.wmnet
- 13:05 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2204', diff saved to https://phabricator.wikimedia.org/P77519 and previous config saved to /var/cache/conftool/dbconfig/20250610-130555-marostegui.json
- 13:00 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 13:00 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 12:58 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 12:58 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 12:56 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1194 (T395241)', diff saved to https://phabricator.wikimedia.org/P77518 and previous config saved to /var/cache/conftool/dbconfig/20250610-125641-fceratto.json
- 12:54 jakob@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikidata-query-gui: apply
- 12:54 jakob@deploy1003: helmfile [eqiad] START helmfile.d/services/wikidata-query-gui: apply
- 12:54 jmm@cumin1003: START - Cookbook sre.hosts.reimage for host install7002.wikimedia.org with OS bullseye
- 12:53 jmm@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host install7002.wikimedia.org with OS bullseye
- 12:53 jakob@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikidata-query-gui: apply
- 12:53 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
- 12:53 jakob@deploy1003: helmfile [codfw] START helmfile.d/services/wikidata-query-gui: apply
- 12:52 jakob@deploy1003: helmfile [staging] DONE helmfile.d/services/wikidata-query-gui: apply
- 12:52 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
- 12:52 jakob@deploy1003: helmfile [staging] START helmfile.d/services/wikidata-query-gui: apply
- 12:50 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2204', diff saved to https://phabricator.wikimedia.org/P77517 and previous config saved to /var/cache/conftool/dbconfig/20250610-125048-marostegui.json
- 12:48 jmm@cumin1003: START - Cookbook sre.hosts.reimage for host install7002.wikimedia.org with OS bullseye
- 12:48 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1194 (T395241)', diff saved to https://phabricator.wikimedia.org/P77516 and previous config saved to /var/cache/conftool/dbconfig/20250610-124835-fceratto.json
- 12:48 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1194.eqiad.wmnet with reason: Maintenance
- 12:48 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1191 (T395241)', diff saved to https://phabricator.wikimedia.org/P77515 and previous config saved to /var/cache/conftool/dbconfig/20250610-124810-fceratto.json
- 12:47 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1024.eqiad.wmnet
- 12:41 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1023.eqiad.wmnet
- 12:41 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1023.eqiad.wmnet
- 12:35 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2204 (T396130)', diff saved to https://phabricator.wikimedia.org/P77514 and previous config saved to /var/cache/conftool/dbconfig/20250610-123541-marostegui.json
- 12:35 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ganeti1023.eqiad.wmnet
- 12:34 marostegui@cumin1002: dbctl commit (dc=all): 'db1168 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P77513 and previous config saved to /var/cache/conftool/dbconfig/20250610-123422-root.json
- 12:33 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1191', diff saved to https://phabricator.wikimedia.org/P77512 and previous config saved to /var/cache/conftool/dbconfig/20250610-123303-fceratto.json
- 12:31 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2204 (T396130)', diff saved to https://phabricator.wikimedia.org/P77511 and previous config saved to /var/cache/conftool/dbconfig/20250610-123140-marostegui.json
- 12:31 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2204.codfw.wmnet with reason: Maintenance
- 12:31 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2189 (T396130)', diff saved to https://phabricator.wikimedia.org/P77510 and previous config saved to /var/cache/conftool/dbconfig/20250610-123117-marostegui.json
- 12:27 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1023.eqiad.wmnet
- 12:19 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) openstack.eqiad1.wikimediacloud.org on all recursors
- 12:19 cmooney@cumin1003: START - Cookbook sre.dns.wipe-cache openstack.eqiad1.wikimediacloud.org on all recursors
- 12:19 marostegui@cumin1002: dbctl commit (dc=all): 'db1168 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P77509 and previous config saved to /var/cache/conftool/dbconfig/20250610-121917-root.json
- 12:17 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1191', diff saved to https://phabricator.wikimedia.org/P77508 and previous config saved to /var/cache/conftool/dbconfig/20250610-121756-fceratto.json
- 12:16 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2189', diff saved to https://phabricator.wikimedia.org/P77507 and previous config saved to /var/cache/conftool/dbconfig/20250610-121610-marostegui.json
- 12:15 Ammar: Ran fixStuckGlobalRename.php for T396371 and T396452
- 12:13 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 12:13 taavi@dns1004: END - running authdns-update
- 12:13 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 12:12 taavi@dns1004: START - running authdns-update
- 12:11 taavi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 12:11 taavi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add AAAA record for openstack.eqiad1.wikimediacloud.org - taavi@cumin1003"
- 12:10 taavi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add AAAA record for openstack.eqiad1.wikimediacloud.org - taavi@cumin1003"
- 12:06 taavi@cumin1003: START - Cookbook sre.dns.netbox
- 12:06 taavi@cumin1003: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
- 12:06 jmm@dns1004: END - running authdns-update
- 12:05 jmm@dns1004: START - running authdns-update
- 12:04 marostegui@cumin1002: dbctl commit (dc=all): 'db1168 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P77506 and previous config saved to /var/cache/conftool/dbconfig/20250610-120412-root.json
- 12:03 taavi@cumin1003: START - Cookbook sre.dns.netbox
- 12:02 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1191 (T395241)', diff saved to https://phabricator.wikimedia.org/P77505 and previous config saved to /var/cache/conftool/dbconfig/20250610-120249-fceratto.json
- 12:01 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2189', diff saved to https://phabricator.wikimedia.org/P77504 and previous config saved to /var/cache/conftool/dbconfig/20250610-120103-marostegui.json
- 11:59 jmm@cumin1003: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host install7002.wikimedia.org
- 11:54 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1191 (T395241)', diff saved to https://phabricator.wikimedia.org/P77503 and previous config saved to /var/cache/conftool/dbconfig/20250610-115444-fceratto.json
- 11:54 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1191.eqiad.wmnet with reason: Maintenance
- 11:54 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1181 (T395241)', diff saved to https://phabricator.wikimedia.org/P77502 and previous config saved to /var/cache/conftool/dbconfig/20250610-115419-fceratto.json
- 11:49 marostegui@cumin1002: dbctl commit (dc=all): 'db1168 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P77501 and previous config saved to /var/cache/conftool/dbconfig/20250610-114906-root.json
- 11:48 moritzm: installing qemu bugfix updates
- 11:47 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
- 11:47 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host install7002.wikimedia.org
- 11:46 marostegui@cumin1002: dbctl commit (dc=all): 'db1180 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P77500 and previous config saved to /var/cache/conftool/dbconfig/20250610-114617-root.json
- 11:45 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2189 (T396130)', diff saved to https://phabricator.wikimedia.org/P77499 and previous config saved to /var/cache/conftool/dbconfig/20250610-114556-marostegui.json
- 11:44 cgoubert@deploy1003: Finished scap sync-world: mediawiki-cli: Fix the paths of some of the dumps scripts and config files - T394389 (duration: 08m 49s)
- 11:39 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P77497 and previous config saved to /var/cache/conftool/dbconfig/20250610-113913-fceratto.json
- 11:37 moritzm: failover Ganeti master in codfw to ganeti2032
- 11:35 cgoubert@deploy1003: Started scap sync-world: mediawiki-cli: Fix the paths of some of the dumps scripts and config files - T394389
- 11:34 marostegui@cumin1002: dbctl commit (dc=all): 'db1168 (re)pooling @ 20%: Repooling', diff saved to https://phabricator.wikimedia.org/P77495 and previous config saved to /var/cache/conftool/dbconfig/20250610-113401-root.json
- 11:33 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2189 (T396130)', diff saved to https://phabricator.wikimedia.org/P77494 and previous config saved to /var/cache/conftool/dbconfig/20250610-113328-marostegui.json
- 11:33 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2189.codfw.wmnet with reason: Maintenance
- 11:33 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2175 (T396130)', diff saved to https://phabricator.wikimedia.org/P77493 and previous config saved to /var/cache/conftool/dbconfig/20250610-113306-marostegui.json
- 11:31 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2044.codfw.wmnet
- 11:31 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2044.codfw.wmnet
- 11:31 marostegui@cumin1002: dbctl commit (dc=all): 'db1180 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P77492 and previous config saved to /var/cache/conftool/dbconfig/20250610-113112-root.json
- 11:26 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ganeti2044.codfw.wmnet
- 11:24 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P77491 and previous config saved to /var/cache/conftool/dbconfig/20250610-112406-fceratto.json
- 11:21 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2044.codfw.wmnet
- 11:21 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2043.codfw.wmnet
- 11:20 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2043.codfw.wmnet
- 11:18 marostegui@cumin1002: dbctl commit (dc=all): 'db1168 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P77490 and previous config saved to /var/cache/conftool/dbconfig/20250610-111856-root.json
- 11:17 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2175', diff saved to https://phabricator.wikimedia.org/P77489 and previous config saved to /var/cache/conftool/dbconfig/20250610-111759-marostegui.json
- 11:16 marostegui@cumin1002: dbctl commit (dc=all): 'db1180 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P77488 and previous config saved to /var/cache/conftool/dbconfig/20250610-111606-root.json
- 11:15 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ganeti2043.codfw.wmnet
- 11:14 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1168.eqiad.wmnet with reason: Maintenance
- 11:14 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1168 T395989', diff saved to https://phabricator.wikimedia.org/P77487 and previous config saved to /var/cache/conftool/dbconfig/20250610-111440-marostegui.json
- 11:10 marostegui@cumin1002: dbctl commit (dc=all): 'es2033 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P77486 and previous config saved to /var/cache/conftool/dbconfig/20250610-111054-root.json
- 11:09 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1181 (T395241)', diff saved to https://phabricator.wikimedia.org/P77485 and previous config saved to /var/cache/conftool/dbconfig/20250610-110859-fceratto.json
- 11:04 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
- 11:04 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
- 11:02 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2175', diff saved to https://phabricator.wikimedia.org/P77484 and previous config saved to /var/cache/conftool/dbconfig/20250610-110252-marostegui.json
- 11:01 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
- 11:01 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
- 11:01 marostegui@cumin1002: dbctl commit (dc=all): 'db1180 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P77483 and previous config saved to /var/cache/conftool/dbconfig/20250610-110101-root.json
- 10:59 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1181 (T395241)', diff saved to https://phabricator.wikimedia.org/P77482 and previous config saved to /var/cache/conftool/dbconfig/20250610-105951-fceratto.json
- 10:59 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1181.eqiad.wmnet with reason: Maintenance
- 10:59 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T395241)', diff saved to https://phabricator.wikimedia.org/P77481 and previous config saved to /var/cache/conftool/dbconfig/20250610-105926-fceratto.json
- 10:59 marostegui@cumin1002: dbctl commit (dc=all): 'es2032 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P77480 and previous config saved to /var/cache/conftool/dbconfig/20250610-105911-root.json
- 10:55 marostegui@cumin1002: dbctl commit (dc=all): 'es2033 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P77479 and previous config saved to /var/cache/conftool/dbconfig/20250610-105548-root.json
- 10:53 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2043.codfw.wmnet
- 10:47 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2175 (T396130)', diff saved to https://phabricator.wikimedia.org/P77478 and previous config saved to /var/cache/conftool/dbconfig/20250610-104745-marostegui.json
- 10:47 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2042.codfw.wmnet
- 10:47 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2042.codfw.wmnet
- 10:45 marostegui@cumin1002: dbctl commit (dc=all): 'db1180 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P77477 and previous config saved to /var/cache/conftool/dbconfig/20250610-104556-root.json
- 10:44 marostegui@cumin1002: dbctl commit (dc=all): 'db1187 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P77476 and previous config saved to /var/cache/conftool/dbconfig/20250610-104449-root.json
- 10:44 taavi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) 'private.eqiad.wikimedia.cloud$' on eqiad recursors
- 10:44 taavi@cumin1003: START - Cookbook sre.dns.wipe-cache 'private.eqiad.wikimedia.cloud$' on eqiad recursors
- 10:44 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P77475 and previous config saved to /var/cache/conftool/dbconfig/20250610-104419-fceratto.json
- 10:44 marostegui@cumin1002: dbctl commit (dc=all): 'es2032 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P77474 and previous config saved to /var/cache/conftool/dbconfig/20250610-104406-root.json
- 10:43 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
- 10:42 cgoubert@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
- 10:42 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply
- 10:42 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 10:42 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Change AAAA records for eqiad cloudsw cloud-private GW IRB address - cmooney@cumin1003"
- 10:42 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Change AAAA records for eqiad cloudsw cloud-private GW IRB address - cmooney@cumin1003"
- 10:42 cgoubert@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
- 10:42 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1180.eqiad.wmnet with reason: Maintenance
- 10:41 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1180 T395989', diff saved to https://phabricator.wikimedia.org/P77473 and previous config saved to /var/cache/conftool/dbconfig/20250610-104143-marostegui.json
- 10:41 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ganeti2042.codfw.wmnet
- 10:40 marostegui@cumin1002: dbctl commit (dc=all): 'es2033 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P77472 and previous config saved to /var/cache/conftool/dbconfig/20250610-104043-root.json
- 10:39 cmooney@cumin1003: START - Cookbook sre.dns.netbox
- 10:33 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2042.codfw.wmnet
- 10:33 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2175 (T396130)', diff saved to https://phabricator.wikimedia.org/P77471 and previous config saved to /var/cache/conftool/dbconfig/20250610-103315-marostegui.json
- 10:33 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2175.codfw.wmnet with reason: Maintenance
- 10:32 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2148 (T396130)', diff saved to https://phabricator.wikimedia.org/P77470 and previous config saved to /var/cache/conftool/dbconfig/20250610-103252-marostegui.json
- 10:31 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2041.codfw.wmnet
- 10:31 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2041.codfw.wmnet
- 10:31 taavi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) 'private.eqiad.wikimedia.cloud$' on eqiad recursors
- 10:31 taavi@cumin1003: START - Cookbook sre.dns.wipe-cache 'private.eqiad.wikimedia.cloud$' on eqiad recursors
- 10:31 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 10:31 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add AAAA records for WMCS cloud-private IPs in eqiad - cmooney@cumin1003"
- 10:31 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add AAAA records for WMCS cloud-private IPs in eqiad - cmooney@cumin1003"
- 10:29 marostegui@cumin1002: dbctl commit (dc=all): 'db1187 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P77469 and previous config saved to /var/cache/conftool/dbconfig/20250610-102943-root.json
- 10:29 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P77468 and previous config saved to /var/cache/conftool/dbconfig/20250610-102913-fceratto.json
- 10:29 marostegui@cumin1002: dbctl commit (dc=all): 'es2032 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P77467 and previous config saved to /var/cache/conftool/dbconfig/20250610-102900-root.json
- 10:27 cmooney@cumin1003: START - Cookbook sre.dns.netbox
- 10:26 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ganeti2041.codfw.wmnet
- 10:25 cmooney@cumin1003: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
- 10:25 marostegui@cumin1002: dbctl commit (dc=all): 'es2033 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P77466 and previous config saved to /var/cache/conftool/dbconfig/20250610-102538-root.json
- 10:22 cmooney@cumin1003: START - Cookbook sre.dns.netbox
- 10:17 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2041.codfw.wmnet
- 10:17 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2148', diff saved to https://phabricator.wikimedia.org/P77465 and previous config saved to /var/cache/conftool/dbconfig/20250610-101745-marostegui.json
- 10:17 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2040.codfw.wmnet
- 10:17 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2040.codfw.wmnet
- 10:14 marostegui@cumin1002: dbctl commit (dc=all): 'db1187 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P77464 and previous config saved to /var/cache/conftool/dbconfig/20250610-101438-root.json
- 10:14 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T395241)', diff saved to https://phabricator.wikimedia.org/P77463 and previous config saved to /var/cache/conftool/dbconfig/20250610-101406-fceratto.json
- 10:13 marostegui@cumin1002: dbctl commit (dc=all): 'es2032 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P77462 and previous config saved to /var/cache/conftool/dbconfig/20250610-101355-root.json
- 10:12 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ganeti2040.codfw.wmnet
- 10:11 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance
- 10:10 marostegui@cumin1002: dbctl commit (dc=all): 'es2033 (re)pooling @ 20%: Repooling', diff saved to https://phabricator.wikimedia.org/P77461 and previous config saved to /var/cache/conftool/dbconfig/20250610-101032-root.json
- 10:08 moritzm: installing jinja2 security updates
- 10:06 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2040.codfw.wmnet
- 10:05 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1174 (T395241)', diff saved to https://phabricator.wikimedia.org/P77460 and previous config saved to /var/cache/conftool/dbconfig/20250610-100558-fceratto.json
- 10:05 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1174.eqiad.wmnet with reason: Maintenance
- 10:05 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1170 (T395241)', diff saved to https://phabricator.wikimedia.org/P77459 and previous config saved to /var/cache/conftool/dbconfig/20250610-100532-fceratto.json
- 10:02 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2148', diff saved to https://phabricator.wikimedia.org/P77458 and previous config saved to /var/cache/conftool/dbconfig/20250610-100239-marostegui.json
- 10:01 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2039.codfw.wmnet
- 10:00 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2039.codfw.wmnet
- 09:59 marostegui@cumin1002: dbctl commit (dc=all): 'db1187 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P77457 and previous config saved to /var/cache/conftool/dbconfig/20250610-095933-root.json
- 09:58 marostegui@cumin1002: dbctl commit (dc=all): 'es2032 (re)pooling @ 20%: Repooling', diff saved to https://phabricator.wikimedia.org/P77456 and previous config saved to /var/cache/conftool/dbconfig/20250610-095850-root.json
- 09:58 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 09:55 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ganeti2039.codfw.wmnet
- 09:55 marostegui@cumin1002: dbctl commit (dc=all): 'es2033 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P77455 and previous config saved to /var/cache/conftool/dbconfig/20250610-095527-root.json
- 09:54 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for es2033.codfw.wmnet
- 09:54 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance
- 09:50 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1170', diff saved to https://phabricator.wikimedia.org/P77454 and previous config saved to /var/cache/conftool/dbconfig/20250610-095025-fceratto.json
- 09:50 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2039.codfw.wmnet
- 09:49 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2038.codfw.wmnet
- 09:49 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2038.codfw.wmnet
- 09:48 fnegri@cumin1003: conftool action : set/pooled=yes; selector: name=clouddb1019.eqiad.wmnet
- 09:47 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2148 (T396130)', diff saved to https://phabricator.wikimedia.org/P77453 and previous config saved to /var/cache/conftool/dbconfig/20250610-094731-marostegui.json
- 09:46 moritzm: installing postgresql-15 security updates
- 09:45 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) es2033 - Upgrading es2033.codfw.wmnet
- 09:45 marostegui@cumin1002: START - Cookbook sre.mysql.depool es2033 - Upgrading es2033.codfw.wmnet
- 09:44 marostegui@cumin1002: START - Cookbook sre.mysql.upgrade for es2033.codfw.wmnet
- 09:44 marostegui@cumin1002: dbctl commit (dc=all): 'db1187 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P77450 and previous config saved to /var/cache/conftool/dbconfig/20250610-094429-root.json
- 09:44 marostegui@cumin1002: dbctl commit (dc=all): 'Promote es2031 to es2 master T395241', diff saved to https://phabricator.wikimedia.org/P77449 and previous config saved to /var/cache/conftool/dbconfig/20250610-094401-root.json
- 09:44 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ganeti2038.codfw.wmnet
- 09:43 marostegui@cumin1002: dbctl commit (dc=all): 'es2032 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P77448 and previous config saved to /var/cache/conftool/dbconfig/20250610-094345-root.json
- 09:43 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for es2032.codfw.wmnet
- 09:39 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 09:39 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1187.eqiad.wmnet with reason: Maintenance
- 09:38 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1187 T395989', diff saved to https://phabricator.wikimedia.org/P77447 and previous config saved to /var/cache/conftool/dbconfig/20250610-093846-marostegui.json
- 09:38 akosiaris@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
- 09:37 akosiaris@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
- 09:37 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) es2032 - Upgrading es2032.codfw.wmnet
- 09:37 marostegui@cumin1002: START - Cookbook sre.mysql.depool es2032 - Upgrading es2032.codfw.wmnet
- 09:36 marostegui@cumin1002: START - Cookbook sre.mysql.upgrade for es2032.codfw.wmnet
- 09:36 marostegui@cumin1002: dbctl commit (dc=all): 'Promote es2030 to es1 master T395241', diff saved to https://phabricator.wikimedia.org/P77445 and previous config saved to /var/cache/conftool/dbconfig/20250610-093628-root.json
- 09:36 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 09:35 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1170', diff saved to https://phabricator.wikimedia.org/P77444 and previous config saved to /var/cache/conftool/dbconfig/20250610-093518-fceratto.json
- 09:33 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2038.codfw.wmnet
- 09:32 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2148 (T396130)', diff saved to https://phabricator.wikimedia.org/P77443 and previous config saved to /var/cache/conftool/dbconfig/20250610-093252-marostegui.json
- 09:32 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2148.codfw.wmnet with reason: Maintenance
- 09:31 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1230.eqiad.wmnet with reason: Maintenance
- 09:31 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 09:27 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 09:26 fnegri@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on clouddb1019.eqiad.wmnet with reason: Upgrading clouddbs T394372
- 09:26 jynus: upgrade db2197 to MariaDB 10.11 T394487
- 09:24 fnegri@cumin1003: conftool action : set/pooled=no; selector: name=clouddb1019.eqiad.wmnet
- 09:20 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1170 (T395241)', diff saved to https://phabricator.wikimedia.org/P77442 and previous config saved to /var/cache/conftool/dbconfig/20250610-092011-fceratto.json
- 09:17 jynus@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2197.codfw.wmnet,dbprov2003.codfw.wmnet with reason: Downtime hosts for MariaDB 10.11 upgrade
- 09:15 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 09:10 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1170 (T395241)', diff saved to https://phabricator.wikimedia.org/P77441 and previous config saved to /var/cache/conftool/dbconfig/20250610-091040-fceratto.json
- 09:10 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1170.eqiad.wmnet with reason: Maintenance
- 09:10 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T395241)', diff saved to https://phabricator.wikimedia.org/P77440 and previous config saved to /var/cache/conftool/dbconfig/20250610-091016-fceratto.json
- 09:08 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance
- 09:07 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1245.eqiad.wmnet with reason: Maintenance
- 09:06 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1216.eqiad.wmnet with reason: Maintenance
- 09:06 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1210 (T396130)', diff saved to https://phabricator.wikimedia.org/P77439 and previous config saved to /var/cache/conftool/dbconfig/20250610-090635-marostegui.json
- 08:55 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P77438 and previous config saved to /var/cache/conftool/dbconfig/20250610-085508-fceratto.json
- 08:54 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 08:53 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 08:53 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 08:51 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1210', diff saved to https://phabricator.wikimedia.org/P77437 and previous config saved to /var/cache/conftool/dbconfig/20250610-085128-marostegui.json
- 08:40 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 08:40 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P77436 and previous config saved to /var/cache/conftool/dbconfig/20250610-084002-fceratto.json
- 08:36 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1210', diff saved to https://phabricator.wikimedia.org/P77435 and previous config saved to /var/cache/conftool/dbconfig/20250610-083622-marostegui.json
- 08:24 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T395241)', diff saved to https://phabricator.wikimedia.org/P77434 and previous config saved to /var/cache/conftool/dbconfig/20250610-082454-fceratto.json
- 08:21 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1210 (T396130)', diff saved to https://phabricator.wikimedia.org/P77433 and previous config saved to /var/cache/conftool/dbconfig/20250610-082114-marostegui.json
- 08:18 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1210 (T396130)', diff saved to https://phabricator.wikimedia.org/P77432 and previous config saved to /var/cache/conftool/dbconfig/20250610-081817-marostegui.json
- 08:18 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1210.eqiad.wmnet with reason: Maintenance
- 08:17 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1200 (T396130)', diff saved to https://phabricator.wikimedia.org/P77431 and previous config saved to /var/cache/conftool/dbconfig/20250610-081756-marostegui.json
- 08:16 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1158 (T395241)', diff saved to https://phabricator.wikimedia.org/P77430 and previous config saved to /var/cache/conftool/dbconfig/20250610-081647-fceratto.json
- 08:16 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1014,1018].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
- 08:16 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1158.eqiad.wmnet with reason: Maintenance
- 08:06 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 32098
- 08:05 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'configure' for AS: 32098
- 08:02 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1200', diff saved to https://phabricator.wikimedia.org/P77429 and previous config saved to /var/cache/conftool/dbconfig/20250610-080248-marostegui.json
- 08:01 jynus: deploying grants for zuul backups @ m1 T394844
- 07:47 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1200', diff saved to https://phabricator.wikimedia.org/P77428 and previous config saved to /var/cache/conftool/dbconfig/20250610-074742-marostegui.json
- 07:36 marostegui@cumin1002: dbctl commit (dc=all): 'es2031 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P77427 and previous config saved to /var/cache/conftool/dbconfig/20250610-073631-root.json
- 07:32 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1200 (T396130)', diff saved to https://phabricator.wikimedia.org/P77426 and previous config saved to /var/cache/conftool/dbconfig/20250610-073234-marostegui.json
- 07:30 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1200 (T396130)', diff saved to https://phabricator.wikimedia.org/P77425 and previous config saved to /var/cache/conftool/dbconfig/20250610-073003-marostegui.json
- 07:29 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1200.eqiad.wmnet with reason: Maintenance
- 07:29 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1185 (T396130)', diff saved to https://phabricator.wikimedia.org/P77424 and previous config saved to /var/cache/conftool/dbconfig/20250610-072941-marostegui.json
- 07:28 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 8849
- 07:28 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 8849
- 07:26 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 28173
- 07:26 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 28173
- 07:25 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 60427
- 07:25 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 60427
- 07:24 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 10310
- 07:21 marostegui@cumin1002: dbctl commit (dc=all): 'es2031 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P77423 and previous config saved to /var/cache/conftool/dbconfig/20250610-072125-root.json
- 07:21 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 10310
- 07:14 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1185', diff saved to https://phabricator.wikimedia.org/P77422 and previous config saved to /var/cache/conftool/dbconfig/20250610-071434-marostegui.json
- 07:14 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host install7002.wikimedia.org
- 07:14 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host install7002.wikimedia.org with OS bookworm
- 07:06 marostegui@cumin1002: dbctl commit (dc=all): 'es2031 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P77421 and previous config saved to /var/cache/conftool/dbconfig/20250610-070620-root.json
- 07:02 marostegui@cumin1002: dbctl commit (dc=all): 'es2030 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P77420 and previous config saved to /var/cache/conftool/dbconfig/20250610-070240-root.json
- 06:59 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1185', diff saved to https://phabricator.wikimedia.org/P77419 and previous config saved to /var/cache/conftool/dbconfig/20250610-065927-marostegui.json
- 06:57 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on install7002.wikimedia.org with reason: host reimage
- 06:53 jmm@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on install7002.wikimedia.org with reason: host reimage
- 06:53 marostegui@cumin1002: dbctl commit (dc=all): 'es2034 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P77418 and previous config saved to /var/cache/conftool/dbconfig/20250610-065303-root.json
- 06:52 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts bast7001.wikimedia.org
- 06:52 jmm@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 06:52 jmm@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: bast7001.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin1003"
- 06:52 jmm@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: bast7001.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin1003"
- 06:51 marostegui@cumin1002: dbctl commit (dc=all): 'es2031 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P77417 and previous config saved to /var/cache/conftool/dbconfig/20250610-065114-root.json
- 06:47 marostegui@cumin1002: dbctl commit (dc=all): 'es2030 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P77416 and previous config saved to /var/cache/conftool/dbconfig/20250610-064735-root.json
- 06:46 marostegui@cumin1002: dbctl commit (dc=all): 'es1036 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P77415 and previous config saved to /var/cache/conftool/dbconfig/20250610-064615-root.json
- 06:44 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1185 (T396130)', diff saved to https://phabricator.wikimedia.org/P77414 and previous config saved to /var/cache/conftool/dbconfig/20250610-064420-marostegui.json
- 06:39 jmm@cumin1003: START - Cookbook sre.dns.netbox
- 06:38 marostegui@cumin1002: dbctl commit (dc=all): 'es2034 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P77413 and previous config saved to /var/cache/conftool/dbconfig/20250610-063757-root.json
- 06:36 marostegui@cumin1002: dbctl commit (dc=all): 'es2031 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P77412 and previous config saved to /var/cache/conftool/dbconfig/20250610-063608-root.json
- 06:35 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1185 (T396130)', diff saved to https://phabricator.wikimedia.org/P77411 and previous config saved to /var/cache/conftool/dbconfig/20250610-063547-marostegui.json
- 06:35 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1185.eqiad.wmnet with reason: Maintenance
- 06:35 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for es2031.codfw.wmnet
- 06:35 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T396130)', diff saved to https://phabricator.wikimedia.org/P77410 and previous config saved to /var/cache/conftool/dbconfig/20250610-063524-marostegui.json
- 06:35 jmm@cumin1003: START - Cookbook sre.hosts.decommission for hosts bast7001.wikimedia.org
- 06:32 marostegui@cumin1002: dbctl commit (dc=all): 'es2030 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P77409 and previous config saved to /var/cache/conftool/dbconfig/20250610-063229-root.json
- 06:31 marostegui@cumin1002: dbctl commit (dc=all): 'es1036 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P77408 and previous config saved to /var/cache/conftool/dbconfig/20250610-063110-root.json
- 06:28 jmm@cumin1003: START - Cookbook sre.hosts.reimage for host install7002.wikimedia.org with OS bookworm
- 06:25 jmm@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM install7002.wikimedia.org - jmm@cumin1003"
- 06:25 jmm@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM install7002.wikimedia.org - jmm@cumin1003"
- 06:25 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) es2031 - Upgrading es2031.codfw.wmnet
- 06:25 marostegui@cumin1002: START - Cookbook sre.mysql.depool es2031 - Upgrading es2031.codfw.wmnet
- 06:25 jmm@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install7002.wikimedia.org on all recursors
- 06:25 jmm@cumin1003: START - Cookbook sre.dns.wipe-cache install7002.wikimedia.org on all recursors
- 06:25 jmm@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 06:25 jmm@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install7002.wikimedia.org - jmm@cumin1003"
- 06:25 jmm@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install7002.wikimedia.org - jmm@cumin1003"
- 06:25 marostegui@cumin1002: START - Cookbook sre.mysql.upgrade for es2031.codfw.wmnet
- 06:25 marostegui@cumin1002: dbctl commit (dc=all): 'Depool es2031', diff saved to https://phabricator.wikimedia.org/P77407 and previous config saved to /var/cache/conftool/dbconfig/20250610-062501-marostegui.json
- 06:22 marostegui@cumin1002: dbctl commit (dc=all): 'es2034 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P77406 and previous config saved to /var/cache/conftool/dbconfig/20250610-062252-root.json
- 06:21 jmm@cumin1003: START - Cookbook sre.dns.netbox
- 06:21 jmm@cumin1003: START - Cookbook sre.ganeti.makevm for new host install7002.wikimedia.org
- 06:20 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P77405 and previous config saved to /var/cache/conftool/dbconfig/20250610-062017-marostegui.json
- 06:17 marostegui@cumin1002: dbctl commit (dc=all): 'es2030 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P77404 and previous config saved to /var/cache/conftool/dbconfig/20250610-061724-root.json
- 06:16 marostegui@cumin1002: dbctl commit (dc=all): 'es1036 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P77403 and previous config saved to /var/cache/conftool/dbconfig/20250610-061604-root.json
- 06:07 marostegui@cumin1002: dbctl commit (dc=all): 'es2034 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P77402 and previous config saved to /var/cache/conftool/dbconfig/20250610-060746-root.json
- 06:06 marostegui@cumin1002: dbctl commit (dc=all): 'db1231 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P77401 and previous config saved to /var/cache/conftool/dbconfig/20250610-060638-root.json
- 06:05 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P77400 and previous config saved to /var/cache/conftool/dbconfig/20250610-060510-marostegui.json
- 06:02 marostegui@cumin1002: dbctl commit (dc=all): 'es2030 (re)pooling @ 20%: Repooling', diff saved to https://phabricator.wikimedia.org/P77399 and previous config saved to /var/cache/conftool/dbconfig/20250610-060218-root.json
- 06:01 marostegui@cumin1002: dbctl commit (dc=all): 'es1036 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P77398 and previous config saved to /var/cache/conftool/dbconfig/20250610-060059-root.json
- 05:52 marostegui@cumin1002: dbctl commit (dc=all): 'es2034 (re)pooling @ 20%: Repooling', diff saved to https://phabricator.wikimedia.org/P77397 and previous config saved to /var/cache/conftool/dbconfig/20250610-055241-root.json
- 05:51 marostegui@cumin1002: dbctl commit (dc=all): 'db1231 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P77396 and previous config saved to /var/cache/conftool/dbconfig/20250610-055132-root.json
- 05:50 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T396130)', diff saved to https://phabricator.wikimedia.org/P77395 and previous config saved to /var/cache/conftool/dbconfig/20250610-055003-marostegui.json
- 05:47 marostegui@cumin1002: dbctl commit (dc=all): 'es2030 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P77394 and previous config saved to /var/cache/conftool/dbconfig/20250610-054713-root.json
- 05:47 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1161 (T396130)', diff saved to https://phabricator.wikimedia.org/P77393 and previous config saved to /var/cache/conftool/dbconfig/20250610-054705-marostegui.json
- 05:46 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
- 05:46 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1161.eqiad.wmnet with reason: Maintenance
- 05:46 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for es2030.codfw.wmnet
- 05:46 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1159 (T396130)', diff saved to https://phabricator.wikimedia.org/P77392 and previous config saved to /var/cache/conftool/dbconfig/20250610-054635-marostegui.json
- 05:45 marostegui@cumin1002: dbctl commit (dc=all): 'es1036 (re)pooling @ 20%: Repooling', diff saved to https://phabricator.wikimedia.org/P77391 and previous config saved to /var/cache/conftool/dbconfig/20250610-054554-root.json
- 05:39 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) es2030 - Upgrading es2030.codfw.wmnet
- 05:39 marostegui@cumin1002: START - Cookbook sre.mysql.depool es2030 - Upgrading es2030.codfw.wmnet
- 05:39 marostegui@cumin1002: START - Cookbook sre.mysql.upgrade for es2030.codfw.wmnet
- 05:39 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on es2030.codfw.wmnet with reason: Maintenance
- 05:39 marostegui@cumin1002: dbctl commit (dc=all): 'Depool es2030', diff saved to https://phabricator.wikimedia.org/P77390 and previous config saved to /var/cache/conftool/dbconfig/20250610-053902-marostegui.json
- 05:37 marostegui@cumin1002: dbctl commit (dc=all): 'es2034 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P77389 and previous config saved to /var/cache/conftool/dbconfig/20250610-053735-root.json
- 05:36 marostegui@cumin1002: dbctl commit (dc=all): 'db1231 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P77388 and previous config saved to /var/cache/conftool/dbconfig/20250610-053627-root.json
- 05:31 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on es2034.codfw.wmnet with reason: Maintenance
- 05:31 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1159', diff saved to https://phabricator.wikimedia.org/P77387 and previous config saved to /var/cache/conftool/dbconfig/20250610-053128-marostegui.json
- 05:31 marostegui@cumin1002: dbctl commit (dc=all): 'Depool es2034', diff saved to https://phabricator.wikimedia.org/P77386 and previous config saved to /var/cache/conftool/dbconfig/20250610-053119-marostegui.json
- 05:30 marostegui@cumin1002: dbctl commit (dc=all): 'es1036 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P77385 and previous config saved to /var/cache/conftool/dbconfig/20250610-053048-root.json
- 05:21 marostegui@cumin1002: dbctl commit (dc=all): 'Depool es1036', diff saved to https://phabricator.wikimedia.org/P77384 and previous config saved to /var/cache/conftool/dbconfig/20250610-052155-marostegui.json
- 05:21 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on es1036.eqiad.wmnet with reason: Maintenance
- 05:21 marostegui@cumin1002: dbctl commit (dc=all): 'db1231 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P77383 and previous config saved to /var/cache/conftool/dbconfig/20250610-052122-root.json
- 05:16 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1159', diff saved to https://phabricator.wikimedia.org/P77382 and previous config saved to /var/cache/conftool/dbconfig/20250610-051614-marostegui.json
- 05:06 marostegui@cumin1002: dbctl commit (dc=all): 'db1231 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P77381 and previous config saved to /var/cache/conftool/dbconfig/20250610-050616-root.json
- 05:02 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1231', diff saved to https://phabricator.wikimedia.org/P77380 and previous config saved to /var/cache/conftool/dbconfig/20250610-050215-marostegui.json
- 05:02 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1231.eqiad.wmnet with reason: Maintenance
- 05:01 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1159 (T396130)', diff saved to https://phabricator.wikimedia.org/P77379 and previous config saved to /var/cache/conftool/dbconfig/20250610-050107-marostegui.json
- 04:58 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1159 (T396130)', diff saved to https://phabricator.wikimedia.org/P77378 and previous config saved to /var/cache/conftool/dbconfig/20250610-045809-marostegui.json
- 04:58 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1159.eqiad.wmnet with reason: Maintenance
- 04:51 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2213.codfw.wmnet with reason: Maintenance
- 04:04 mwpresync@deploy1003: Pruned MediaWiki: 1.45.0-wmf.2 (duration: 04m 22s)
2025-06-09
- 23:54 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2228 (T396130)', diff saved to https://phabricator.wikimedia.org/P77377 and previous config saved to /var/cache/conftool/dbconfig/20250609-235425-marostegui.json
- 23:39 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2228', diff saved to https://phabricator.wikimedia.org/P77376 and previous config saved to /var/cache/conftool/dbconfig/20250609-233918-marostegui.json
- 23:24 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2228', diff saved to https://phabricator.wikimedia.org/P77375 and previous config saved to /var/cache/conftool/dbconfig/20250609-232410-marostegui.json
- 23:09 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2228 (T396130)', diff saved to https://phabricator.wikimedia.org/P77373 and previous config saved to /var/cache/conftool/dbconfig/20250609-230903-marostegui.json
- 23:05 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2228 (T396130)', diff saved to https://phabricator.wikimedia.org/P77372 and previous config saved to /var/cache/conftool/dbconfig/20250609-230518-marostegui.json
- 23:05 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2228.codfw.wmnet with reason: Maintenance
- 23:04 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2223 (T396130)', diff saved to https://phabricator.wikimedia.org/P77371 and previous config saved to /var/cache/conftool/dbconfig/20250609-230454-marostegui.json
- 22:49 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2223', diff saved to https://phabricator.wikimedia.org/P77370 and previous config saved to /var/cache/conftool/dbconfig/20250609-224947-marostegui.json
- 22:48 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1185.eqiad.wmnet with OS bullseye
- 22:36 ryankemper@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host cirrussearch2110.codfw.wmnet
- 22:34 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2223', diff saved to https://phabricator.wikimedia.org/P77369 and previous config saved to /var/cache/conftool/dbconfig/20250609-223439-marostegui.json
- 22:29 ryankemper@cumin2002: START - Cookbook sre.hosts.reboot-single for host cirrussearch2110.codfw.wmnet
- 22:19 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cirrussearch2111.codfw.wmnet
- 22:19 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2223 (T396130)', diff saved to https://phabricator.wikimedia.org/P77368 and previous config saved to /var/cache/conftool/dbconfig/20250609-221932-marostegui.json
- 22:15 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2223 (T396130)', diff saved to https://phabricator.wikimedia.org/P77367 and previous config saved to /var/cache/conftool/dbconfig/20250609-221524-marostegui.json
- 22:15 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2223.codfw.wmnet with reason: Maintenance
- 22:15 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2211 (T396130)', diff saved to https://phabricator.wikimedia.org/P77366 and previous config saved to /var/cache/conftool/dbconfig/20250609-221501-marostegui.json
- 22:12 maryum: Deployed security fix for T395063
- 22:12 ryankemper@cumin2002: START - Cookbook sre.hosts.reboot-single for host cirrussearch2111.codfw.wmnet
- 22:08 maryum: Deployed security fix for T396230
- 22:04 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cirrussearch2112.codfw.wmnet
- 22:01 maryum: Deployed security fix for T395730
- 21:59 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2211', diff saved to https://phabricator.wikimedia.org/P77365 and previous config saved to /var/cache/conftool/dbconfig/20250609-215953-marostegui.json
- 21:56 ryankemper@cumin2002: START - Cookbook sre.hosts.reboot-single for host cirrussearch2112.codfw.wmnet
- 21:56 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cirrussearch2114.codfw.wmnet
- 21:49 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1186.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 21:49 vriley@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1186.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 21:49 ryankemper@cumin2002: START - Cookbook sre.hosts.reboot-single for host cirrussearch2114.codfw.wmnet
- 21:48 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cirrussearch2115.codfw.wmnet
- 21:44 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2211', diff saved to https://phabricator.wikimedia.org/P77364 and previous config saved to /var/cache/conftool/dbconfig/20250609-214446-marostegui.json
- 21:41 bking@cumin2002: START - Cookbook sre.hosts.reboot-single for host cirrussearch2115.codfw.wmnet
- 21:36 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts cirrussearch2115.codfw.wmnet
- 21:36 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cirrussearch2115.codfw.wmnet
- 21:35 ladsgroup@deploy1003: Finished scap sync-world: Backport for Restrict event page decoration to currently allowed namespaces (T392784) (duration: 11m 07s)
- 21:33 vriley@cumin1002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host an-worker1186
- 21:33 vriley@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host an-worker1186
- 21:29 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2211 (T396130)', diff saved to https://phabricator.wikimedia.org/P77363 and previous config saved to /var/cache/conftool/dbconfig/20250609-212939-marostegui.json
- 21:29 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1186.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 21:29 vriley@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1186.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 21:28 ladsgroup@deploy1003: ladsgroup: Continuing with sync
- 21:28 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1185.eqiad.wmnet with OS bullseye
- 21:27 vriley@cumin1002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host an-worker1186
- 21:27 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 21:27 vriley@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host an-worker1186
- 21:26 ladsgroup@deploy1003: ladsgroup: Backport for Restrict event page decoration to currently allowed namespaces (T392784) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 21:25 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2211 (T396130)', diff saved to https://phabricator.wikimedia.org/P77362 and previous config saved to /var/cache/conftool/dbconfig/20250609-212531-marostegui.json
- 21:25 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2211.codfw.wmnet with reason: Maintenance
- 21:24 vriley@cumin1002: START - Cookbook sre.dns.netbox
- 21:24 ladsgroup@deploy1003: Started scap sync-world: Backport for Restrict event page decoration to currently allowed namespaces (T392784)
- 21:23 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2201.codfw.wmnet with reason: Maintenance
- 21:22 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2192 (T396130)', diff saved to https://phabricator.wikimedia.org/P77361 and previous config saved to /var/cache/conftool/dbconfig/20250609-212253-marostegui.json
- 21:21 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cirrussearch2114.codfw.wmnet
- 21:19 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts cirrussearch2113.codfw.wmnet
- 21:19 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cirrussearch2113.codfw.wmnet
- 21:18 eileen: config revision changed from 8acfbae4 to 37a2c896
- 21:12 bking@cumin2002: START - Cookbook sre.hosts.reboot-single for host cirrussearch2114.codfw.wmnet
- 21:09 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cirrussearch2115.codfw.wmnet
- 21:07 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2192', diff saved to https://phabricator.wikimedia.org/P77360 and previous config saved to /var/cache/conftool/dbconfig/20250609-210746-marostegui.json
- 21:03 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1185.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 21:03 vriley@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1185.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 21:01 bking@cumin2002: START - Cookbook sre.hosts.reboot-single for host cirrussearch2115.codfw.wmnet
- 20:52 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2192', diff saved to https://phabricator.wikimedia.org/P77359 and previous config saved to /var/cache/conftool/dbconfig/20250609-205239-marostegui.json
- 20:37 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2192 (T396130)', diff saved to https://phabricator.wikimedia.org/P77358 and previous config saved to /var/cache/conftool/dbconfig/20250609-203733-marostegui.json
- 20:34 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2192 (T396130)', diff saved to https://phabricator.wikimedia.org/P77357 and previous config saved to /var/cache/conftool/dbconfig/20250609-203448-marostegui.json
- 20:34 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2192.codfw.wmnet with reason: Maintenance
- 20:34 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2178 (T396130)', diff saved to https://phabricator.wikimedia.org/P77356 and previous config saved to /var/cache/conftool/dbconfig/20250609-203425-marostegui.json
- 20:31 jsn@deploy1003: Finished scap sync-world: Backport for Deploy remaining Patroller Tools surveys (T396250) (duration: 13m 15s)
- 20:27 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1252.eqiad.wmnet with reason: Maintenance
- 20:27 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1249 (T395241)', diff saved to https://phabricator.wikimedia.org/P77355 and previous config saved to /var/cache/conftool/dbconfig/20250609-202723-fceratto.json
- 20:24 jsn@deploy1003: jsn: Continuing with sync
- 20:20 jsn@deploy1003: jsn: Backport for Deploy remaining Patroller Tools surveys (T396250) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 20:19 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P77354 and previous config saved to /var/cache/conftool/dbconfig/20250609-201918-marostegui.json
- 20:18 jsn@deploy1003: Started scap sync-world: Backport for Deploy remaining Patroller Tools surveys (T396250)
- 20:14 arlolra@deploy1003: Finished scap sync-world: Backport for Disable VipsScaler in group1 (T290759) (duration: 10m 23s)
- 20:14 bd808@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
- 20:13 bd808@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-syntaxhighlight: apply
- 20:12 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1249', diff saved to https://phabricator.wikimedia.org/P77353 and previous config saved to /var/cache/conftool/dbconfig/20250609-201216-fceratto.json
- 20:11 bd808@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
- 20:09 bd808@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply
- 20:07 arlolra@deploy1003: arlolra: Continuing with sync
- 20:05 arlolra@deploy1003: arlolra: Backport for Disable VipsScaler in group1 (T290759) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 20:04 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P77352 and previous config saved to /var/cache/conftool/dbconfig/20250609-200411-marostegui.json
- 20:03 arlolra@deploy1003: Started scap sync-world: Backport for Disable VipsScaler in group1 (T290759)
- 20:02 bd808@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
- 20:01 bd808@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply
- 19:57 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1249', diff saved to https://phabricator.wikimedia.org/P77351 and previous config saved to /var/cache/conftool/dbconfig/20250609-195709-fceratto.json
- 19:49 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2178 (T396130)', diff saved to https://phabricator.wikimedia.org/P77350 and previous config saved to /var/cache/conftool/dbconfig/20250609-194904-marostegui.json
- 19:45 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2178 (T396130)', diff saved to https://phabricator.wikimedia.org/P77349 and previous config saved to /var/cache/conftool/dbconfig/20250609-194520-marostegui.json
- 19:45 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2178.codfw.wmnet with reason: Maintenance
- 19:44 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2171 (T396130)', diff saved to https://phabricator.wikimedia.org/P77348 and previous config saved to /var/cache/conftool/dbconfig/20250609-194456-marostegui.json
- 19:42 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1249 (T395241)', diff saved to https://phabricator.wikimedia.org/P77347 and previous config saved to /var/cache/conftool/dbconfig/20250609-194203-fceratto.json
- 19:33 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1249 (T395241)', diff saved to https://phabricator.wikimedia.org/P77346 and previous config saved to /var/cache/conftool/dbconfig/20250609-193354-fceratto.json
- 19:33 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1249.eqiad.wmnet with reason: Maintenance
- 19:33 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1248 (T395241)', diff saved to https://phabricator.wikimedia.org/P77345 and previous config saved to /var/cache/conftool/dbconfig/20250609-193329-fceratto.json
- 19:29 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2171', diff saved to https://phabricator.wikimedia.org/P77344 and previous config saved to /var/cache/conftool/dbconfig/20250609-192949-marostegui.json
- 19:18 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1248', diff saved to https://phabricator.wikimedia.org/P77343 and previous config saved to /var/cache/conftool/dbconfig/20250609-191823-fceratto.json
- 19:14 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2171', diff saved to https://phabricator.wikimedia.org/P77342 and previous config saved to /var/cache/conftool/dbconfig/20250609-191442-marostegui.json
- 19:03 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1248', diff saved to https://phabricator.wikimedia.org/P77341 and previous config saved to /var/cache/conftool/dbconfig/20250609-190316-fceratto.json
- 18:59 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2171 (T396130)', diff saved to https://phabricator.wikimedia.org/P77340 and previous config saved to /var/cache/conftool/dbconfig/20250609-185935-marostegui.json
- 18:59 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudcontrol2004-dev.codfw.wmnet
- 18:59 andrew@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 18:59 andrew@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudcontrol2004-dev.codfw.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1002"
- 18:55 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2171 (T396130)', diff saved to https://phabricator.wikimedia.org/P77339 and previous config saved to /var/cache/conftool/dbconfig/20250609-185525-marostegui.json
- 18:55 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2171.codfw.wmnet with reason: Maintenance
- 18:55 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2157 (T396130)', diff saved to https://phabricator.wikimedia.org/P77338 and previous config saved to /var/cache/conftool/dbconfig/20250609-185502-marostegui.json
- 18:48 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1248 (T395241)', diff saved to https://phabricator.wikimedia.org/P77337 and previous config saved to /var/cache/conftool/dbconfig/20250609-184809-fceratto.json
- 18:47 andrew@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudcontrol2004-dev.codfw.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1002"
- 18:39 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2157', diff saved to https://phabricator.wikimedia.org/P77336 and previous config saved to /var/cache/conftool/dbconfig/20250609-183955-marostegui.json
- 18:39 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1248 (T395241)', diff saved to https://phabricator.wikimedia.org/P77335 and previous config saved to /var/cache/conftool/dbconfig/20250609-183915-fceratto.json
- 18:39 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1248.eqiad.wmnet with reason: Maintenance
- 18:38 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1247 (T395241)', diff saved to https://phabricator.wikimedia.org/P77334 and previous config saved to /var/cache/conftool/dbconfig/20250609-183850-fceratto.json
- 18:37 andrew@cumin1002: START - Cookbook sre.dns.netbox
- 18:31 andrew@cumin1002: START - Cookbook sre.hosts.decommission for hosts cloudcontrol2004-dev.codfw.wmnet
- 18:24 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2157', diff saved to https://phabricator.wikimedia.org/P77333 and previous config saved to /var/cache/conftool/dbconfig/20250609-182448-marostegui.json
- 18:23 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1247', diff saved to https://phabricator.wikimedia.org/P77332 and previous config saved to /var/cache/conftool/dbconfig/20250609-182343-fceratto.json
- 18:22 hmonroy@deploy1003: Finished scap sync-world: Backport for Enable Codex and Multiblocks by default (T377121) (duration: 16m 57s)
- 18:15 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp2044.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 18:15 hmonroy@deploy1003: hmonroy: Continuing with sync
- 18:14 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host cp2044.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 18:14 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp2044.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 18:13 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host cp2044.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 18:13 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp2044.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 18:12 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host cp2044.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 18:09 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2157 (T396130)', diff saved to https://phabricator.wikimedia.org/P77331 and previous config saved to /var/cache/conftool/dbconfig/20250609-180941-marostegui.json
- 18:09 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp2044.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 18:09 hmonroy@deploy1003: hmonroy: Backport for Enable Codex and Multiblocks by default (T377121) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 18:09 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host cp2044.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 18:08 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1247', diff saved to https://phabricator.wikimedia.org/P77330 and previous config saved to /var/cache/conftool/dbconfig/20250609-180836-fceratto.json
- 18:05 hmonroy@deploy1003: Started scap sync-world: Backport for Enable Codex and Multiblocks by default (T377121)
- 18:05 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2157 (T396130)', diff saved to https://phabricator.wikimedia.org/P77329 and previous config saved to /var/cache/conftool/dbconfig/20250609-180530-marostegui.json
- 18:05 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2157.codfw.wmnet with reason: Maintenance
- 18:03 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1173.eqiad.wmnet with reason: Maintenance
- 18:03 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp2044.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 17:59 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host cp2044.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 17:59 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp2044
- 17:59 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cp2044
- 17:58 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance
- 17:57 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1231 (T396130)', diff saved to https://phabricator.wikimedia.org/P77328 and previous config saved to /var/cache/conftool/dbconfig/20250609-175747-marostegui.json
- 17:55 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 17:53 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1247 (T395241)', diff saved to https://phabricator.wikimedia.org/P77327 and previous config saved to /var/cache/conftool/dbconfig/20250609-175330-fceratto.json
- 17:53 jhancock@cumin2002: START - Cookbook sre.dns.netbox
- 17:52 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 17:52 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding cp2044 to codfw - jhancock@cumin2002"
- 17:52 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding cp2044 to codfw - jhancock@cumin2002"
- 17:50 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host cp2043.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 17:49 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on an-redacteddb1001.eqiad.wmnet with reason: Maintenance
- 17:49 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
- 17:49 jhancock@cumin2002: START - Cookbook sre.dns.netbox
- 17:46 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp2043.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 17:45 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1247 (T395241)', diff saved to https://phabricator.wikimedia.org/P77326 and previous config saved to /var/cache/conftool/dbconfig/20250609-174523-fceratto.json
- 17:45 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1247.eqiad.wmnet with reason: Maintenance
- 17:44 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1243 (T395241)', diff saved to https://phabricator.wikimedia.org/P77325 and previous config saved to /var/cache/conftool/dbconfig/20250609-174457-fceratto.json
- 17:44 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host cp2043.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 17:42 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1231', diff saved to https://phabricator.wikimedia.org/P77324 and previous config saved to /var/cache/conftool/dbconfig/20250609-174240-marostegui.json
- 17:39 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp2043.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 17:39 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host cp2043.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 17:39 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp2043.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 17:38 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host cp2043.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 17:29 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1243', diff saved to https://phabricator.wikimedia.org/P77323 and previous config saved to /var/cache/conftool/dbconfig/20250609-172950-fceratto.json
- 17:27 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1231', diff saved to https://phabricator.wikimedia.org/P77322 and previous config saved to /var/cache/conftool/dbconfig/20250609-172733-marostegui.json
- 17:21 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-video: apply
- 17:21 inflatador: bking@cumin1003 powered down cirrussearch1063 to prevent logspam (T394350)
- 17:21 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-video: apply
- 17:18 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 17:14 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp2043.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 17:14 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1243', diff saved to https://phabricator.wikimedia.org/P77320 and previous config saved to /var/cache/conftool/dbconfig/20250609-171443-fceratto.json
- 17:13 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host cp2043.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 17:13 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp2043.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 17:12 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1231 (T396130)', diff saved to https://phabricator.wikimedia.org/P77319 and previous config saved to /var/cache/conftool/dbconfig/20250609-171225-marostegui.json
- 17:12 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host cp2043.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 17:10 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply
- 17:09 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1231 (T396130)', diff saved to https://phabricator.wikimedia.org/P77318 and previous config saved to /var/cache/conftool/dbconfig/20250609-170939-marostegui.json
- 17:09 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1231.eqiad.wmnet with reason: Maintenance
- 17:09 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-video: apply
- 17:05 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1225.eqiad.wmnet with reason: Maintenance
- 17:04 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1201 (T396130)', diff saved to https://phabricator.wikimedia.org/P77317 and previous config saved to /var/cache/conftool/dbconfig/20250609-170447-marostegui.json
- 17:04 swfrench@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-video: apply
- 17:04 swfrench@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-video: apply
- 17:02 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp2043.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 17:02 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host cp2043.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 17:01 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp2043
- 16:59 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cp2043
- 16:59 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 16:59 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding cp2043 to codfw - jhancock@cumin2002"
- 16:59 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1243 (T395241)', diff saved to https://phabricator.wikimedia.org/P77316 and previous config saved to /var/cache/conftool/dbconfig/20250609-165936-fceratto.json
- 16:59 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding cp2043 to codfw - jhancock@cumin2002"
- 16:56 jhancock@cumin2002: START - Cookbook sre.dns.netbox
- 16:55 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp2058.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 16:51 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1243 (T395241)', diff saved to https://phabricator.wikimedia.org/P77315 and previous config saved to /var/cache/conftool/dbconfig/20250609-165125-fceratto.json
- 16:51 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1243.eqiad.wmnet with reason: Maintenance
- 16:51 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1242 (T395241)', diff saved to https://phabricator.wikimedia.org/P77314 and previous config saved to /var/cache/conftool/dbconfig/20250609-165100-fceratto.json
- 16:49 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1201', diff saved to https://phabricator.wikimedia.org/P77313 and previous config saved to /var/cache/conftool/dbconfig/20250609-164940-marostegui.json
- 16:46 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host cp2058.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 16:42 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp2058.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 16:40 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host cp2058.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 16:39 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp2058
- 16:38 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cp2058
- 16:37 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 16:37 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding cp2058 to codfw - jhancock@cumin2002"
- 16:37 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding cp2058 to codfw - jhancock@cumin2002"
- 16:35 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1242', diff saved to https://phabricator.wikimedia.org/P77312 and previous config saved to /var/cache/conftool/dbconfig/20250609-163553-fceratto.json
- 16:34 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1201', diff saved to https://phabricator.wikimedia.org/P77311 and previous config saved to /var/cache/conftool/dbconfig/20250609-163433-marostegui.json
- 16:32 jhancock@cumin2002: START - Cookbook sre.dns.netbox
- 16:30 dancy@deploy1003: Finished scap sync-world: Testing T395514 (duration: 34m 14s)
- 16:20 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1242', diff saved to https://phabricator.wikimedia.org/P77310 and previous config saved to /var/cache/conftool/dbconfig/20250609-162046-fceratto.json
- 16:19 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1201 (T396130)', diff saved to https://phabricator.wikimedia.org/P77309 and previous config saved to /var/cache/conftool/dbconfig/20250609-161926-marostegui.json
- 16:16 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1201 (T396130)', diff saved to https://phabricator.wikimedia.org/P77308 and previous config saved to /var/cache/conftool/dbconfig/20250609-161640-marostegui.json
- 16:16 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1201.eqiad.wmnet with reason: Maintenance
- 16:16 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1187 (T396130)', diff saved to https://phabricator.wikimedia.org/P77307 and previous config saved to /var/cache/conftool/dbconfig/20250609-161618-marostegui.json
- 16:12 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
- 16:12 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/api-gateway: apply
- 16:05 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1242 (T395241)', diff saved to https://phabricator.wikimedia.org/P77306 and previous config saved to /var/cache/conftool/dbconfig/20250609-160539-fceratto.json
- 16:01 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P77305 and previous config saved to /var/cache/conftool/dbconfig/20250609-160111-marostegui.json
- 15:57 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1242 (T395241)', diff saved to https://phabricator.wikimedia.org/P77304 and previous config saved to /var/cache/conftool/dbconfig/20250609-155730-fceratto.json
- 15:57 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1242.eqiad.wmnet with reason: Maintenance
- 15:57 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1241 (T395241)', diff saved to https://phabricator.wikimedia.org/P77303 and previous config saved to /var/cache/conftool/dbconfig/20250609-155705-fceratto.json
- 15:55 dancy@deploy1003: Started scap sync-world: Testing T395514
- 15:52 dancy@deploy1003: Installation of scap version "4.172.0" completed for 182 hosts
- 15:46 dancy@deploy1003: Installing scap version "4.172.0" for 182 host(s)
- 15:46 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P77302 and previous config saved to /var/cache/conftool/dbconfig/20250609-154604-marostegui.json
- 15:41 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1241', diff saved to https://phabricator.wikimedia.org/P77301 and previous config saved to /var/cache/conftool/dbconfig/20250609-154158-fceratto.json
- 15:33 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
- 15:33 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
- 15:30 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1187 (T396130)', diff saved to https://phabricator.wikimedia.org/P77300 and previous config saved to /var/cache/conftool/dbconfig/20250609-153057-marostegui.json
- 15:28 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-research: apply
- 15:28 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-research: apply
- 15:28 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1187 (T396130)', diff saved to https://phabricator.wikimedia.org/P77299 and previous config saved to /var/cache/conftool/dbconfig/20250609-152810-marostegui.json
- 15:28 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-search: apply
- 15:28 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1187.eqiad.wmnet with reason: Maintenance
- 15:27 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T396130)', diff saved to https://phabricator.wikimedia.org/P77298 and previous config saved to /var/cache/conftool/dbconfig/20250609-152749-marostegui.json
- 15:27 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-search: apply
- 15:27 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-ml: apply
- 15:26 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1241', diff saved to https://phabricator.wikimedia.org/P77297 and previous config saved to /var/cache/conftool/dbconfig/20250609-152651-fceratto.json
- 15:26 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-ml: apply
- 15:25 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-platform-eng: apply
- 15:25 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
- 15:25 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
- 15:25 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-platform-eng: apply
- 15:25 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-product: apply
- 15:24 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-product: apply
- 15:24 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-main: apply
- 15:23 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-main: apply
- 15:23 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
- 15:22 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
- 15:22 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 15:21 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 15:16 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
- 15:16 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply
- 15:12 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P77296 and previous config saved to /var/cache/conftool/dbconfig/20250609-151242-marostegui.json
- 15:11 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1241 (T395241)', diff saved to https://phabricator.wikimedia.org/P77295 and previous config saved to /var/cache/conftool/dbconfig/20250609-151144-fceratto.json
- 15:01 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1241 (T395241)', diff saved to https://phabricator.wikimedia.org/P77294 and previous config saved to /var/cache/conftool/dbconfig/20250609-150134-fceratto.json
- 15:01 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1241.eqiad.wmnet with reason: Maintenance
- 15:01 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1238 (T395241)', diff saved to https://phabricator.wikimedia.org/P77293 and previous config saved to /var/cache/conftool/dbconfig/20250609-150108-fceratto.json
- 14:57 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P77292 and previous config saved to /var/cache/conftool/dbconfig/20250609-145735-marostegui.json
- 14:46 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1238', diff saved to https://phabricator.wikimedia.org/P77291 and previous config saved to /var/cache/conftool/dbconfig/20250609-144601-fceratto.json
- 14:42 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T396130)', diff saved to https://phabricator.wikimedia.org/P77290 and previous config saved to /var/cache/conftool/dbconfig/20250609-144230-marostegui.json
- 14:40 tappof@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts prometheus7001.magru.wmnet
- 14:40 tappof@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 14:40 tappof@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: prometheus7001.magru.wmnet decommissioned, removing all IPs except the asset tag one - tappof@cumin1002"
- 14:39 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1180 (T396130)', diff saved to https://phabricator.wikimedia.org/P77289 and previous config saved to /var/cache/conftool/dbconfig/20250609-143938-marostegui.json
- 14:39 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1180.eqiad.wmnet with reason: Maintenance
- 14:39 tappof@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: prometheus7001.magru.wmnet decommissioned, removing all IPs except the asset tag one - tappof@cumin1002"
- 14:39 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T396130)', diff saved to https://phabricator.wikimedia.org/P77288 and previous config saved to /var/cache/conftool/dbconfig/20250609-143917-marostegui.json
- 14:36 tappof@cumin1002: START - Cookbook sre.dns.netbox
- 14:31 tappof@cumin1002: START - Cookbook sre.hosts.decommission for hosts prometheus7001.magru.wmnet
- 14:30 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1238', diff saved to https://phabricator.wikimedia.org/P77287 and previous config saved to /var/cache/conftool/dbconfig/20250609-143054-fceratto.json
- 14:30 fnegri@cumin1002: conftool action : set/pooled=yes; selector: name=clouddb1015.eqiad.wmnet
- 14:24 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
- 14:24 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply
- 14:24 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P77286 and previous config saved to /var/cache/conftool/dbconfig/20250609-142410-marostegui.json
- 14:18 godog: rollout cgroup memory limit + gomemlimit for thanos-sidecar - T394318
- 14:15 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1238 (T395241)', diff saved to https://phabricator.wikimedia.org/P77285 and previous config saved to /var/cache/conftool/dbconfig/20250609-141548-fceratto.json
- 14:15 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 14:09 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P77284 and previous config saved to /var/cache/conftool/dbconfig/20250609-140903-marostegui.json
- 14:07 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1238 (T395241)', diff saved to https://phabricator.wikimedia.org/P77283 and previous config saved to /var/cache/conftool/dbconfig/20250609-140722-fceratto.json
- 14:07 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1238.eqiad.wmnet with reason: Maintenance
- 14:06 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1221 (T395241)', diff saved to https://phabricator.wikimedia.org/P77282 and previous config saved to /var/cache/conftool/dbconfig/20250609-140656-fceratto.json
- 13:55 sukhe@dns1004: END - running authdns-update
- 13:55 sukhe@dns1004: START - running authdns-update
- 13:53 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T396130)', diff saved to https://phabricator.wikimedia.org/P77281 and previous config saved to /var/cache/conftool/dbconfig/20250609-135355-marostegui.json
- 13:52 fnegri@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on clouddb1015.eqiad.wmnet with reason: Upgrading clouddbs T394372
- 13:51 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1221', diff saved to https://phabricator.wikimedia.org/P77280 and previous config saved to /var/cache/conftool/dbconfig/20250609-135150-fceratto.json
- 13:51 fnegri@cumin1002: conftool action : set/pooled=no; selector: name=clouddb1015.eqiad.wmnet
- 13:51 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1168 (T396130)', diff saved to https://phabricator.wikimedia.org/P77279 and previous config saved to /var/cache/conftool/dbconfig/20250609-135105-marostegui.json
- 13:50 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1168.eqiad.wmnet with reason: Maintenance
- 13:50 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T396130)', diff saved to https://phabricator.wikimedia.org/P77278 and previous config saved to /var/cache/conftool/dbconfig/20250609-135043-marostegui.json
- 13:45 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.roll-reboot (exit_code=0) rolling reboot on P{dns5004*} and (A:dnsbox)
- 13:45 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot finished rebooting dns5004.wikimedia.org
- 13:45 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.roll-reboot (exit_code=0) rolling reboot on P{dns6002*} and (A:dnsbox)
- 13:45 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot finished rebooting dns6002.wikimedia.org
- 13:42 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
- 13:42 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply
- 13:36 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1221', diff saved to https://phabricator.wikimedia.org/P77277 and previous config saved to /var/cache/conftool/dbconfig/20250609-133643-fceratto.json
- 13:35 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot begin reboot of dns6002.wikimedia.org
- 13:35 sukhe@cumin1002: START - Cookbook sre.dns.roll-reboot rolling reboot on P{dns6002*} and (A:dnsbox)
- 13:35 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot begin reboot of dns5004.wikimedia.org
- 13:35 sukhe@cumin1002: START - Cookbook sre.dns.roll-reboot rolling reboot on P{dns5004*} and (A:dnsbox)
- 13:35 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P77276 and previous config saved to /var/cache/conftool/dbconfig/20250609-133535-marostegui.json
- 13:35 sukhe@dns1004: END - running authdns-update
- 13:34 sukhe@dns1004: START - running authdns-update
- 13:34 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.roll-reboot (exit_code=0) rolling reboot on P{dns3004*} and (A:dnsbox)
- 13:34 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot finished rebooting dns3004.wikimedia.org
- 13:33 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.roll-reboot (exit_code=0) rolling reboot on P{dns4004*} and (A:dnsbox)
- 13:33 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot finished rebooting dns4004.wikimedia.org
- 13:31 taavi@deploy1003: Finished scap sync-world: Backport for logging: Allow sampling of Logstash logs (T395967), logging: Sample some high-volume log streams (T394402) (duration: 24m 30s)
- 13:30 vgutierrez@dns1004: END - running authdns-update
- 13:30 vgutierrez@dns1004: START - running authdns-update
- 13:22 taavi@deploy1003: taavi, tgr: Continuing with sync
- 13:21 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1221 (T395241)', diff saved to https://phabricator.wikimedia.org/P77275 and previous config saved to /var/cache/conftool/dbconfig/20250609-132136-fceratto.json
- 13:21 taavi@deploy1003: taavi, tgr: Backport for logging: Allow sampling of Logstash logs (T395967), logging: Sample some high-volume log streams (T394402) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 13:20 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P77274 and previous config saved to /var/cache/conftool/dbconfig/20250609-132028-marostegui.json
- 13:19 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot begin reboot of dns4004.wikimedia.org
- 13:19 sukhe@cumin1002: START - Cookbook sre.dns.roll-reboot rolling reboot on P{dns4004*} and (A:dnsbox)
- 13:19 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot begin reboot of dns3004.wikimedia.org
- 13:19 sukhe@cumin1002: START - Cookbook sre.dns.roll-reboot rolling reboot on P{dns3004*} and (A:dnsbox)
- 13:17 taavi@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 13:17 taavi@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add AAAA record for openstack.codfw1dev.wikimediacloud.org - taavi@cumin1002"
- 13:17 taavi@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add AAAA record for openstack.codfw1dev.wikimediacloud.org - taavi@cumin1002"
- 13:17 taavi@dns1004: END - running authdns-update
- 13:16 taavi@dns1004: START - running authdns-update
- 13:13 sukhe@dns1004: FAIL - running authdns-update
- 13:12 sukhe@dns1004: START - running authdns-update
- 13:12 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1221 (T395241)', diff saved to https://phabricator.wikimedia.org/P77273 and previous config saved to /var/cache/conftool/dbconfig/20250609-131238-fceratto.json
- 13:12 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1015,1019].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
- 13:12 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1221.eqiad.wmnet with reason: Maintenance
- 13:12 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1199 (T395241)', diff saved to https://phabricator.wikimedia.org/P77272 and previous config saved to /var/cache/conftool/dbconfig/20250609-131206-fceratto.json
- 13:11 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.roll-reboot (exit_code=0) rolling reboot on P{dns1004*} and (A:dnsbox)
- 13:11 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot finished rebooting dns1004.wikimedia.org
- 13:11 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.roll-reboot (exit_code=0) rolling reboot on P{dns2005*} and (A:dnsbox)
- 13:11 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot finished rebooting dns2005.wikimedia.org
- 13:10 taavi@cumin1002: START - Cookbook sre.dns.netbox
- 13:07 taavi@deploy1003: Started scap sync-world: Backport for logging: Allow sampling of Logstash logs (T395967), logging: Sample some high-volume log streams (T394402)
- 13:05 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T396130)', diff saved to https://phabricator.wikimedia.org/P77271 and previous config saved to /var/cache/conftool/dbconfig/20250609-130521-marostegui.json
- 13:03 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 13:02 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1165 (T396130)', diff saved to https://phabricator.wikimedia.org/P77270 and previous config saved to /var/cache/conftool/dbconfig/20250609-130230-marostegui.json
- 13:02 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1015,1019].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
- 13:02 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1165.eqiad.wmnet with reason: Maintenance
- 12:58 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot begin reboot of dns2005.wikimedia.org
- 12:58 sukhe@cumin1002: START - Cookbook sre.dns.roll-reboot rolling reboot on P{dns2005*} and (A:dnsbox)
- 12:58 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot begin reboot of dns1004.wikimedia.org
- 12:58 sukhe@cumin1002: START - Cookbook sre.dns.roll-reboot rolling reboot on P{dns1004*} and (A:dnsbox)
- 12:57 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1199', diff saved to https://phabricator.wikimedia.org/P77269 and previous config saved to /var/cache/conftool/dbconfig/20250609-125659-fceratto.json
- 12:55 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 12:50 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2229.codfw.wmnet with reason: Maintenance
- 12:45 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2224 (T396130)', diff saved to https://phabricator.wikimedia.org/P77268 and previous config saved to /var/cache/conftool/dbconfig/20250609-124534-marostegui.json
- 12:41 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1199', diff saved to https://phabricator.wikimedia.org/P77267 and previous config saved to /var/cache/conftool/dbconfig/20250609-124152-fceratto.json
- 12:41 jgleeson: SmashPig upgraded from 3222a1f3 to 042d5a5b
- 12:30 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2224', diff saved to https://phabricator.wikimedia.org/P77266 and previous config saved to /var/cache/conftool/dbconfig/20250609-123027-marostegui.json
- 12:29 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 12:26 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1199 (T395241)', diff saved to https://phabricator.wikimedia.org/P77265 and previous config saved to /var/cache/conftool/dbconfig/20250609-122644-fceratto.json
- 12:24 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 12:23 andrew@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 12:23 andrew@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: new private IP for cloudcontrol2010-dev - andrew@cumin1002"
- 12:23 andrew@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: new private IP for cloudcontrol2010-dev - andrew@cumin1002"
- 12:20 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 12:19 andrew@cumin1002: START - Cookbook sre.dns.netbox
- 12:17 andrew@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 12:17 andrew@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: new private IP for cloudcontrol2010-dev - andrew@cumin1002"
- 12:17 andrew@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: new private IP for cloudcontrol2010-dev - andrew@cumin1002"
- 12:17 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1199 (T395241)', diff saved to https://phabricator.wikimedia.org/P77264 and previous config saved to /var/cache/conftool/dbconfig/20250609-121700-fceratto.json
- 12:16 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1199.eqiad.wmnet with reason: Maintenance
- 12:16 godog: bounce thanos-store on titan1*
- 12:16 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1190 (T395241)', diff saved to https://phabricator.wikimedia.org/P77263 and previous config saved to /var/cache/conftool/dbconfig/20250609-121636-fceratto.json
- 12:15 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2224', diff saved to https://phabricator.wikimedia.org/P77262 and previous config saved to /var/cache/conftool/dbconfig/20250609-121520-marostegui.json
- 12:13 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 12:13 andrew@cumin1002: START - Cookbook sre.dns.netbox
- 12:10 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 12:01 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1190', diff saved to https://phabricator.wikimedia.org/P77261 and previous config saved to /var/cache/conftool/dbconfig/20250609-120129-fceratto.json
- 12:00 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 12:00 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2224 (T396130)', diff saved to https://phabricator.wikimedia.org/P77260 and previous config saved to /var/cache/conftool/dbconfig/20250609-120013-marostegui.json
- 11:58 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 11:53 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2224 (T396130)', diff saved to https://phabricator.wikimedia.org/P77258 and previous config saved to /var/cache/conftool/dbconfig/20250609-115350-marostegui.json
- 11:53 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2224.codfw.wmnet with reason: Maintenance
- 11:53 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2217 (T396130)', diff saved to https://phabricator.wikimedia.org/P77257 and previous config saved to /var/cache/conftool/dbconfig/20250609-115328-marostegui.json
- 11:53 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 11:46 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1190', diff saved to https://phabricator.wikimedia.org/P77256 and previous config saved to /var/cache/conftool/dbconfig/20250609-114622-fceratto.json
- 11:38 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2217', diff saved to https://phabricator.wikimedia.org/P77255 and previous config saved to /var/cache/conftool/dbconfig/20250609-113821-marostegui.json
- 11:31 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1190 (T395241)', diff saved to https://phabricator.wikimedia.org/P77254 and previous config saved to /var/cache/conftool/dbconfig/20250609-113113-fceratto.json
- 11:23 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2217', diff saved to https://phabricator.wikimedia.org/P77253 and previous config saved to /var/cache/conftool/dbconfig/20250609-112314-marostegui.json
- 11:19 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1190 (T395241)', diff saved to https://phabricator.wikimedia.org/P77252 and previous config saved to /var/cache/conftool/dbconfig/20250609-111951-fceratto.json
- 11:19 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db2151* gradually with 4 steps - Pooling in
- 11:19 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1190.eqiad.wmnet with reason: Maintenance
- 11:19 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1160 (T395241)', diff saved to https://phabricator.wikimedia.org/P77250 and previous config saved to /var/cache/conftool/dbconfig/20250609-111926-fceratto.json
- 11:18 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db2158* gradually with 4 steps - Pooling in
- 11:08 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2217 (T396130)', diff saved to https://phabricator.wikimedia.org/P77247 and previous config saved to /var/cache/conftool/dbconfig/20250609-110807-marostegui.json
- 11:04 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1160', diff saved to https://phabricator.wikimedia.org/P77245 and previous config saved to /var/cache/conftool/dbconfig/20250609-110418-fceratto.json
- 11:01 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2217 (T396130)', diff saved to https://phabricator.wikimedia.org/P77243 and previous config saved to /var/cache/conftool/dbconfig/20250609-110140-marostegui.json
- 11:01 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2217.codfw.wmnet with reason: Maintenance
- 11:01 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2214 (T396130)', diff saved to https://phabricator.wikimedia.org/P77242 and previous config saved to /var/cache/conftool/dbconfig/20250609-110118-marostegui.json
- 10:54 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
- 10:54 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply
- 10:49 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1160', diff saved to https://phabricator.wikimedia.org/P77240 and previous config saved to /var/cache/conftool/dbconfig/20250609-104911-fceratto.json
- 10:46 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2214', diff saved to https://phabricator.wikimedia.org/P77238 and previous config saved to /var/cache/conftool/dbconfig/20250609-104611-marostegui.json
- 10:34 fceratto@cumin1002: START - Cookbook sre.mysql.pool db2151* gradually with 4 steps - Pooling in
- 10:34 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1160 (T395241)', diff saved to https://phabricator.wikimedia.org/P77236 and previous config saved to /var/cache/conftool/dbconfig/20250609-103404-fceratto.json
- 10:33 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
- 10:33 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply
- 10:33 fceratto@cumin1002: START - Cookbook sre.mysql.pool db2158* gradually with 4 steps - Pooling in
- 10:31 fceratto@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for db2151.codfw.wmnet
- 10:31 fceratto@cumin1002: START - Cookbook sre.hosts.remove-downtime for db2151.codfw.wmnet
- 10:31 fceratto@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for db2158.codfw.wmnet
- 10:31 fceratto@cumin1002: START - Cookbook sre.hosts.remove-downtime for db2158.codfw.wmnet
- 10:31 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2214', diff saved to https://phabricator.wikimedia.org/P77234 and previous config saved to /var/cache/conftool/dbconfig/20250609-103104-marostegui.json
- 10:30 vgutierrez@cumin1003: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) config_reloading P{lvs1013.eqiad.wmnet} and A:liberica
- 10:29 vgutierrez@cumin1003: START - Cookbook sre.loadbalancer.admin config_reloading P{lvs1013.eqiad.wmnet} and A:liberica
- 10:22 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1160 (T395241)', diff saved to https://phabricator.wikimedia.org/P77233 and previous config saved to /var/cache/conftool/dbconfig/20250609-102214-fceratto.json
- 10:22 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1160.eqiad.wmnet with reason: Maintenance
- 10:16 vgutierrez: repooling lvs1013 handling ncredir@eqiad using katran based load balancing - T395228
- 10:15 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2214 (T396130)', diff saved to https://phabricator.wikimedia.org/P77232 and previous config saved to /var/cache/conftool/dbconfig/20250609-101557-marostegui.json
- 10:14 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2214', diff saved to https://phabricator.wikimedia.org/P77231 and previous config saved to /var/cache/conftool/dbconfig/20250609-101400-marostegui.json
- 10:12 marostegui@cumin1002: dbctl commit (dc=all): 'db2214 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P77230 and previous config saved to /var/cache/conftool/dbconfig/20250609-101258-root.json
- 10:12 fceratto@cumin1002: END (FAIL) - Cookbook sre.mysql.clone (exit_code=99) of db2158.codfw.wmnet onto db2151.codfw.wmnet
- 10:08 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2214.codfw.wmnet with reason: Maintenance
- 10:06 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2214.codfw.wmnet with reason: Maintenance
- 10:06 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2214 T395989', diff saved to https://phabricator.wikimedia.org/P77229 and previous config saved to /var/cache/conftool/dbconfig/20250609-100605-marostegui.json
- 10:03 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2197.codfw.wmnet with reason: Maintenance
- 10:03 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193 (T396130)', diff saved to https://phabricator.wikimedia.org/P77228 and previous config saved to /var/cache/conftool/dbconfig/20250609-100337-marostegui.json
- 09:48 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193', diff saved to https://phabricator.wikimedia.org/P77227 and previous config saved to /var/cache/conftool/dbconfig/20250609-094830-marostegui.json
- 09:42 marostegui: Migrate s2 eqiad dbmaint to SBR T383795
- 09:35 vgutierrez@cumin1003: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) config_reloading P{lvs1013.eqiad.wmnet} and A:liberica
- 09:34 vgutierrez@cumin1003: START - Cookbook sre.loadbalancer.admin config_reloading P{lvs1013.eqiad.wmnet} and A:liberica
- 09:33 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193', diff saved to https://phabricator.wikimedia.org/P77226 and previous config saved to /var/cache/conftool/dbconfig/20250609-093323-marostegui.json
- 09:31 vgutierrez: depooling lvs1013 before switching ncredir@eqiad to katran based load balancing - T395228
- 09:28 tappof@dns1004: END - running authdns-update
- 09:27 tappof@dns1004: START - running authdns-update
- 09:20 vgutierrez: upload liberica 0.17 to apt.wm.o (bookworm-wikimedia) - T395228
- 09:18 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193 (T396130)', diff saved to https://phabricator.wikimedia.org/P77225 and previous config saved to /var/cache/conftool/dbconfig/20250609-091816-marostegui.json
- 09:15 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2193 (T396130)', diff saved to https://phabricator.wikimedia.org/P77224 and previous config saved to /var/cache/conftool/dbconfig/20250609-091528-marostegui.json
- 09:15 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2193.codfw.wmnet with reason: Maintenance
- 09:15 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180 (T396130)', diff saved to https://phabricator.wikimedia.org/P77223 and previous config saved to /var/cache/conftool/dbconfig/20250609-091506-marostegui.json
- 09:00 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P77222 and previous config saved to /var/cache/conftool/dbconfig/20250609-085959-marostegui.json
- 08:44 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P77221 and previous config saved to /var/cache/conftool/dbconfig/20250609-084452-marostegui.json
- 08:29 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180 (T396130)', diff saved to https://phabricator.wikimedia.org/P77220 and previous config saved to /var/cache/conftool/dbconfig/20250609-082945-marostegui.json
- 08:26 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2180 (T396130)', diff saved to https://phabricator.wikimedia.org/P77219 and previous config saved to /var/cache/conftool/dbconfig/20250609-082655-marostegui.json
- 08:26 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2180.codfw.wmnet with reason: Maintenance
- 08:26 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2169 (T396130)', diff saved to https://phabricator.wikimedia.org/P77218 and previous config saved to /var/cache/conftool/dbconfig/20250609-082633-marostegui.json
- 08:11 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2169', diff saved to https://phabricator.wikimedia.org/P77217 and previous config saved to /var/cache/conftool/dbconfig/20250609-081126-marostegui.json
- 08:08 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db2243.codfw.wmnet onto db2244.codfw.wmnet
- 08:08 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db2244 gradually with 4 steps - Pool db2244.codfw.wmnet in after cloning
- 07:56 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2169', diff saved to https://phabricator.wikimedia.org/P77215 and previous config saved to /var/cache/conftool/dbconfig/20250609-075619-marostegui.json
- 07:41 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2169 (T396130)', diff saved to https://phabricator.wikimedia.org/P77213 and previous config saved to /var/cache/conftool/dbconfig/20250609-074112-marostegui.json
- 07:34 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2169 (T396130)', diff saved to https://phabricator.wikimedia.org/P77211 and previous config saved to /var/cache/conftool/dbconfig/20250609-073403-marostegui.json
- 07:33 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2169.codfw.wmnet with reason: Maintenance
- 07:28 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2158.codfw.wmnet with reason: Maintenance
- 07:23 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2151.codfw.wmnet with reason: Maintenance
- 07:23 marostegui@cumin1002: START - Cookbook sre.mysql.pool db2244 gradually with 4 steps - Pool db2244.codfw.wmnet in after cloning
- 06:22 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db2243 gradually with 4 steps - Pool db2243.codfw.wmnet in after cloning
- 05:42 marostegui: Add MariaDB 10.11.13 to the repo T395663
- 05:37 marostegui@cumin1002: START - Cookbook sre.mysql.pool db2243 gradually with 4 steps - Pool db2243.codfw.wmnet in after cloning
- 05:24 marostegui@cumin1002: dbctl commit (dc=all): 'Add db2244 to dbctl depooled T393989', diff saved to https://phabricator.wikimedia.org/P77205 and previous config saved to /var/cache/conftool/dbconfig/20250609-052451-marostegui.json
- 05:00 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) db2243 - Depool db2243.codfw.wmnet to then clone it to db2244.codfw.wmnet - marostegui@cumin1002
- 05:00 marostegui@cumin1002: START - Cookbook sre.mysql.depool db2243 - Depool db2243.codfw.wmnet to then clone it to db2244.codfw.wmnet - marostegui@cumin1002
- 05:00 marostegui@cumin1002: START - Cookbook sre.mysql.clone of db2243.codfw.wmnet onto db2244.codfw.wmnet
2025-06-08
- 12:04 Ammar: Ran fixStuckGlobalRename.php for T396290 and T396291
- 09:53 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-reload (exit_code=0) reloading scholarly_articles on wdqs1023.eqiad.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/scholarly/20250526/ using stat1011.eqiad.wmnet)
2025-06-07
- 11:43 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host thanos-be2007.codfw.wmnet with OS bullseye
- 11:07 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host thanos-be2006.codfw.wmnet with OS bullseye
- 08:12 elukey: restart apache2 / php-fpm on phab1004
- 04:18 mutante: restarted apache on phab1004
2025-06-06
- 21:33 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 21:25 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 21:19 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 21:15 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 21:02 bking@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10 days, 0:00:00 on relforge[1003-1004].eqiad.wmnet with reason: downtime before decom
- 20:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-be2007.codfw.wmnet with reason: host reimage
- 20:40 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 20:38 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-be2007.codfw.wmnet with reason: host reimage
- 20:35 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 20:16 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 20:15 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host thanos-be2007.codfw.wmnet with OS bullseye
- 20:06 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-be2006.codfw.wmnet with reason: host reimage
- 20:03 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-be2006.codfw.wmnet with reason: host reimage
- 19:45 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1185.eqiad.wmnet with OS bullseye
- 19:31 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host thanos-be2006.codfw.wmnet with OS bullseye
- 19:11 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.RESTART (3 nodes at a time) for ElasticSearch cluster search_eqiad: T383811 - bking@cumin2002
- 19:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2244.codfw.wmnet with OS bookworm
- 19:01 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 18:25 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1185.eqiad.wmnet with OS bullseye
- 18:06 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 17:49 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2244.codfw.wmnet with reason: host reimage
- 17:46 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2244.codfw.wmnet with reason: host reimage
- 17:29 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host db2244.codfw.wmnet with OS bookworm
- 17:29 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['db2244']
- 17:29 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db2244']
- 17:20 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (3 nodes at a time) for ElasticSearch cluster search_eqiad: T383811 - bking@cumin2002
- 17:10 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2244.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 17:08 bking@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.RESTART (3 nodes at a time) for ElasticSearch cluster search_eqiad: T383811 - bking@cumin2002
- 17:06 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (3 nodes at a time) for ElasticSearch cluster search_eqiad: T383811 - bking@cumin2002
- 17:00 sukhe: forced agent run on O:alerting_host to reload vopsbot to add cdobbins
- 16:57 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host db2244.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 16:57 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db2244.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 16:56 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host db2244.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 16:55 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db2244
- 16:55 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host db2244
- 16:44 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 16:42 jhancock@cumin2002: START - Cookbook sre.dns.netbox
- 16:08 sbassett: Deployed security update to fix T396111
- 15:41 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2066.codfw.wmnet
- 15:34 eevans@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be2066.codfw.wmnet
- 15:24 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) config_reloading P{lvs1013.eqiad.wmnet} and A:liberica
- 15:23 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.admin config_reloading P{lvs1013.eqiad.wmnet} and A:liberica
- 15:19 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs1013.eqiad.wmnet
- 15:19 vgutierrez@cumin1002: START - Cookbook sre.hosts.remove-downtime for lvs1013.eqiad.wmnet
- 14:53 akosiaris@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
- 14:42 akosiaris@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
- 14:24 sukhe@dns1004: END - running authdns-update
- 14:23 sukhe@dns1004: START - running authdns-update
- 14:23 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.roll-reboot (exit_code=0) rolling reboot on P{dns2004*} and (A:dnsbox)
- 14:23 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot finished rebooting dns2004.wikimedia.org
- 14:22 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.roll-reboot (exit_code=0) rolling reboot on P{dns1005*} and (A:dnsbox)
- 14:22 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot finished rebooting dns1005.wikimedia.org
- 14:10 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot begin reboot of dns1005.wikimedia.org
- 14:10 sukhe@cumin1002: START - Cookbook sre.dns.roll-reboot rolling reboot on P{dns1005*} and (A:dnsbox)
- 14:10 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot begin reboot of dns2004.wikimedia.org
- 14:10 sukhe@cumin1002: START - Cookbook sre.dns.roll-reboot rolling reboot on P{dns2004*} and (A:dnsbox)
- 13:51 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.roll-reboot (exit_code=0) rolling reboot on P{dns2006*} and (A:dnsbox)
- 13:51 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot finished rebooting dns2006.wikimedia.org
- 13:49 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.roll-reboot (exit_code=0) rolling reboot on P{dns1006*} and (A:dnsbox)
- 13:49 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot finished rebooting dns1006.wikimedia.org
- 13:40 vgutierrez@cumin1003: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) config_reloading P{lvs1013.eqiad.wmnet} and A:liberica
- 13:40 vgutierrez@cumin1003: START - Cookbook sre.loadbalancer.admin config_reloading P{lvs1013.eqiad.wmnet} and A:liberica
- 13:40 vgutierrez@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on lvs1013.eqiad.wmnet with reason: switching to katran
- 13:35 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot begin reboot of dns2006.wikimedia.org
- 13:35 sukhe@cumin1002: START - Cookbook sre.dns.roll-reboot rolling reboot on P{dns2006*} and (A:dnsbox)
- 13:34 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot begin reboot of dns1006.wikimedia.org
- 13:34 sukhe@cumin1002: START - Cookbook sre.dns.roll-reboot rolling reboot on P{dns1006*} and (A:dnsbox)
- 13:32 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.roll-reboot (exit_code=0) rolling reboot on P{dns3003*} and (A:dnsbox)
- 13:32 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot finished rebooting dns3003.wikimedia.org
- 13:31 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.roll-reboot (exit_code=0) rolling reboot on P{dns6001*} and (A:dnsbox)
- 13:31 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot finished rebooting dns6001.wikimedia.org
- 13:21 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 13:21 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add back entry for mistakenly deleted ssw1-a8-codfw IP - cmooney@cumin1003"
- 13:21 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add back entry for mistakenly deleted ssw1-a8-codfw IP - cmooney@cumin1003"
- 13:18 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot begin reboot of dns6001.wikimedia.org
- 13:18 sukhe@cumin1002: START - Cookbook sre.dns.roll-reboot rolling reboot on P{dns6001*} and (A:dnsbox)
- 13:18 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot begin reboot of dns3003.wikimedia.org
- 13:18 sukhe@cumin1002: START - Cookbook sre.dns.roll-reboot rolling reboot on P{dns3003*} and (A:dnsbox)
- 13:13 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.roll-reboot (exit_code=0) rolling reboot on P{dns5003*} and (A:dnsbox)
- 13:13 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot finished rebooting dns5003.wikimedia.org
- 13:09 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.roll-reboot (exit_code=0) rolling reboot on P{dns4003*} and (A:dnsbox)
- 13:09 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot finished rebooting dns4003.wikimedia.org
- 13:06 cmooney@cumin1003: START - Cookbook sre.dns.netbox
- 12:58 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot begin reboot of dns5003.wikimedia.org
- 12:58 sukhe@cumin1002: START - Cookbook sre.dns.roll-reboot rolling reboot on P{dns5003*} and (A:dnsbox)
- 12:58 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot begin reboot of dns4003.wikimedia.org
- 12:58 sukhe@cumin1002: START - Cookbook sre.dns.roll-reboot rolling reboot on P{dns4003*} and (A:dnsbox)
- 12:20 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) db2158 - Depool db2158.codfw.wmnet to then clone it to db2151.codfw.wmnet - fceratto@cumin1002
- 12:20 fceratto@cumin1002: START - Cookbook sre.mysql.depool db2158 - Depool db2158.codfw.wmnet to then clone it to db2151.codfw.wmnet - fceratto@cumin1002
- 12:19 fceratto@cumin1002: START - Cookbook sre.mysql.clone of db2158.codfw.wmnet onto db2151.codfw.wmnet
- 12:03 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
- 12:03 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
- 11:52 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db2151.codfw.wmnet with reason: Disabling notifications
- 11:42 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
- 11:42 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
- 11:32 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
- 11:32 jayme@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'.
- 11:32 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'.
- 11:31 jayme@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'.
- 11:31 jayme@deploy1003: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
- 11:30 jayme@deploy1003: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
- 11:30 jayme@deploy1003: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
- 11:28 jayme@deploy1003: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
- 11:28 jayme@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 11:28 jayme@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
- 11:28 jayme@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 11:27 jayme@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
- 11:27 jayme@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/admin 'apply'.
- 11:26 jayme@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/admin 'apply'.
- 11:25 jayme@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/admin 'apply'.
- 11:22 jayme@deploy1003: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
- 11:21 jayme@deploy1003: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
- 11:18 jayme@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
- 11:18 jayme@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
- 11:17 jayme@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
- 11:17 jayme@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
- 11:12 jayme@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
- 11:12 jayme@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
- 09:40 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) db2151* - Log issue and disk filled up
- 09:40 fceratto@cumin1002: START - Cookbook sre.mysql.depool db2151* - Log issue and disk filled up
- 09:05 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 09:04 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
- 08:37 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 398044
- 08:36 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'email' for AS: 398044
- 08:35 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 46997
- 08:34 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'email' for AS: 46997
- 08:33 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 394065
- 08:33 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'email' for AS: 394065
- 08:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 46562
- 08:32 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'email' for AS: 46562
- 08:30 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 13150
- 08:30 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'email' for AS: 13150
- 08:28 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 199524
- 08:25 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'email' for AS: 199524
- 08:24 volans@cumin1003: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts cirrussearch2113.codfw.wmnet
- 08:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 63199
- 08:23 volans@cumin1003: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cirrussearch2113.codfw.wmnet
- 08:23 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'email' for AS: 63199
- 08:22 volans@cumin1003: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts cirrussearch2113.codfw.wmnet
- 08:22 volans@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cirrussearch2113.codfw.wmnet
- 08:11 volans@cumin1003: START - Cookbook sre.hosts.reboot-single for host cirrussearch2113.codfw.wmnet
- 08:10 volans@cumin1003: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cirrussearch2113.codfw.wmnet
- 07:52 ryankemper@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on cirrussearch2113.codfw.wmnet with reason: T394543
- 06:15 stevemunene@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on an-worker[1157-1159].eqiad.wmnet with reason: Upgrade an-worker hard drives from 4TB to 8TB group 5 - rack F1
- 06:14 stevemunene@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on an-worker[1160-1162].eqiad.wmnet with reason: Upgrade an-worker hard drives from 4TB to 8TB group 5 - rack F1
- 06:13 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device asw1-b3-magru
- 06:13 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device asw1-b3-magru
- 05:42 XioNoX: push pfw policies - T395904
- 03:02 eevans@cumin1002: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling restart_daemons on A:swift-fe-codfw
- 02:57 eevans@cumin1002: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on A:swift-fe-codfw
2025-06-05
- 22:05 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 22:04 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 22:03 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 22:02 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 21:58 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 21:58 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 21:57 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 21:42 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 21:29 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 21:24 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 21:00 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 21:00 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 21:00 cjming@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic: apply
- 20:59 cjming@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic: apply
- 20:57 cjming@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply
- 20:56 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 20:56 cjming@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply
- 20:56 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 20:50 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 20:49 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 20:24 jdlrobson@deploy1003: Finished scap sync-world: Backport for Fix back compat for data-chart (T395462) (duration: 10m 05s)
- 20:17 jdlrobson@deploy1003: jdlrobson: Continuing with sync
- 20:16 jdlrobson@deploy1003: jdlrobson: Backport for Fix back compat for data-chart (T395462) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 20:14 jdlrobson@deploy1003: Started scap sync-world: Backport for Fix back compat for data-chart (T395462)
- 20:09 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1185.eqiad.wmnet with OS bullseye
- 19:24 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1186.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 19:22 vriley@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1186.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 19:15 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1186.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 19:14 vriley@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1186.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 19:14 vriley@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host an-worker1186
- 19:13 vriley@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host an-worker1186
- 19:13 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1186.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 19:12 vriley@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1186.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 19:12 vriley@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host an-worker1186
- 19:12 vriley@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host an-worker1186
- 18:52 phuedx: Disabled the SDS 2.4.11 Synthetic A/A Test in xLab
- 18:49 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1185.eqiad.wmnet with OS bullseye
- 18:48 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1186.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 18:43 vriley@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1186.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 18:32 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1185.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 18:30 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1186.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 18:21 dduvall@deploy1003: rebuilt and synchronized wikiversions files: group2 to 1.45.0-wmf.4 refs T392174
- 18:21 vriley@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1186.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 18:20 vriley@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host an-worker1186
- 18:20 vriley@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host an-worker1186
- 18:19 vriley@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1185.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 18:18 cjming@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic: apply
- 18:17 cjming@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic: apply
- 18:17 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-reload reloading scholarly_articles on wdqs1023.eqiad.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/scholarly/20250526/ using stat1011.eqiad.wmnet)
- 18:17 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1185.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 18:17 vriley@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1185.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 18:05 cjming@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply
- 18:04 cjming@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply
- 17:28 bd808@deploy1003: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
- 17:27 bd808@deploy1003: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
- 17:25 bd808@deploy1003: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
- 17:24 bd808@deploy1003: helmfile [codfw] START helmfile.d/services/developer-portal: apply
- 17:24 bd808@deploy1003: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
- 17:24 bd808@deploy1003: helmfile [staging] START helmfile.d/services/developer-portal: apply
- 17:21 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) config_reloading P{lvs1013.eqiad.wmnet} and A:liberica
- 17:21 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.admin config_reloading P{lvs1013.eqiad.wmnet} and A:liberica
- 17:15 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs1013.eqiad.wmnet
- 17:15 vgutierrez@cumin1002: START - Cookbook sre.hosts.remove-downtime for lvs1013.eqiad.wmnet
- 16:54 brett@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for acmechief2002.codfw.wmnet,acmechief1002.eqiad.wmnet,acmechief-test2001.codfw.wmnet,acmechief-test1001.eqiad.wmnet
- 16:54 brett@cumin2002: START - Cookbook sre.hosts.remove-downtime for acmechief2002.codfw.wmnet,acmechief1002.eqiad.wmnet,acmechief-test2001.codfw.wmnet,acmechief-test1001.eqiad.wmnet
- 16:51 mfossati@deploy1003: Finished deploy [airflow-dags/platform_eng@930d28b]: adapt check_bad_parsing to dumps 2.0 (duration: 01m 16s)
- 16:50 mfossati@deploy1003: Started deploy [airflow-dags/platform_eng@930d28b]: adapt check_bad_parsing to dumps 2.0
- 16:50 brett@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on acmechief2002.codfw.wmnet,acmechief1002.eqiad.wmnet,acmechief-test2001.codfw.wmnet,acmechief-test1001.eqiad.wmnet with reason: Reboots
- 16:27 jdlrobson@deploy1003: Finished scap sync-world: Backport for Revert "Deploy survey to en at twenty percent" (duration: 11m 23s)
- 16:20 jdlrobson@deploy1003: jdlrobson, jdrewniak: Continuing with sync
- 16:18 jdlrobson@deploy1003: jdlrobson, jdrewniak: Backport for Revert "Deploy survey to en at twenty percent" synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 16:17 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2237 (T395241)', diff saved to https://phabricator.wikimedia.org/P77191 and previous config saved to /var/cache/conftool/dbconfig/20250605-161701-fceratto.json
- 16:16 jdlrobson@deploy1003: Started scap sync-world: Backport for Revert "Deploy survey to en at twenty percent"
- 16:12 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2244.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 16:03 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host db2244.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 16:02 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db2244
- 16:02 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host db2244
- 16:01 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2237', diff saved to https://phabricator.wikimedia.org/P77190 and previous config saved to /var/cache/conftool/dbconfig/20250605-160154-fceratto.json
- 16:01 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 16:01 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding db2244 to codfw - jhancock@cumin2002"
- 16:01 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding db2244 to codfw - jhancock@cumin2002"
- 15:57 jhancock@cumin2002: START - Cookbook sre.dns.netbox
- 15:55 aokoth@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10 days, 0:00:00 on doc1003.eqiad.wmnet with reason: Bookworm Migration
- 15:46 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2237', diff saved to https://phabricator.wikimedia.org/P77189 and previous config saved to /var/cache/conftool/dbconfig/20250605-154647-fceratto.json
- 15:31 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2237 (T395241)', diff saved to https://phabricator.wikimedia.org/P77188 and previous config saved to /var/cache/conftool/dbconfig/20250605-153139-fceratto.json
- 15:23 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2237 (T395241)', diff saved to https://phabricator.wikimedia.org/P77187 and previous config saved to /var/cache/conftool/dbconfig/20250605-152314-fceratto.json
- 15:23 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2237.codfw.wmnet with reason: Maintenance
- 15:22 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2236 (T395241)', diff saved to https://phabricator.wikimedia.org/P77186 and previous config saved to /var/cache/conftool/dbconfig/20250605-152248-fceratto.json
- 15:08 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudlb2004-dev.codfw.wmnet
- 15:07 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2236', diff saved to https://phabricator.wikimedia.org/P77185 and previous config saved to /var/cache/conftool/dbconfig/20250605-150741-fceratto.json
- 14:59 taavi@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudlb2004-dev.codfw.wmnet
- 14:59 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudlb2003-dev.codfw.wmnet
- 14:53 damilare: payments-wiki upgraded from 2d8b655a to aa102260
- 14:52 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2236', diff saved to https://phabricator.wikimedia.org/P77184 and previous config saved to /var/cache/conftool/dbconfig/20250605-145234-fceratto.json
- 14:50 taavi@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudlb2003-dev.codfw.wmnet
- 14:37 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2236 (T395241)', diff saved to https://phabricator.wikimedia.org/P77183 and previous config saved to /var/cache/conftool/dbconfig/20250605-143724-fceratto.json
- 14:29 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2236 (T395241)', diff saved to https://phabricator.wikimedia.org/P77182 and previous config saved to /var/cache/conftool/dbconfig/20250605-142908-fceratto.json
- 14:29 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2236.codfw.wmnet with reason: Maintenance
- 14:28 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219 (T395241)', diff saved to https://phabricator.wikimedia.org/P77181 and previous config saved to /var/cache/conftool/dbconfig/20250605-142840-fceratto.json
- 14:20 taavi@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) 'private.codfw.wikimedia.cloud$' on codfw recursors
- 14:20 taavi@cumin1002: START - Cookbook sre.dns.wipe-cache 'private.codfw.wikimedia.cloud$' on codfw recursors
- 14:19 tgr@deploy1003: Unlocked for deployment [MediaWiki]: T395468 (duration: 39m 39s)
- 14:19 vgutierrez@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on lvs1013.eqiad.wmnet with reason: switching to katran
- 14:18 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) config_reloading P{lvs1013.eqiad.wmnet} and A:liberica
- 14:18 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.admin config_reloading P{lvs1013.eqiad.wmnet} and A:liberica
- 14:18 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host sretest2007
- 14:18 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host sretest2007
- 14:17 taavi@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 14:17 taavi@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: assign cloud-private v6 addresses for codfw1dev devices - taavi@cumin1002"
- 14:17 taavi@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: assign cloud-private v6 addresses for codfw1dev devices - taavi@cumin1002"
- 14:17 tgr: deploying a PrivateSettings config change
- 14:13 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219', diff saved to https://phabricator.wikimedia.org/P77179 and previous config saved to /var/cache/conftool/dbconfig/20250605-141333-fceratto.json
- 14:13 taavi@cumin1002: START - Cookbook sre.dns.netbox
- 14:11 marostegui@cumin1002: dbctl commit (dc=all): 'db2151 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P77178 and previous config saved to /var/cache/conftool/dbconfig/20250605-141108-root.json
- 13:58 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219', diff saved to https://phabricator.wikimedia.org/P77177 and previous config saved to /var/cache/conftool/dbconfig/20250605-135826-fceratto.json
- 13:56 marostegui@cumin1002: dbctl commit (dc=all): 'db2151 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P77176 and previous config saved to /var/cache/conftool/dbconfig/20250605-135603-root.json
- 13:51 marostegui: Migrate s2 codfw to SBR dbmaint T383795
- 13:50 cgoubert@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: apply
- 13:50 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2037.codfw.wmnet
- 13:49 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply
- 13:49 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2037.codfw.wmnet
- 13:49 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply
- 13:49 cgoubert@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply
- 13:49 cgoubert@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: apply
- 13:48 cgoubert@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-constraints: apply
- 13:48 vgutierrez: upload liberica 0.16 to bookworm-wikimedia (apt.wm.o) - T395228
- 13:48 cgoubert@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-constraints: apply
- 13:47 cgoubert@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-constraints: apply
- 13:44 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ganeti2037.codfw.wmnet
- 13:43 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219 (T395241)', diff saved to https://phabricator.wikimedia.org/P77175 and previous config saved to /var/cache/conftool/dbconfig/20250605-134319-fceratto.json
- 13:41 marostegui@cumin1002: dbctl commit (dc=all): 'es2042 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P77174 and previous config saved to /var/cache/conftool/dbconfig/20250605-134158-root.json
- 13:41 marostegui@cumin1002: dbctl commit (dc=all): 'es2045 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P77173 and previous config saved to /var/cache/conftool/dbconfig/20250605-134153-root.json
- 13:40 marostegui@cumin1002: dbctl commit (dc=all): 'db2151 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P77172 and previous config saved to /var/cache/conftool/dbconfig/20250605-134057-root.json
- 13:40 moritzm: installing net-tools bugfix updates for bookworm
- 13:40 tgr@deploy1003: Locking from deployment [MediaWiki]: T395468
- 13:38 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2037.codfw.wmnet
- 13:37 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2036.codfw.wmnet
- 13:37 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2036.codfw.wmnet
- 13:35 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2219 (T395241)', diff saved to https://phabricator.wikimedia.org/P77171 and previous config saved to /var/cache/conftool/dbconfig/20250605-133500-fceratto.json
- 13:34 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2219.codfw.wmnet with reason: Maintenance
- 13:34 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2210 (T395241)', diff saved to https://phabricator.wikimedia.org/P77170 and previous config saved to /var/cache/conftool/dbconfig/20250605-133434-fceratto.json
- 13:31 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ganeti2036.codfw.wmnet
- 13:26 marostegui@cumin1002: dbctl commit (dc=all): 'es2042 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P77169 and previous config saved to /var/cache/conftool/dbconfig/20250605-132652-root.json
- 13:26 marostegui@cumin1002: dbctl commit (dc=all): 'es2045 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P77168 and previous config saved to /var/cache/conftool/dbconfig/20250605-132648-root.json
- 13:25 marostegui@cumin1002: dbctl commit (dc=all): 'db2151 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P77167 and previous config saved to /var/cache/conftool/dbconfig/20250605-132552-root.json
- 13:23 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts netflow7001.magru.wmnet
- 13:23 jmm@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 13:23 jmm@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: netflow7001.magru.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin1003"
- 13:21 jmm@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: netflow7001.magru.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin1003"
- 13:21 Lucas_WMDE: UTC afternoon backport+config window done
- 13:19 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2210', diff saved to https://phabricator.wikimedia.org/P77166 and previous config saved to /var/cache/conftool/dbconfig/20250605-131926-fceratto.json
- 13:18 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2036.codfw.wmnet
- 13:17 jmm@cumin1003: START - Cookbook sre.dns.netbox
- 13:16 gkyziridis@deploy1003: Finished scap sync-world: Backport for ores-extension: enable extension with revertrisk filter for second batch of wikis (excluding azwiki) (T395823) (duration: 11m 51s)
- 13:12 jmm@cumin1003: START - Cookbook sre.hosts.decommission for hosts netflow7001.magru.wmnet
- 13:11 marostegui@cumin1002: dbctl commit (dc=all): 'es2042 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P77165 and previous config saved to /var/cache/conftool/dbconfig/20250605-131147-root.json
- 13:11 marostegui@cumin1002: dbctl commit (dc=all): 'es2045 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P77164 and previous config saved to /var/cache/conftool/dbconfig/20250605-131142-root.json
- 13:10 marostegui@cumin1002: dbctl commit (dc=all): 'db2151 (re)pooling @ 20%: Repooling', diff saved to https://phabricator.wikimedia.org/P77163 and previous config saved to /var/cache/conftool/dbconfig/20250605-131046-root.json
- 13:09 gkyziridis@deploy1003: gkyziridis: Continuing with sync
- 13:07 gkyziridis@deploy1003: gkyziridis: Backport for ores-extension: enable extension with revertrisk filter for second batch of wikis (excluding azwiki) (T395823) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 13:04 gkyziridis@deploy1003: Started scap sync-world: Backport for ores-extension: enable extension with revertrisk filter for second batch of wikis (excluding azwiki) (T395823)
- 13:04 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2210', diff saved to https://phabricator.wikimedia.org/P77162 and previous config saved to /var/cache/conftool/dbconfig/20250605-130419-fceratto.json
- 13:03 taavi@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "remove outdated octavia net - taavi@cumin1002"
- 13:03 taavi@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "remove outdated octavia net - taavi@cumin1002"
- 12:56 marostegui@cumin1002: dbctl commit (dc=all): 'es2042 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P77161 and previous config saved to /var/cache/conftool/dbconfig/20250605-125641-root.json
- 12:56 marostegui@cumin1002: dbctl commit (dc=all): 'es2045 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P77160 and previous config saved to /var/cache/conftool/dbconfig/20250605-125637-root.json
- 12:55 marostegui@cumin1002: dbctl commit (dc=all): 'db2151 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P77159 and previous config saved to /var/cache/conftool/dbconfig/20250605-125540-root.json
- 12:54 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2035.codfw.wmnet
- 12:54 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2035.codfw.wmnet
- 12:51 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2151.codfw.wmnet with reason: Maintenance
- 12:50 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2151 T395989', diff saved to https://phabricator.wikimedia.org/P77158 and previous config saved to /var/cache/conftool/dbconfig/20250605-125057-marostegui.json
- 12:49 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2210 (T395241)', diff saved to https://phabricator.wikimedia.org/P77157 and previous config saved to /var/cache/conftool/dbconfig/20250605-124912-fceratto.json
- 12:48 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ganeti2035.codfw.wmnet
- 12:43 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2035.codfw.wmnet
- 12:41 marostegui@cumin1002: dbctl commit (dc=all): 'es2042 (re)pooling @ 20%: Repooling', diff saved to https://phabricator.wikimedia.org/P77156 and previous config saved to /var/cache/conftool/dbconfig/20250605-124136-root.json
- 12:41 marostegui@cumin1002: dbctl commit (dc=all): 'es2045 (re)pooling @ 20%: Repooling', diff saved to https://phabricator.wikimedia.org/P77155 and previous config saved to /var/cache/conftool/dbconfig/20250605-124131-root.json
- 12:41 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2210 (T395241)', diff saved to https://phabricator.wikimedia.org/P77154 and previous config saved to /var/cache/conftool/dbconfig/20250605-124110-fceratto.json
- 12:41 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2210.codfw.wmnet with reason: Maintenance
- 12:40 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2206 (T395241)', diff saved to https://phabricator.wikimedia.org/P77153 and previous config saved to /var/cache/conftool/dbconfig/20250605-124043-fceratto.json
- 12:32 jakob@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikidata-query-gui: apply
- 12:32 jakob@deploy1003: helmfile [eqiad] START helmfile.d/services/wikidata-query-gui: apply
- 12:31 jakob@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikidata-query-gui: apply
- 12:30 jakob@deploy1003: helmfile [codfw] START helmfile.d/services/wikidata-query-gui: apply
- 12:30 jakob@deploy1003: helmfile [staging] DONE helmfile.d/services/wikidata-query-gui: apply
- 12:30 jakob@deploy1003: helmfile [staging] START helmfile.d/services/wikidata-query-gui: apply
- 12:26 marostegui@cumin1002: dbctl commit (dc=all): 'es2042 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P77151 and previous config saved to /var/cache/conftool/dbconfig/20250605-122631-root.json
- 12:26 marostegui@cumin1002: dbctl commit (dc=all): 'es2045 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P77150 and previous config saved to /var/cache/conftool/dbconfig/20250605-122625-root.json
- 12:25 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2206', diff saved to https://phabricator.wikimedia.org/P77149 and previous config saved to /var/cache/conftool/dbconfig/20250605-122537-fceratto.json
- 12:21 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on es[2042,2045].codfw.wmnet with reason: Maintenance
- 12:20 marostegui@cumin1002: dbctl commit (dc=all): 'Depool es2042 es2045 T395241', diff saved to https://phabricator.wikimedia.org/P77147 and previous config saved to /var/cache/conftool/dbconfig/20250605-122035-marostegui.json
- 12:10 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2206', diff saved to https://phabricator.wikimedia.org/P77146 and previous config saved to /var/cache/conftool/dbconfig/20250605-121029-fceratto.json
- 11:55 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2206 (T395241)', diff saved to https://phabricator.wikimedia.org/P77145 and previous config saved to /var/cache/conftool/dbconfig/20250605-115522-fceratto.json
- 11:49 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 11:48 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 11:47 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2206 (T395241)', diff saved to https://phabricator.wikimedia.org/P77144 and previous config saved to /var/cache/conftool/dbconfig/20250605-114711-fceratto.json
- 11:47 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2206.codfw.wmnet with reason: Maintenance
- 11:42 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2199.codfw.wmnet with reason: Maintenance
- 11:42 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2179 (T395241)', diff saved to https://phabricator.wikimedia.org/P77143 and previous config saved to /var/cache/conftool/dbconfig/20250605-114213-fceratto.json
- 11:35 moritzm: installing Linux 5.10.237 on Bullseye hosts
- 11:27 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2179', diff saved to https://phabricator.wikimedia.org/P77142 and previous config saved to /var/cache/conftool/dbconfig/20250605-112706-fceratto.json
- 11:26 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2032.codfw.wmnet
- 11:25 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2032.codfw.wmnet
- 11:25 marostegui@cumin1002: dbctl commit (dc=all): 'es2046 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P77141 and previous config saved to /var/cache/conftool/dbconfig/20250605-112518-root.json
- 11:25 marostegui@cumin1002: dbctl commit (dc=all): 'es2043 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P77140 and previous config saved to /var/cache/conftool/dbconfig/20250605-112511-root.json
- 11:19 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ganeti2032.codfw.wmnet
- 11:11 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2179', diff saved to https://phabricator.wikimedia.org/P77139 and previous config saved to /var/cache/conftool/dbconfig/20250605-111158-fceratto.json
- 11:10 marostegui@cumin1002: dbctl commit (dc=all): 'es2046 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P77138 and previous config saved to /var/cache/conftool/dbconfig/20250605-111013-root.json
- 11:10 marostegui@cumin1002: dbctl commit (dc=all): 'es2043 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P77137 and previous config saved to /var/cache/conftool/dbconfig/20250605-111005-root.json
- 11:09 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2032.codfw.wmnet
- 11:04 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2031.codfw.wmnet
- 11:03 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99) reloading scholarly_articles on wdqs1023.eqiad.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/scholarly/20250526/ using stat1011.eqiad.wmnet)
- 11:03 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2031.codfw.wmnet
- 11:02 gehel: restarting Blazegraph on wdqs1023 to address allocator decreasing alert
- 10:57 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ganeti2031.codfw.wmnet
- 10:56 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2179 (T395241)', diff saved to https://phabricator.wikimedia.org/P77136 and previous config saved to /var/cache/conftool/dbconfig/20250605-105650-fceratto.json
- 10:55 marostegui@cumin1002: dbctl commit (dc=all): 'es2046 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P77135 and previous config saved to /var/cache/conftool/dbconfig/20250605-105507-root.json
- 10:55 marostegui@cumin1002: dbctl commit (dc=all): 'es2043 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P77134 and previous config saved to /var/cache/conftool/dbconfig/20250605-105500-root.json
- 10:52 marostegui@cumin1002: dbctl commit (dc=all): 'db2158 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P77133 and previous config saved to /var/cache/conftool/dbconfig/20250605-105216-root.json
- 10:49 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2179 (T395241)', diff saved to https://phabricator.wikimedia.org/P77132 and previous config saved to /var/cache/conftool/dbconfig/20250605-104928-fceratto.json
- 10:49 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2179.codfw.wmnet with reason: Maintenance
- 10:49 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172 (T395241)', diff saved to https://phabricator.wikimedia.org/P77131 and previous config saved to /var/cache/conftool/dbconfig/20250605-104912-fceratto.json
- 10:42 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2031.codfw.wmnet
- 10:41 jmm@cumin1003: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti2030.codfw.wmnet
- 10:40 marostegui@cumin1002: dbctl commit (dc=all): 'es2046 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P77130 and previous config saved to /var/cache/conftool/dbconfig/20250605-104002-root.json
- 10:39 marostegui@cumin1002: dbctl commit (dc=all): 'es2043 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P77129 and previous config saved to /var/cache/conftool/dbconfig/20250605-103954-root.json
- 10:39 stevemunene@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1165.eqiad.wmnet
- 10:37 marostegui@cumin1002: dbctl commit (dc=all): 'db2158 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P77128 and previous config saved to /var/cache/conftool/dbconfig/20250605-103711-root.json
- 10:34 tappof@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host centrallog2002.codfw.wmnet
- 10:34 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P77127 and previous config saved to /var/cache/conftool/dbconfig/20250605-103403-fceratto.json
- 10:33 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2030.codfw.wmnet
- 10:32 stevemunene@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-worker1165.eqiad.wmnet
- 10:31 stevemunene@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1164.eqiad.wmnet
- 10:31 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-experimental: apply
- 10:30 Ammar: Ran fixStuckGlobalRename.php for T396054
- 10:27 claime: Manual run of generatecaptcha on mw-cron with delete - T388531
- 10:26 cgoubert@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
- 10:26 cgoubert@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
- 10:24 marostegui@cumin1002: dbctl commit (dc=all): 'es2046 (re)pooling @ 20%: Repooling', diff saved to https://phabricator.wikimedia.org/P77126 and previous config saved to /var/cache/conftool/dbconfig/20250605-102456-root.json
- 10:24 marostegui@cumin1002: dbctl commit (dc=all): 'es2043 (re)pooling @ 20%: Repooling', diff saved to https://phabricator.wikimedia.org/P77125 and previous config saved to /var/cache/conftool/dbconfig/20250605-102449-root.json
- 10:24 stevemunene@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-worker1164.eqiad.wmnet
- 10:23 stevemunene@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1163.eqiad.wmnet
- 10:23 marostegui@cumin1002: dbctl commit (dc=all): 'Promote es2041 to es4 master and es2044 as es5 master', diff saved to https://phabricator.wikimedia.org/P77124 and previous config saved to /var/cache/conftool/dbconfig/20250605-102319-root.json
- 10:22 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2029.codfw.wmnet
- 10:22 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2029.codfw.wmnet
- 10:22 marostegui@cumin1002: dbctl commit (dc=all): 'db2158 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P77123 and previous config saved to /var/cache/conftool/dbconfig/20250605-102205-root.json
- 10:21 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/mw-experimental: apply
- 10:18 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P77122 and previous config saved to /var/cache/conftool/dbconfig/20250605-101856-fceratto.json
- 10:16 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ganeti2029.codfw.wmnet
- 10:16 stevemunene@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-worker1163.eqiad.wmnet
- 10:09 tappof@cumin1002: START - Cookbook sre.hosts.reboot-single for host centrallog2002.codfw.wmnet
- 10:09 marostegui@cumin1002: dbctl commit (dc=all): 'es2046 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P77121 and previous config saved to /var/cache/conftool/dbconfig/20250605-100950-root.json
- 10:09 marostegui@cumin1002: dbctl commit (dc=all): 'es2043 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P77120 and previous config saved to /var/cache/conftool/dbconfig/20250605-100943-root.json
- 10:08 cgoubert@deploy1003: Finished scap sync-world: Backport for Revert "ores-extension: enable extension with revertrisk filter for second batch of wikis" (duration: 10m 36s)
- 10:08 tappof@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus2007.codfw.wmnet
- 10:07 marostegui@cumin1002: dbctl commit (dc=all): 'db2158 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P77119 and previous config saved to /var/cache/conftool/dbconfig/20250605-100700-root.json
- 10:05 marostegui@cumin1002: dbctl commit (dc=all): 'db2169 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P77118 and previous config saved to /var/cache/conftool/dbconfig/20250605-100527-root.json
- 10:04 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on es[2043,2046].codfw.wmnet with reason: Maintenance
- 10:04 marostegui@cumin1002: dbctl commit (dc=all): 'Depool es2043 es2046 T395241', diff saved to https://phabricator.wikimedia.org/P77117 and previous config saved to /var/cache/conftool/dbconfig/20250605-100419-marostegui.json
- 10:04 joal@deploy1003: Finished deploy [airflow-dags/analytics_test@d11bd51]: Update webrequest-test hive jar for ua-parser (duration: 00m 16s)
- 10:03 joal@deploy1003: Started deploy [airflow-dags/analytics_test@d11bd51]: Update webrequest-test hive jar for ua-parser
- 10:03 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172 (T395241)', diff saved to https://phabricator.wikimedia.org/P77116 and previous config saved to /var/cache/conftool/dbconfig/20250605-100349-fceratto.json
- 10:03 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2029.codfw.wmnet
- 10:01 cgoubert@deploy1003: gkyziridis, cgoubert: Continuing with sync
- 10:00 tappof@cumin1002: START - Cookbook sre.hosts.reboot-single for host prometheus2007.codfw.wmnet
- 10:00 cgoubert@deploy1003: gkyziridis, cgoubert: Backport for Revert "ores-extension: enable extension with revertrisk filter for second batch of wikis" synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 09:58 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2028.codfw.wmnet
- 09:58 tappof@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus1007.eqiad.wmnet
- 09:58 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2028.codfw.wmnet
- 09:57 cgoubert@deploy1003: Started scap sync-world: Backport for Revert "ores-extension: enable extension with revertrisk filter for second batch of wikis"
- 09:54 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2172 (T395241)', diff saved to https://phabricator.wikimedia.org/P77115 and previous config saved to /var/cache/conftool/dbconfig/20250605-095415-fceratto.json
- 09:54 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2172.codfw.wmnet with reason: Maintenance
- 09:53 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T395241)', diff saved to https://phabricator.wikimedia.org/P77114 and previous config saved to /var/cache/conftool/dbconfig/20250605-095347-fceratto.json
- 09:53 gmodena@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-content-history-reconcile-enrich: apply
- 09:53 gmodena@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-content-history-reconcile-enrich: apply
- 09:52 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ganeti2028.codfw.wmnet
- 09:51 marostegui@cumin1002: dbctl commit (dc=all): 'db2158 (re)pooling @ 20%: Repooling', diff saved to https://phabricator.wikimedia.org/P77113 and previous config saved to /var/cache/conftool/dbconfig/20250605-095155-root.json
- 09:50 tappof@cumin1002: START - Cookbook sre.hosts.reboot-single for host prometheus1007.eqiad.wmnet
- 09:50 marostegui@cumin1002: dbctl commit (dc=all): 'db2169 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P77112 and previous config saved to /var/cache/conftool/dbconfig/20250605-095022-root.json
- 09:44 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2028.codfw.wmnet
- 09:44 cgoubert@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
- 09:43 claime: Re-enabling CPU/RAM limits on mw-cron - T395436
- 09:43 cgoubert@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
- 09:38 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P77111 and previous config saved to /var/cache/conftool/dbconfig/20250605-093840-fceratto.json
- 09:36 marostegui@cumin1002: dbctl commit (dc=all): 'db2158 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P77110 and previous config saved to /var/cache/conftool/dbconfig/20250605-093649-root.json
- 09:36 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 09:36 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2027.codfw.wmnet
- 09:36 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2027.codfw.wmnet
- 09:36 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 09:35 hnowlan: Migrate reading lists API out of restbase for group1 via rest-gateway
- 09:35 marostegui@cumin1002: dbctl commit (dc=all): 'db2169 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P77109 and previous config saved to /var/cache/conftool/dbconfig/20250605-093515-root.json
- 09:31 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2158.codfw.wmnet with reason: Maintenance
- 09:31 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2158 T395989', diff saved to https://phabricator.wikimedia.org/P77108 and previous config saved to /var/cache/conftool/dbconfig/20250605-093107-marostegui.json
- 09:30 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ganeti2027.codfw.wmnet
- 09:26 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2027.codfw.wmnet
- 09:25 jmm@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "add netflow7002 - jmm@cumin1003"
- 09:24 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2026.codfw.wmnet
- 09:24 jmm@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "add netflow7002 - jmm@cumin1003"
- 09:24 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2026.codfw.wmnet
- 09:23 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P77107 and previous config saved to /var/cache/conftool/dbconfig/20250605-092333-fceratto.json
- 09:20 marostegui@cumin1002: dbctl commit (dc=all): 'db2169 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P77106 and previous config saved to /var/cache/conftool/dbconfig/20250605-092010-root.json
- 09:18 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ganeti2026.codfw.wmnet
- 09:17 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ping2004.codfw.wmnet
- 09:13 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ping2004.codfw.wmnet
- 09:08 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T395241)', diff saved to https://phabricator.wikimedia.org/P77105 and previous config saved to /var/cache/conftool/dbconfig/20250605-090825-fceratto.json
- 09:07 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 09:07 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 09:05 marostegui@cumin1002: dbctl commit (dc=all): 'db2169 (re)pooling @ 20%: Repooling', diff saved to https://phabricator.wikimedia.org/P77104 and previous config saved to /var/cache/conftool/dbconfig/20250605-090504-root.json
- 08:58 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2155 (T395241)', diff saved to https://phabricator.wikimedia.org/P77103 and previous config saved to /var/cache/conftool/dbconfig/20250605-085847-fceratto.json
- 08:58 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2155.codfw.wmnet with reason: Maintenance
- 08:58 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2147 (T395241)', diff saved to https://phabricator.wikimedia.org/P77102 and previous config saved to /var/cache/conftool/dbconfig/20250605-085820-fceratto.json
- 08:53 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2026.codfw.wmnet
- 08:50 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2025.codfw.wmnet
- 08:50 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2025.codfw.wmnet
- 08:50 marostegui@cumin1002: dbctl commit (dc=all): 'db2169 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P77101 and previous config saved to /var/cache/conftool/dbconfig/20250605-084959-root.json
- 08:47 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ping1004.eqiad.wmnet
- 08:46 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2169.codfw.wmnet with reason: Maintenance
- 08:45 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2169 T395989', diff saved to https://phabricator.wikimedia.org/P77100 and previous config saved to /var/cache/conftool/dbconfig/20250605-084557-marostegui.json
- 08:44 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ganeti2025.codfw.wmnet
- 08:43 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ping1004.eqiad.wmnet
- 08:43 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2147', diff saved to https://phabricator.wikimedia.org/P77099 and previous config saved to /var/cache/conftool/dbconfig/20250605-084313-fceratto.json
- 08:35 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 08:34 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 08:32 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 08:32 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 08:31 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2025.codfw.wmnet
- 08:28 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2147', diff saved to https://phabricator.wikimedia.org/P77098 and previous config saved to /var/cache/conftool/dbconfig/20250605-082806-fceratto.json
- 08:24 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host netflow7002.magru.wmnet
- 08:24 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host netflow7002.magru.wmnet with OS bookworm
- 08:24 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 08:24 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 08:12 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2147 (T395241)', diff saved to https://phabricator.wikimedia.org/P77097 and previous config saved to /var/cache/conftool/dbconfig/20250605-081258-fceratto.json
- 08:10 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2024.codfw.wmnet
- 08:09 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2024.codfw.wmnet
- 08:06 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on netflow7002.magru.wmnet with reason: host reimage
- 08:03 jmm@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on netflow7002.magru.wmnet with reason: host reimage
- 08:03 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ganeti2024.codfw.wmnet
- 08:03 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2147 (T395241)', diff saved to https://phabricator.wikimedia.org/P77096 and previous config saved to /var/cache/conftool/dbconfig/20250605-080310-fceratto.json
- 08:03 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2147.codfw.wmnet with reason: Maintenance
- 08:00 gmodena@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-content-history-reconcile-enrich-next: apply
- 08:00 gmodena@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-content-history-reconcile-enrich-next: apply
- 07:56 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2024.codfw.wmnet
- 07:50 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2023.codfw.wmnet
- 07:49 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2023.codfw.wmnet
- 07:44 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 07:43 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 07:43 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ganeti2023.codfw.wmnet
- 07:38 gkyziridis@deploy1003: Sync cancelled.
- 07:36 jmm@cumin1003: START - Cookbook sre.hosts.reimage for host netflow7002.magru.wmnet with OS bookworm
- 07:36 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2023.codfw.wmnet
- 07:32 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2022.codfw.wmnet
- 07:32 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2022.codfw.wmnet
- 07:27 jmm@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM netflow7002.magru.wmnet - jmm@cumin1003"
- 07:27 jmm@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM netflow7002.magru.wmnet - jmm@cumin1003"
- 07:26 jmm@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) netflow7002.magru.wmnet on all recursors
- 07:26 jmm@cumin1003: START - Cookbook sre.dns.wipe-cache netflow7002.magru.wmnet on all recursors
- 07:26 jmm@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 07:26 jmm@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM netflow7002.magru.wmnet - jmm@cumin1003"
- 07:26 jmm@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM netflow7002.magru.wmnet - jmm@cumin1003"
- 07:26 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ganeti2022.codfw.wmnet
- 07:23 gkyziridis@deploy1003: gkyziridis: Backport for ores-extension: enable extension with revertrisk filter for second batch of wikis (T395823) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 07:22 jmm@cumin1003: START - Cookbook sre.dns.netbox
- 07:22 jmm@cumin1003: START - Cookbook sre.ganeti.makevm for new host netflow7002.magru.wmnet
- 07:21 gkyziridis@deploy1003: Started scap sync-world: Backport for ores-extension: enable extension with revertrisk filter for second batch of wikis (T395823)
- 07:19 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2022.codfw.wmnet
- 06:41 marostegui@cumin1002: dbctl commit (dc=all): 'db2180 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P77095 and previous config saved to /var/cache/conftool/dbconfig/20250605-064137-root.json
- 06:26 marostegui@cumin1002: dbctl commit (dc=all): 'db2180 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P77094 and previous config saved to /var/cache/conftool/dbconfig/20250605-062629-root.json
- 06:19 marostegui: Change datadir on pc8 dbmaint eqiad codfw T395983
- 06:19 marostegui@cumin1002: dbctl commit (dc=all): 'Repool pc8 T395983', diff saved to https://phabricator.wikimedia.org/P77093 and previous config saved to /var/cache/conftool/dbconfig/20250605-061929-marostegui.json
- 06:16 marostegui@cumin1002: dbctl commit (dc=all): 'Depool pc8 T395983', diff saved to https://phabricator.wikimedia.org/P77092 and previous config saved to /var/cache/conftool/dbconfig/20250605-061612-marostegui.json
- 06:15 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on pc2018.codfw.wmnet,pc1018.eqiad.wmnet with reason: Maintenance
- 06:15 marostegui: Change datadir on pc7 dbmaint eqiad codfw T395983
- 06:15 marostegui@cumin1002: dbctl commit (dc=all): 'Repool pc7 T395983', diff saved to https://phabricator.wikimedia.org/P77091 and previous config saved to /var/cache/conftool/dbconfig/20250605-061502-marostegui.json
- 06:12 marostegui@cumin1002: dbctl commit (dc=all): 'Depool pc7 T395983', diff saved to https://phabricator.wikimedia.org/P77090 and previous config saved to /var/cache/conftool/dbconfig/20250605-061200-marostegui.json
- 06:11 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on pc2017.codfw.wmnet,pc1017.eqiad.wmnet with reason: Maintenance
- 06:11 marostegui@cumin1002: dbctl commit (dc=all): 'db2180 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P77089 and previous config saved to /var/cache/conftool/dbconfig/20250605-061124-root.json
- 05:57 marostegui: Change datadir on pc6 dbmaint eqiad codfw T395983
- 05:56 marostegui@cumin1002: dbctl commit (dc=all): 'Repool pc6 T395983', diff saved to https://phabricator.wikimedia.org/P77088 and previous config saved to /var/cache/conftool/dbconfig/20250605-055655-marostegui.json
- 05:56 marostegui@cumin1002: dbctl commit (dc=all): 'db2180 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P77087 and previous config saved to /var/cache/conftool/dbconfig/20250605-055619-root.json
- 05:54 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on pc2016.codfw.wmnet,pc1016.eqiad.wmnet with reason: Maintenance
- 05:54 marostegui@cumin1002: dbctl commit (dc=all): 'Depool pc6 T395983', diff saved to https://phabricator.wikimedia.org/P77086 and previous config saved to /var/cache/conftool/dbconfig/20250605-055438-marostegui.json
- 05:53 marostegui@cumin1002: dbctl commit (dc=all): 'Repool pc5 T395983', diff saved to https://phabricator.wikimedia.org/P77085 and previous config saved to /var/cache/conftool/dbconfig/20250605-055349-marostegui.json
- 05:51 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on pc2015.codfw.wmnet,pc1015.eqiad.wmnet with reason: Maintenance
- 05:51 marostegui: Change datadir on pc5 dbmaint eqiad codfw T395983
- 05:51 marostegui@cumin1002: dbctl commit (dc=all): 'Depool pc5 T395983', diff saved to https://phabricator.wikimedia.org/P77084 and previous config saved to /var/cache/conftool/dbconfig/20250605-055121-marostegui.json
- 05:50 marostegui@cumin1002: dbctl commit (dc=all): 'Repool pc4 T395983', diff saved to https://phabricator.wikimedia.org/P77083 and previous config saved to /var/cache/conftool/dbconfig/20250605-055013-marostegui.json
- 05:48 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on pc2014.codfw.wmnet,pc1014.eqiad.wmnet with reason: Maintenance
- 05:48 marostegui@cumin1002: dbctl commit (dc=all): 'Depool pc4 T395983', diff saved to https://phabricator.wikimedia.org/P77082 and previous config saved to /var/cache/conftool/dbconfig/20250605-054806-marostegui.json
- 05:47 marostegui: Change datadir on pc4 dbmaint eqiad codfw T395983
- 05:47 marostegui: Change datadir on pc3 dbmaint eqiad codfw T395983
- 05:41 marostegui@cumin1002: dbctl commit (dc=all): 'db2180 (re)pooling @ 20%: Repooling', diff saved to https://phabricator.wikimedia.org/P77081 and previous config saved to /var/cache/conftool/dbconfig/20250605-054113-root.json
- 05:36 marostegui@cumin1002: dbctl commit (dc=all): 'Repool pc3 T395983', diff saved to https://phabricator.wikimedia.org/P77080 and previous config saved to /var/cache/conftool/dbconfig/20250605-053647-marostegui.json
- 05:33 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on pc2013.codfw.wmnet,pc1013.eqiad.wmnet with reason: Maintenance
- 05:33 marostegui@cumin1002: dbctl commit (dc=all): 'Depool pc3 T395983', diff saved to https://phabricator.wikimedia.org/P77079 and previous config saved to /var/cache/conftool/dbconfig/20250605-053317-marostegui.json
- 05:29 marostegui@cumin1002: dbctl commit (dc=all): 'Repool pc2 T395983', diff saved to https://phabricator.wikimedia.org/P77078 and previous config saved to /var/cache/conftool/dbconfig/20250605-052934-marostegui.json
- 05:26 marostegui@cumin1002: dbctl commit (dc=all): 'db2180 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P77077 and previous config saved to /var/cache/conftool/dbconfig/20250605-052604-root.json
- 05:25 marostegui: Change datadir on pc2 dbmaint eqiad codfw T395983
- 05:25 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on pc2012.codfw.wmnet,pc1012.eqiad.wmnet with reason: Maintenance
- 05:24 marostegui@cumin1002: dbctl commit (dc=all): 'Depool pc2 T395983', diff saved to https://phabricator.wikimedia.org/P77076 and previous config saved to /var/cache/conftool/dbconfig/20250605-052442-marostegui.json
- 05:20 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2180.codfw.wmnet with reason: Maintenance
- 05:20 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2180 T395989', diff saved to https://phabricator.wikimedia.org/P77075 and previous config saved to /var/cache/conftool/dbconfig/20250605-052003-marostegui.json
2025-06-04
- 23:55 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-restart-reboot-ncredir (exit_code=0) rolling reboot on A:ncredir
- 22:45 brett@cumin2002: START - Cookbook sre.cdn.roll-restart-reboot-ncredir rolling reboot on A:ncredir
- 22:30 robh@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts cp7016.magru.wmnet
- 22:27 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1185.eqiad.wmnet with OS bullseye
- 22:20 robh@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts cp7014.magru.wmnet
- 22:18 damilare: SmashPig upgraded from d08693e5 to 3222a1f3
- 22:16 ladsgroup@deploy1003: Finished scap sync-world: Backport for Bump cache key version in EventStore (T396075) (duration: 13m 54s)
- 22:12 robh@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cp7016.magru.wmnet
- 22:12 robh@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts cp7015.magru.wmnet
- 22:12 robh@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cp7015.magru.wmnet
- 22:11 robh@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts cp7015.magru.wmnet
- 22:11 brett: sudo -i cumin 'A:ncredir' 'depool && apt-get update && apt-get upgrade -y && pool' -b1 -s10
- 22:09 ladsgroup@deploy1003: ladsgroup: Continuing with sync
- 22:04 ladsgroup@deploy1003: ladsgroup: Backport for Bump cache key version in EventStore (T396075) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 22:02 ladsgroup@deploy1003: Started scap sync-world: Backport for Bump cache key version in EventStore (T396075)
- 22:02 robh@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cp7015.magru.wmnet
- 22:02 robh@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cp7014.magru.wmnet
- 22:02 robh@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts cp7013.magru.wmnet
- 21:58 robh@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts cp7012.magru.wmnet
- 21:43 robh@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cp7013.magru.wmnet
- 21:42 robh@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts cp7011.magru.wmnet
- 21:40 robh@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cp7012.magru.wmnet
- 21:40 robh@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts cp7010.magru.wmnet
- 21:39 robh@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cp7010.magru.wmnet
- 21:35 robh@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts cp7010.magru.wmnet
- 21:29 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-reload reloading scholarly_articles on wdqs1023.eqiad.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/scholarly/20250526/ using stat1011.eqiad.wmnet)
- 21:25 robh@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cp7011.magru.wmnet
- 21:25 robh@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cp7010.magru.wmnet
- 21:24 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-reload reloading wikidata_main on wdqs1022.eqiad.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/main/20250526/ using stat1009.eqiad.wmnet)
- 21:22 robh@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts cp7009.magru.wmnet
- 21:14 robh@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts cp7008.magru.wmnet
- 21:07 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1185.eqiad.wmnet with OS bullseye
- 21:06 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1186.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 21:05 robh@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cp7009.magru.wmnet
- 21:05 robh@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cp7008.magru.wmnet
- 21:04 cjming: end of UTC late backport window
- 21:04 robh@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts cp7007.magru.wmnet
- 21:02 cjming@deploy1003: Finished scap sync-world: Backport for SUL3: Retry local login on failure due to invalid/expired login token (T390784), SUL3: Retry local login on failure…(follow-ups) (T390784), SUL3: Retry local login on failure due to invalid/expired login token (T390784), SUL3: Retry local login on failure…(follow-ups) (T390784) (d
- 21:01 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
- 20:55 cjming@deploy1003: matmarex, cjming: Continuing with sync
- 20:55 vriley@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1186.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 20:54 cjming@deploy1003: matmarex, cjming: Backport for SUL3: Retry local login on failure due to invalid/expired login token (T390784), SUL3: Retry local login on failure…(follow-ups) (T390784), SUL3: Retry local login on failure due to invalid/expired login token (T390784), SUL3: Retry local login on failure…(follow-ups) (T390784) synced to
- 20:51 cjming@deploy1003: Started scap sync-world: Backport for SUL3: Retry local login on failure due to invalid/expired login token (T390784), SUL3: Retry local login on failure…(follow-ups) (T390784), SUL3: Retry local login on failure due to invalid/expired login token (T390784), SUL3: Retry local login on failure…(follow-ups) (T390784)
- 20:51 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
- 20:50 robh@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts cp7006.magru.wmnet
- 20:46 robh@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cp7007.magru.wmnet
- 20:44 robh@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts cp7005.magru.wmnet
- 20:40 robh@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cp7006.magru.wmnet
- 20:38 cjming@deploy1003: Finished scap sync-world: Backport for Treat File::getShortDesc() as possibly unsafe HTML (T395834), Treat File::getShortDesc() as possibly unsafe HTML (T395834) (duration: 15m 37s)
- 20:37 robh@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts cp7004.magru.wmnet
- 20:31 cjming@deploy1003: cjming, matmarex: Continuing with sync
- 20:26 robh@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cp7005.magru.wmnet
- 20:26 robh@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cp7004.magru.wmnet
- 20:25 cjming@deploy1003: cjming, matmarex: Backport for Treat File::getShortDesc() as possibly unsafe HTML (T395834), Treat File::getShortDesc() as possibly unsafe HTML (T395834) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 20:23 cjming@deploy1003: Started scap sync-world: Backport for Treat File::getShortDesc() as possibly unsafe HTML (T395834), Treat File::getShortDesc() as possibly unsafe HTML (T395834)
- 20:20 robh@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts cp7003.magru.wmnet
- 20:18 robh@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts cp7002.magru.wmnet
- 20:15 cjming@deploy1003: Finished scap sync-world: Backport for beta cluster: Set $wgOATHAuthAccountPrefix (T396061) (duration: 10m 13s)
- 20:10 robh@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cp7003.magru.wmnet
- 20:09 robh@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cp7002.magru.wmnet
- 20:08 cjming@deploy1003: lucaswerkmeister, cjming: Continuing with sync
- 20:07 cjming@deploy1003: lucaswerkmeister, cjming: Backport for beta cluster: Set $wgOATHAuthAccountPrefix (T396061) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 20:05 cjming@deploy1003: Started scap sync-world: Backport for beta cluster: Set $wgOATHAuthAccountPrefix (T396061)
- 20:03 robh@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts cp3072.esams.wmnet
- 19:54 robh@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cp3072.esams.wmnet
- 19:42 robh@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts cp7001.magru.wmnet
- 19:36 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
- 19:31 robh@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cp7001.magru.wmnet
- 19:23 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 19:22 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
- 19:13 sukhe@dns1004: END - running authdns-update
- 19:12 sukhe@dns1004: START - running authdns-update
- 19:11 sukhe@puppetserver1001: conftool action : set/pooled=yes; selector: name=dns7002.wikimedia.org [reason: repooling after reboot]
- 19:11 sukhe@puppetserver1001: conftool action : set/pooled=yes; selector: name=dns7002.magru.wmnet [reason: repooling after reboot]
- 19:11 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for dns7002.wikimedia.org
- 19:11 sukhe@cumin1002: START - Cookbook sre.hosts.remove-downtime for dns7002.wikimedia.org
- 19:10 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.roll-reboot (exit_code=0) rolling reboot on A:dnsbox and A:magru and (A:dnsbox)
- 19:10 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot finished rebooting dns7002.wikimedia.org
- 19:10 bvibber@deploy1003: helmfile [codfw] DONE helmfile.d/services/chart-renderer: apply
- 19:10 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
- 19:10 bvibber@deploy1003: helmfile [codfw] START helmfile.d/services/chart-renderer: apply
- 19:10 dreamyjazz@deploy1003: Finished scap sync-world: Backport for CustomBlockedDomainStorage::fetchConfig: Cast LinkTarget to a Title for RevisionStore (T396056) (duration: 12m 27s)
- 19:10 bvibber@deploy1003: helmfile [eqiad] DONE helmfile.d/services/chart-renderer: apply
- 19:09 bvibber@deploy1003: helmfile [eqiad] START helmfile.d/services/chart-renderer: apply
- 19:09 bvibber@deploy1003: helmfile [staging] DONE helmfile.d/services/chart-renderer: apply
- 19:08 bvibber@deploy1003: helmfile [staging] START helmfile.d/services/chart-renderer: apply
- 19:03 dreamyjazz@deploy1003: dreamyjazz: Continuing with sync
- 19:00 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
- 19:00 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
- 19:00 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply
- 18:59 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
- 18:59 dreamyjazz@deploy1003: dreamyjazz: Backport for CustomBlockedDomainStorage::fetchConfig: Cast LinkTarget to a Title for RevisionStore (T396056) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 18:57 dreamyjazz@deploy1003: Started scap sync-world: Backport for CustomBlockedDomainStorage::fetchConfig: Cast LinkTarget to a Title for RevisionStore (T396056)
- 18:55 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot begin reboot of dns7002.wikimedia.org
- 18:51 brett@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp7001.*
- 18:49 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
- 18:45 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot finished rebooting dns7001.wikimedia.org
- 18:37 brett@puppetserver1001: conftool action : set/pooled=no; selector: name=cp7001.*
- 18:37 brett: depooling cp7001 for CPU stress testing and temperature effects (T373993)
- 18:29 sukhe@cumin1002: cookbooks.sre.dns.roll-reboot begin reboot of dns7001.wikimedia.org
- 18:29 sukhe@cumin1002: START - Cookbook sre.dns.roll-reboot rolling reboot on A:dnsbox and A:magru and (A:dnsbox)
- 18:24 gmodena@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-content-history-reconcile-enrich-next: apply
- 18:24 gmodena@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-content-history-reconcile-enrich-next: apply
- 18:18 dduvall@deploy1003: rebuilt and synchronized wikiversions files: group1 to 1.45.0-wmf.4 refs T392174
- 18:16 damilare: SmashPig upgraded from a99f2265 to d08693e5
- 18:15 sukhe: puppet re-enabled on A:cp and finished rolling out removal of ats-be from single backend cp nodes: T288106
- 18:14 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 18:13 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
- 18:03 bvibber@deploy1003: Finished scap sync-world: Backport for Update Charts so they can render from data-mw-charts as well as data-charts (T395462), Update Charts so they can render from data-mw-charts as well as data-charts (T395462) (duration: 10m 05s)
- 17:56 bvibber@deploy1003: bvibber: Continuing with sync
- 17:55 bvibber@deploy1003: bvibber: Backport for Update Charts so they can render from data-mw-charts as well as data-charts (T395462), Update Charts so they can render from data-mw-charts as well as data-charts (T395462) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 17:53 bvibber@deploy1003: Started scap sync-world: Backport for Update Charts so they can render from data-mw-charts as well as data-charts (T395462), Update Charts so they can render from data-mw-charts as well as data-charts (T395462)
- 17:50 cgoubert@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: sync
- 17:49 cgoubert@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: sync
- 17:48 cgoubert@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
- 17:48 cgoubert@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
- 17:36 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 17:35 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
- 17:15 cgoubert@deploy1003: Finished scap sync-world: 1153647: mediawiki: Fix captcha configmap structure - T388531 (duration: 02m 39s)
- 17:13 cgoubert@deploy1003: Started scap sync-world: 1153647: mediawiki: Fix captcha configmap structure - T388531
- 16:41 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1185.eqiad.wmnet with OS bullseye
- 16:33 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1186.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 16:32 vriley@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1186.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 16:30 vriley@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host an-worker1186
- 16:30 vriley@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host an-worker1186
- 16:27 stevemunene@cumin1002: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker1165.eqiad.wmnet
- 16:27 stevemunene@cumin1002: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker1165.eqiad.wmnet
- 16:22 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 16:22 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt an-worker1186 - vriley@cumin1002"
- 16:22 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt an-worker1186 - vriley@cumin1002"
- 16:19 stevemunene@cumin1002: END (FAIL) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=99) for hosts an-worker1164.eqiad.wmnet
- 16:18 stevemunene@cumin1002: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker1164.eqiad.wmnet
- 16:18 stevemunene@cumin1002: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker1165.eqiad.wmnet
- 16:16 stevemunene@cumin1002: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker1165.eqiad.wmnet
- 16:14 vriley@cumin1002: START - Cookbook sre.dns.netbox
- 16:14 stevemunene@cumin1002: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker1164.eqiad.wmnet
- 16:13 stevemunene@cumin1002: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker1164.eqiad.wmnet
- 16:13 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1186.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 16:12 stevemunene@cumin1002: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker1164.eqiad.wmnet
- 16:12 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.roll-restart-reboot-durum (exit_code=0) rolling reboot on A:durum and not A:magru and A:durum
- 16:12 vriley@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1186.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 16:11 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.roll-restart-reboot-wikimedia-dns (exit_code=0) rolling reboot on A:wikidough and not A:magru and A:wikidough
- 16:10 stevemunene@cumin1002: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker1164.eqiad.wmnet
- 16:07 stevemunene@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1163.eqiad.wmnet
- 16:04 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
- 16:04 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
- 16:03 stevemunene@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-worker1163.eqiad.wmnet
- 16:01 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226 (T395241)', diff saved to https://phabricator.wikimedia.org/P77074 and previous config saved to /var/cache/conftool/dbconfig/20250604-160120-fceratto.json
- 15:56 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1186.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 15:56 vriley@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1186.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 15:46 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P77073 and previous config saved to /var/cache/conftool/dbconfig/20250604-154611-fceratto.json
- 15:42 tappof@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus2005.codfw.wmnet
- 15:31 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P77072 and previous config saved to /var/cache/conftool/dbconfig/20250604-153104-fceratto.json
- 15:30 tappof@cumin1002: START - Cookbook sre.hosts.reboot-single for host prometheus2005.codfw.wmnet
- 15:29 tappof@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus1005.eqiad.wmnet
- 15:24 dreamyjazz@deploy1003: Finished scap sync-world: Backport for Set wgCheckUserDisableCheckUserAPI to false on loginwiki (T396010) (duration: 10m 03s)
- 15:20 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1185.eqiad.wmnet with OS bullseye
- 15:20 tappof@cumin1002: START - Cookbook sre.hosts.reboot-single for host prometheus1005.eqiad.wmnet
- 15:18 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 15:18 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 15:18 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
- 15:17 dreamyjazz@deploy1003: dreamyjazz: Continuing with sync
- 15:16 dreamyjazz@deploy1003: dreamyjazz: Backport for Set wgCheckUserDisableCheckUserAPI to false on loginwiki (T396010) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 15:15 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226 (T395241)', diff saved to https://phabricator.wikimedia.org/P77071 and previous config saved to /var/cache/conftool/dbconfig/20250604-151556-fceratto.json
- 15:15 sukhe@cumin1002: START - Cookbook sre.dns.roll-restart-reboot-durum rolling reboot on A:durum and not A:magru and A:durum
- 15:14 dreamyjazz@deploy1003: Started scap sync-world: Backport for Set wgCheckUserDisableCheckUserAPI to false on loginwiki (T396010)
- 15:14 sukhe@cumin1002: START - Cookbook sre.dns.roll-restart-reboot-wikimedia-dns rolling reboot on A:wikidough and not A:magru and A:wikidough
- 15:14 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1185.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 15:08 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
- 15:07 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1226 (T395241)', diff saved to https://phabricator.wikimedia.org/P77070 and previous config saved to /var/cache/conftool/dbconfig/20250604-150740-fceratto.json
- 15:07 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1226.eqiad.wmnet with reason: Maintenance
- 15:07 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214 (T395241)', diff saved to https://phabricator.wikimedia.org/P77069 and previous config saved to /var/cache/conftool/dbconfig/20250604-150716-fceratto.json
- 15:05 jiji@deploy1003: Finished scap sync-world: T276994: Chart bump, noop (duration: 02m 52s)
- 15:04 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcontrol2010-dev.codfw.wmnet with OS bookworm
- 15:04 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 15:02 jiji@deploy1003: Started scap sync-world: T276994: Chart bump, noop
- 15:00 cmooney@cumin1002: START - Cookbook sre.dns.netbox
- 15:00 cmooney@cumin1002: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
- 15:00 cmooney@cumin1002: START - Cookbook sre.dns.netbox
- 15:00 stevemunene@cumin1002: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker1163.eqiad.wmnet
- 14:58 stevemunene@cumin1002: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker1163.eqiad.wmnet
- 14:58 moritzm: installing Linux 5.10.237 on Bullseye hosts
- 14:55 cmooney@dns2005: END - running authdns-update
- 14:54 cmooney@dns2005: START - running authdns-update
- 14:52 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P77068 and previous config saved to /var/cache/conftool/dbconfig/20250604-145209-fceratto.json
- 14:50 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 14:50 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove DNS entries for IPs used in Nokia test lab codfw - cmooney@cumin1002"
- 14:49 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove DNS entries for IPs used in Nokia test lab codfw - cmooney@cumin1002"
- 14:49 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be1094.eqiad.wmnet with OS bullseye
- 14:49 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 14:49 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 14:47 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2020.codfw.wmnet
- 14:47 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2020.codfw.wmnet
- 14:46 vriley@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1185.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 14:46 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcontrol2010-dev.codfw.wmnet with reason: host reimage
- 14:46 cmooney@cumin1002: START - Cookbook sre.dns.netbox
- 14:46 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be1095.eqiad.wmnet with OS bullseye
- 14:46 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 14:44 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 14:43 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcontrol2010-dev.codfw.wmnet with reason: host reimage
- 14:41 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ganeti2020.codfw.wmnet
- 14:38 cgoubert@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
- 14:37 cgoubert@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
- 14:37 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P77067 and previous config saved to /var/cache/conftool/dbconfig/20250604-143702-fceratto.json
- 14:36 cgoubert@deploy1003: Finished scap sync-world: 1153634: mediawiki: Fix captcha wordlists path - T388531 (duration: 02m 33s)
- 14:33 cgoubert@deploy1003: Started scap sync-world: 1153634: mediawiki: Fix captcha wordlists path - T388531
- 14:32 cgoubert@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
- 14:32 cgoubert@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
- 14:31 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2020.codfw.wmnet
- 14:31 cgoubert@deploy1003: Finished scap sync-world: 1153634: mediawiki: Fix captcha wordlists path - T388531 (duration: 02m 24s)
- 14:29 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2019.codfw.wmnet
- 14:29 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2019.codfw.wmnet
- 14:28 cgoubert@deploy1003: Started scap sync-world: 1153634: mediawiki: Fix captcha wordlists path - T388531
- 14:28 cgoubert@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
- 14:27 cgoubert@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'.
- 14:27 cgoubert@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'.
- 14:26 cgoubert@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'.
- 14:26 cgoubert@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
- 14:25 cgoubert@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
- 14:24 cgoubert@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
- 14:24 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts doh7001.wikimedia.org
- 14:24 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 14:24 sukhe@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: doh7001.wikimedia.org decommissioned, removing all IPs except the asset tag one - sukhe@cumin1002"
- 14:23 sukhe@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: doh7001.wikimedia.org decommissioned, removing all IPs except the asset tag one - sukhe@cumin1002"
- 14:23 cgoubert@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
- 14:23 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ganeti2019.codfw.wmnet
- 14:21 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214 (T395241)', diff saved to https://phabricator.wikimedia.org/P77066 and previous config saved to /var/cache/conftool/dbconfig/20250604-142155-fceratto.json
- 14:21 cgoubert@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
- 14:21 cgoubert@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
- 14:20 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcontrol2010-dev.codfw.wmnet with OS bookworm
- 14:19 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts durum7001.magru.wmnet
- 14:19 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 14:19 sukhe@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: durum7001.magru.wmnet decommissioned, removing all IPs except the asset tag one - sukhe@cumin1002"
- 14:19 sukhe@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: durum7001.magru.wmnet decommissioned, removing all IPs except the asset tag one - sukhe@cumin1002"
- 14:19 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcontrol2010-dev.codfw.wmnet with OS bullseye
- 14:19 andrew@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - andrew@cumin1002"
- 14:17 sukhe@cumin1002: START - Cookbook sre.dns.netbox
- 14:16 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1002.eqiad.wmnet
- 14:14 andrew@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - andrew@cumin1002"
- 14:13 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2019.codfw.wmnet
- 14:12 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1214 (T395241)', diff saved to https://phabricator.wikimedia.org/P77065 and previous config saved to /var/cache/conftool/dbconfig/20250604-141238-fceratto.json
- 14:12 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1214.eqiad.wmnet with reason: Maintenance
- 14:12 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209 (T395241)', diff saved to https://phabricator.wikimedia.org/P77064 and previous config saved to /var/cache/conftool/dbconfig/20250604-141213-fceratto.json
- 14:11 sukhe@cumin1002: START - Cookbook sre.dns.netbox
- 14:11 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host sretest1002.eqiad.wmnet
- 14:08 sukhe: decommissioning doh7001 and durum7001: T396015
- 14:07 sukhe@cumin1002: START - Cookbook sre.hosts.decommission for hosts doh7001.wikimedia.org
- 14:07 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
- 14:07 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
- 14:07 sukhe@cumin1002: START - Cookbook sre.hosts.decommission for hosts durum7001.magru.wmnet
- 14:06 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
- 14:06 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
- 14:05 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
- 14:05 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply
- 14:04 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) config_reloading P{lvs1013.eqiad.wmnet} and A:liberica
- 14:04 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.admin config_reloading P{lvs1013.eqiad.wmnet} and A:liberica
- 14:02 cgoubert@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
- 14:02 cgoubert@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
- 14:01 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be1094.eqiad.wmnet with reason: host reimage
- 13:58 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcontrol2010-dev.codfw.wmnet with reason: host reimage
- 13:57 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P77062 and previous config saved to /var/cache/conftool/dbconfig/20250604-135706-fceratto.json
- 13:56 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be1094.eqiad.wmnet with reason: host reimage
- 13:54 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcontrol2010-dev.codfw.wmnet with reason: host reimage
- 13:51 claime: Manual run of generatecaptcha on mw-cron, no delete - T388531
- 13:51 cgoubert@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
- 13:50 cgoubert@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
- 13:49 sukhe: sudo cumin -b1 -s15 'A:cp' 'run-puppet-agent --enable "merging CR 1114074"': T288106
- 13:48 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) config_reloading P{lvs1013.eqiad.wmnet} and A:liberica
- 13:48 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.admin config_reloading P{lvs1013.eqiad.wmnet} and A:liberica
- 13:47 tappof@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host grafana2001.codfw.wmnet
- 13:46 sukhe: forcing ats-backend-restart on cp1104
- 13:43 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be1095.eqiad.wmnet with reason: host reimage
- 13:43 marostegui@cumin1002: dbctl commit (dc=all): 'db2217 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P77061 and previous config saved to /var/cache/conftool/dbconfig/20250604-134336-root.json
- 13:43 tappof@cumin1002: START - Cookbook sre.hosts.reboot-single for host grafana2001.codfw.wmnet
- 13:41 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P77060 and previous config saved to /var/cache/conftool/dbconfig/20250604-134158-fceratto.json
- 13:41 tappof@cumin1002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host grafana2001.codfw.wmnet
- 13:41 tappof@cumin1002: START - Cookbook sre.hosts.reboot-single for host grafana2001.codfw.wmnet
- 13:40 tappof@cumin1002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host grafana2001.codfw.wmnet
- 13:40 tappof@cumin1002: START - Cookbook sre.hosts.reboot-single for host grafana2001.codfw.wmnet
- 13:40 samtar@deploy1003: Finished scap sync-world: Backport for IS: Undo turning on wgTemplateDataEnableCategoryBrowser for mw.org (T377975) (duration: 09m 57s)
- 13:39 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be1095.eqiad.wmnet with reason: host reimage
- 13:38 tappof@cumin1002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host grafana2001.codfw.wmnet
- 13:38 tappof@cumin1002: START - Cookbook sre.hosts.reboot-single for host grafana2001.codfw.wmnet
- 13:37 sukhe: forcing agent run on cp2037 (non-single BE node): CR 1114074
- 13:37 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be1094.eqiad.wmnet with OS bullseye
- 13:33 samtar@deploy1003: samtar: Continuing with sync
- 13:32 samtar@deploy1003: samtar: Backport for IS: Undo turning on wgTemplateDataEnableCategoryBrowser for mw.org (T377975) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 13:30 sukhe: forcing agent run on cp7001 (single BE node): CR 1114074
- 13:30 samtar@deploy1003: Started scap sync-world: Backport for IS: Undo turning on wgTemplateDataEnableCategoryBrowser for mw.org (T377975)
- 13:29 sukhe: forcing agent run on cp6015: CR 1114074
- 13:28 marostegui@cumin1002: dbctl commit (dc=all): 'db2217 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P77059 and previous config saved to /var/cache/conftool/dbconfig/20250604-132829-root.json
- 13:27 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs1013.eqiad.wmnet
- 13:27 vgutierrez@cumin1002: START - Cookbook sre.hosts.remove-downtime for lvs1013.eqiad.wmnet
- 13:26 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209 (T395241)', diff saved to https://phabricator.wikimedia.org/P77058 and previous config saved to /var/cache/conftool/dbconfig/20250604-132648-fceratto.json
- 13:23 sukhe: starting removal of ats-be service from eqiad, eqsin, esams, magru, ulsfo: T288106
- 13:21 sukhe: sudo cumin 'A:cp' 'disable-puppet "merging CR 1114074"'
- 13:20 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be1095.eqiad.wmnet with OS bullseye
- 13:18 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1209 (T395241)', diff saved to https://phabricator.wikimedia.org/P77057 and previous config saved to /var/cache/conftool/dbconfig/20250604-131852-fceratto.json
- 13:18 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1209.eqiad.wmnet with reason: Maintenance
- 13:18 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203 (T395241)', diff saved to https://phabricator.wikimedia.org/P77056 and previous config saved to /var/cache/conftool/dbconfig/20250604-131827-fceratto.json
- 13:14 jforrester@deploy1003: Finished scap sync-world: Backport for release CampaignEvents to cbk-zam wiki (T393604), Bump portals to the 2025-06-02 09:23:11+00:00 build (T128546), build: Rename the rarely-used 'typos' script to 'checkTypos', Drop Chart roll-out dblists, no longer needed (T383079) (duration: 10m 29s)
- 13:13 marostegui@cumin1002: dbctl commit (dc=all): 'db2217 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P77055 and previous config saved to /var/cache/conftool/dbconfig/20250604-131323-root.json
- 13:11 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply
- 13:11 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply
- 13:08 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-mcrouter: apply
- 13:07 jforrester@deploy1003: jforrester, mhorsey: Continuing with sync
- 13:06 jforrester@deploy1003: jforrester, mhorsey: Backport for release CampaignEvents to cbk-zam wiki (T393604), Bump portals to the 2025-06-02 09:23:11+00:00 build (T128546), build: Rename the rarely-used 'typos' script to 'checkTypos', Drop Chart roll-out dblists, no longer needed (T383079) synced to the testservers (see https://wikitech.wikimedia
- 13:04 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti7001.magru.wmnet to cluster magru03 and group B
- 13:04 jforrester@deploy1003: Started scap sync-world: Backport for release CampaignEvents to cbk-zam wiki (T393604), Bump portals to the 2025-06-02 09:23:11+00:00 build (T128546), build: Rename the rarely-used 'typos' script to 'checkTypos', Drop Chart roll-out dblists, no longer needed (T383079)
- 13:03 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P77054 and previous config saved to /var/cache/conftool/dbconfig/20250604-130319-fceratto.json
- 13:03 jmm@cumin1003: START - Cookbook sre.ganeti.addnode for new host ganeti7001.magru.wmnet to cluster magru03 and group B
- 13:02 sbassett@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
- 13:02 sbassett@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply
- 13:02 sbassett@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
- 13:01 sbassett@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply
- 13:01 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti7001.magru.wmnet
- 12:58 marostegui@cumin1002: dbctl commit (dc=all): 'db2217 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P77053 and previous config saved to /var/cache/conftool/dbconfig/20250604-125817-root.json
- 12:43 marostegui@cumin1002: dbctl commit (dc=all): 'db2217 (re)pooling @ 20%: Repooling', diff saved to https://phabricator.wikimedia.org/P77051 and previous config saved to /var/cache/conftool/dbconfig/20250604-124311-root.json
- 12:43 andrew@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudgw1004.eqiad.wmnet
- 12:42 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudgw1003.eqiad.wmnet
- 12:39 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-mcrouter: apply
- 12:39 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti7001.magru.wmnet with OS bookworm
- 12:39 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/mw-mcrouter: apply
- 12:37 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ms-be1095.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 12:36 moritzm: installing modsecurity-apache security updates
- 12:36 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ms-be1094.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 12:35 andrew@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudgw1003.eqiad.wmnet
- 12:35 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 12:35 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for ms-be1094/95 - jclark@cumin1002"
- 12:35 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for ms-be1094/95 - jclark@cumin1002"
- 12:34 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcontrol2010-dev.codfw.wmnet with OS bullseye
- 12:33 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203 (T395241)', diff saved to https://phabricator.wikimedia.org/P77050 and previous config saved to /var/cache/conftool/dbconfig/20250604-123304-fceratto.json
- 12:32 jclark@cumin1002: START - Cookbook sre.dns.netbox
- 12:29 marostegui@cumin1002: dbctl commit (dc=all): 'db2193 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P77049 and previous config saved to /var/cache/conftool/dbconfig/20250604-122948-root.json
- 12:28 reedy@deploy1003: Finished scap sync-world: Backport for GenerateFancyCaptchas: Handle captcha.py not generating any captchas, but not erroring (T388531), captcha.py: Expand variables and user in filenames (T395810), captcha.py: Check if output dir exists, and attempt to create it (else error) (T395804), [[gerrit:1153595|captcha.py: Bail out if no words were rea
- 12:28 marostegui@cumin1002: dbctl commit (dc=all): 'db2217 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P77048 and previous config saved to /var/cache/conftool/dbconfig/20250604-122806-root.json
- 12:27 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 12:27 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cloudcontrol2010-dev.codfw.wmnet on all recursors
- 12:27 cmooney@cumin1002: START - Cookbook sre.dns.wipe-cache cloudcontrol2010-dev.codfw.wmnet on all recursors
- 12:26 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cloudcephosd2010-dev.codfw.wmnet on all recursors
- 12:26 cmooney@cumin1002: START - Cookbook sre.dns.wipe-cache cloudcephosd2010-dev.codfw.wmnet on all recursors
- 12:25 jclark@cumin1002: START - Cookbook sre.dns.netbox
- 12:25 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 12:25 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: fix entries for cloudcontrol2010-dev which had been added on wrong vlan - cmooney@cumin1002"
- 12:25 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: fix entries for cloudcontrol2010-dev which had been added on wrong vlan - cmooney@cumin1002"
- 12:24 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1203 (T395241)', diff saved to https://phabricator.wikimedia.org/P77047 and previous config saved to /var/cache/conftool/dbconfig/20250604-122436-fceratto.json
- 12:24 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1203.eqiad.wmnet with reason: Maintenance
- 12:24 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192 (T395241)', diff saved to https://phabricator.wikimedia.org/P77046 and previous config saved to /var/cache/conftool/dbconfig/20250604-122411-fceratto.json
- 12:23 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2217 T395989', diff saved to https://phabricator.wikimedia.org/P77045 and previous config saved to /var/cache/conftool/dbconfig/20250604-122303-marostegui.json
- 12:22 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2217.codfw.wmnet with reason: Maintenance
- 12:21 cmooney@cumin1002: START - Cookbook sre.dns.netbox
- 12:21 reedy@deploy1003: reedy: Continuing with sync
- 12:21 cmooney@cumin1002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
- 12:20 reedy@deploy1003: reedy: Backport for GenerateFancyCaptchas: Handle captcha.py not generating any captchas, but not erroring (T388531), captcha.py: Expand variables and user in filenames (T395810), captcha.py: Check if output dir exists, and attempt to create it (else error) (T395804), [[gerrit:1153595|captcha.py: Bail out if no words were read from wordlist (T3
- 12:18 cmooney@cumin1002: START - Cookbook sre.dns.netbox
- 12:18 reedy@deploy1003: Started scap sync-world: Backport for GenerateFancyCaptchas: Handle captcha.py not generating any captchas, but not erroring (T388531), captcha.py: Expand variables and user in filenames (T395810), captcha.py: Check if output dir exists, and attempt to create it (else error) (T395804), [[gerrit:1153595|captcha.py: Bail out if no words were read
- 12:16 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti7001.magru.wmnet with reason: host reimage
- 12:14 marostegui@cumin1002: dbctl commit (dc=all): 'db2193 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P77044 and previous config saved to /var/cache/conftool/dbconfig/20250604-121442-root.json
- 12:12 jmm@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti7001.magru.wmnet with reason: host reimage
- 12:09 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P77043 and previous config saved to /var/cache/conftool/dbconfig/20250604-120904-fceratto.json
- 11:59 marostegui@cumin1002: dbctl commit (dc=all): 'db2193 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P77041 and previous config saved to /var/cache/conftool/dbconfig/20250604-115936-root.json
- 11:58 samtar@deploy1003: Finished scap sync-world: Backport for IS/IS-labs: Enable TemplateDiscovery flags for mediawikiwiki (T377975) (duration: 12m 28s)
- 11:55 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts krb1001.eqiad.wmnet
- 11:55 jmm@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 11:55 jmm@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: krb1001.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin1003"
- 11:53 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P77040 and previous config saved to /var/cache/conftool/dbconfig/20250604-115357-fceratto.json
- 11:51 jmm@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: krb1001.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin1003"
- 11:51 jmm@cumin1003: START - Cookbook sre.hosts.reimage for host ganeti7001.magru.wmnet with OS bookworm
- 11:51 samtar@deploy1003: samtar: Continuing with sync
- 11:47 samtar@deploy1003: samtar: Backport for IS/IS-labs: Enable TemplateDiscovery flags for mediawikiwiki (T377975) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 11:45 samtar@deploy1003: Started scap sync-world: Backport for IS/IS-labs: Enable TemplateDiscovery flags for mediawikiwiki (T377975)
- 11:44 marostegui@cumin1002: dbctl commit (dc=all): 'db2193 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P77038 and previous config saved to /var/cache/conftool/dbconfig/20250604-114430-root.json
- 11:38 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192 (T395241)', diff saved to https://phabricator.wikimedia.org/P77037 and previous config saved to /var/cache/conftool/dbconfig/20250604-113849-fceratto.json
- 11:38 jmm@cumin1003: START - Cookbook sre.dns.netbox
- 11:35 mvolz@deploy1003: helmfile [eqiad] DONE helmfile.d/services/citoid: apply
- 11:35 bwojtowicz@deploy1003: helmfile [ml-serve-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
- 11:35 mvolz@deploy1003: helmfile [eqiad] START helmfile.d/services/citoid: apply
- 11:34 bwojtowicz@deploy1003: helmfile [ml-serve-eqiad] 'sync' command on namespace 'ores-legacy' for release 'main' .
- 11:34 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti4008.ulsfo.wmnet
- 11:34 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4008.ulsfo.wmnet
- 11:33 mvolz@deploy1003: helmfile [codfw] DONE helmfile.d/services/citoid: apply
- 11:32 mvolz@deploy1003: helmfile [codfw] START helmfile.d/services/citoid: apply
- 11:32 jmm@cumin1003: START - Cookbook sre.hosts.decommission for hosts krb1001.eqiad.wmnet
- 11:31 mvolz@deploy1003: helmfile [staging] DONE helmfile.d/services/citoid: apply
- 11:31 mvolz@deploy1003: helmfile [staging] START helmfile.d/services/citoid: apply
- 11:30 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1192 (T395241)', diff saved to https://phabricator.wikimedia.org/P77036 and previous config saved to /var/cache/conftool/dbconfig/20250604-113030-fceratto.json
- 11:30 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1192.eqiad.wmnet with reason: Maintenance
- 11:30 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178 (T395241)', diff saved to https://phabricator.wikimedia.org/P77035 and previous config saved to /var/cache/conftool/dbconfig/20250604-113005-fceratto.json
- 11:29 marostegui@cumin1002: dbctl commit (dc=all): 'db2193 (re)pooling @ 20%: Repooling', diff saved to https://phabricator.wikimedia.org/P77034 and previous config saved to /var/cache/conftool/dbconfig/20250604-112923-root.json
- 11:27 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ganeti4008.ulsfo.wmnet
- 11:25 mvolz@deploy1003: helmfile [staging] DONE helmfile.d/services/citoid: apply
- 11:25 mvolz@deploy1003: helmfile [staging] START helmfile.d/services/citoid: apply
- 11:22 mvolz@deploy1003: helmfile [staging] DONE helmfile.d/services/citoid: apply
- 11:22 mvolz@deploy1003: helmfile [staging] START helmfile.d/services/citoid: apply
- 11:20 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti4008.ulsfo.wmnet
- 11:14 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P77033 and previous config saved to /var/cache/conftool/dbconfig/20250604-111457-fceratto.json
- 11:14 claime: Deployed k8s-controller-sidecars version 1.0.2-3
- 11:14 marostegui@cumin1002: dbctl commit (dc=all): 'db2193 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P77032 and previous config saved to /var/cache/conftool/dbconfig/20250604-111418-root.json
- 11:10 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2193.codfw.wmnet with reason: Maintenance
- 11:09 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2193 T395989', diff saved to https://phabricator.wikimedia.org/P77031 and previous config saved to /var/cache/conftool/dbconfig/20250604-110955-marostegui.json
- 11:06 cgoubert@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
- 11:06 cgoubert@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'.
- 11:06 cgoubert@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'.
- 11:05 cgoubert@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'.
- 11:05 cgoubert@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
- 11:04 cgoubert@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
- 11:04 cgoubert@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
- 11:03 cgoubert@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
- 10:59 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P77030 and previous config saved to /var/cache/conftool/dbconfig/20250604-105950-fceratto.json
- 10:52 jmm@cumin1003: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti7001.magru.wmnet to cluster magru03 and group B
- 10:51 jmm@cumin1003: START - Cookbook sre.ganeti.addnode for new host ganeti7001.magru.wmnet to cluster magru03 and group B
- 10:45 akosiaris@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
- 10:44 akosiaris@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
- 10:44 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178 (T395241)', diff saved to https://phabricator.wikimedia.org/P77029 and previous config saved to /var/cache/conftool/dbconfig/20250604-104443-fceratto.json
- 10:38 jmm@cumin1003: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti7001.magru.wmnet to cluster magru03 and group B
- 10:37 jmm@cumin1003: START - Cookbook sre.ganeti.addnode for new host ganeti7001.magru.wmnet to cluster magru03 and group B
- 10:36 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1178 (T395241)', diff saved to https://phabricator.wikimedia.org/P77028 and previous config saved to /var/cache/conftool/dbconfig/20250604-103629-fceratto.json
- 10:36 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance
- 10:36 moritzm: failover ganeti master in ulsfo to ganeti4005
- 10:36 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177 (T395241)', diff saved to https://phabricator.wikimedia.org/P77027 and previous config saved to /var/cache/conftool/dbconfig/20250604-103604-fceratto.json
- 10:35 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti4007.ulsfo.wmnet
- 10:35 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti7001.magru.wmnet
- 10:35 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4007.ulsfo.wmnet
- 10:32 marostegui@cumin1002: dbctl commit (dc=all): 'es2044 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P77026 and previous config saved to /var/cache/conftool/dbconfig/20250604-103233-root.json
- 10:29 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ganeti4007.ulsfo.wmnet
- 10:25 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti4007.ulsfo.wmnet
- 10:25 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ganeti7001.magru.wmnet
- 10:23 marostegui@cumin1002: dbctl commit (dc=all): 'es2041 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P77025 and previous config saved to /var/cache/conftool/dbconfig/20250604-102351-root.json
- 10:23 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti4006.ulsfo.wmnet
- 10:23 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4006.ulsfo.wmnet
- 10:20 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P77024 and previous config saved to /var/cache/conftool/dbconfig/20250604-102056-fceratto.json
- 10:17 marostegui@cumin1002: dbctl commit (dc=all): 'es2044 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P77023 and previous config saved to /var/cache/conftool/dbconfig/20250604-101728-root.json
- 10:17 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ganeti4006.ulsfo.wmnet
- 10:09 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti4006.ulsfo.wmnet
- 10:08 marostegui@cumin1002: dbctl commit (dc=all): 'es2041 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P77022 and previous config saved to /var/cache/conftool/dbconfig/20250604-100846-root.json
- 10:05 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P77021 and previous config saved to /var/cache/conftool/dbconfig/20250604-100547-fceratto.json
- 10:05 vgutierrez@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on lvs1013.eqiad.wmnet with reason: switching to katran
- 10:04 vgutierrez: upload liberica 0.15 to bookworm-wikimedia (apt.wm.o) - T395228
- 10:02 marostegui@cumin1002: dbctl commit (dc=all): 'es2044 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P77020 and previous config saved to /var/cache/conftool/dbconfig/20250604-100222-root.json
- 10:00 vgutierrez: depool lvs1013 before switching to katran - T395228
- 09:59 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 09:59 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 09:55 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
- 09:55 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
- 09:53 marostegui@cumin1002: dbctl commit (dc=all): 'es2041 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P77018 and previous config saved to /var/cache/conftool/dbconfig/20250604-095340-root.json
- 09:52 akosiaris@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
- 09:52 akosiaris: re-deploy changeprop-jobqueue to set the Host HTTP header for submission of all jobs. T395451
- 09:52 akosiaris@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
- 09:50 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177 (T395241)', diff saved to https://phabricator.wikimedia.org/P77017 and previous config saved to /var/cache/conftool/dbconfig/20250604-095041-fceratto.json
- 09:47 marostegui@cumin1002: dbctl commit (dc=all): 'es2044 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P77016 and previous config saved to /var/cache/conftool/dbconfig/20250604-094715-root.json
- 09:46 akosiaris@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-jobrunner: apply
- 09:46 akosiaris: T395451 deploy mw-jobrunner hot patch for VirtualHost selection, testing out that the single version change will work this time around.
- 09:46 akosiaris@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-jobrunner: apply
- 09:42 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1177 (T395241)', diff saved to https://phabricator.wikimedia.org/P77015 and previous config saved to /var/cache/conftool/dbconfig/20250604-094217-fceratto.json
- 09:42 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti4005.ulsfo.wmnet
- 09:42 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance
- 09:42 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4005.ulsfo.wmnet
- 09:41 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172 (T395241)', diff saved to https://phabricator.wikimedia.org/P77014 and previous config saved to /var/cache/conftool/dbconfig/20250604-094152-fceratto.json
- 09:40 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti7001.magru.wmnet with OS bookworm
- 09:38 marostegui@cumin1002: dbctl commit (dc=all): 'es2041 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P77013 and previous config saved to /var/cache/conftool/dbconfig/20250604-093835-root.json
- 09:37 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
- 09:37 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'.
- 09:36 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ganeti4005.ulsfo.wmnet
- 09:33 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti4005.ulsfo.wmnet
- 09:32 marostegui@cumin1002: dbctl commit (dc=all): 'db2224 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P77011 and previous config saved to /var/cache/conftool/dbconfig/20250604-093251-root.json
- 09:32 marostegui@cumin1002: dbctl commit (dc=all): 'es2044 (re)pooling @ 20%: Repooling', diff saved to https://phabricator.wikimedia.org/P77010 and previous config saved to /var/cache/conftool/dbconfig/20250604-093209-root.json
- 09:28 marostegui@cumin1002: dbctl commit (dc=all): 'Repool pc1 T395983', diff saved to https://phabricator.wikimedia.org/P77009 and previous config saved to /var/cache/conftool/dbconfig/20250604-092819-marostegui.json
- 09:27 marostegui: Move datadir on pc1011 dbmaint pc1 eqiad T395983
- 09:26 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P77008 and previous config saved to /var/cache/conftool/dbconfig/20250604-092645-fceratto.json
- 09:24 taavi@cumin1002: conftool action : set/weight=100:pooled=yes; selector: name=clouddb1020.eqiad.wmnet,service=x3
- 09:23 marostegui@cumin1002: dbctl commit (dc=all): 'es2041 (re)pooling @ 20%: Repooling', diff saved to https://phabricator.wikimedia.org/P77007 and previous config saved to /var/cache/conftool/dbconfig/20250604-092328-root.json
- 09:20 marostegui: Move datadir on pc2011 dbmaint pc1 codfw T395983
- 09:19 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on pc2011.codfw.wmnet,pc1011.eqiad.wmnet with reason: Maintenance
- 09:19 marostegui@cumin1002: dbctl commit (dc=all): 'Depool pc1 T395983', diff saved to https://phabricator.wikimedia.org/P77006 and previous config saved to /var/cache/conftool/dbconfig/20250604-091921-marostegui.json
- 09:17 marostegui@cumin1002: dbctl commit (dc=all): 'db2224 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P77005 and previous config saved to /var/cache/conftool/dbconfig/20250604-091745-root.json
- 09:17 marostegui@cumin1002: dbctl commit (dc=all): 'es2044 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P77004 and previous config saved to /var/cache/conftool/dbconfig/20250604-091704-root.json
- 09:16 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'.
- 09:16 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti7001.magru.wmnet with reason: host reimage
- 09:16 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'.
- 09:15 akosiaris@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
- 09:15 akosiaris@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
- 09:15 akosiaris: T395451 rollback the host header addition, this is erroring out, returning 404s.
- 09:14 akosiaris: T395451 rollback the host header addition, this is erroring out, returning 3xx.
- 09:14 akosiaris@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
- 09:14 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
- 09:14 akosiaris@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
- 09:12 jmm@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti7001.magru.wmnet with reason: host reimage
- 09:11 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P77003 and previous config saved to /var/cache/conftool/dbconfig/20250604-091138-fceratto.json
- 09:11 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/thumbor: apply
- 09:11 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on es2044.codfw.wmnet with reason: Maintenance
- 09:10 marostegui@cumin1002: dbctl commit (dc=all): 'Depool es2044', diff saved to https://phabricator.wikimedia.org/P77002 and previous config saved to /var/cache/conftool/dbconfig/20250604-091041-marostegui.json
- 09:10 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
- 09:10 moritzm: installing qemu bugfix updates from Bookworm point release
- 09:08 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-f1-codfw
- 09:08 marostegui@cumin1002: dbctl commit (dc=all): 'es2041 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P77001 and previous config saved to /var/cache/conftool/dbconfig/20250604-090823-root.json
- 09:06 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-f1-codfw
- 09:05 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/thumbor: apply
- 09:03 akosiaris@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
- 09:02 akosiaris@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
- 09:02 marostegui@cumin1002: dbctl commit (dc=all): 'db2224 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P77000 and previous config saved to /var/cache/conftool/dbconfig/20250604-090240-root.json
- 09:02 marostegui@cumin1002: dbctl commit (dc=all): 'Depool es2041', diff saved to https://phabricator.wikimedia.org/P76999 and previous config saved to /var/cache/conftool/dbconfig/20250604-090226-marostegui.json
- 09:02 akosiaris@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
- 09:02 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on es2041.codfw.wmnet with reason: Maintenance
- 09:01 akosiaris@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
- 08:57 jmm@deploy1003: helmfile [staging] DONE helmfile.d/services/thumbor: apply
- 08:57 jmm@deploy1003: helmfile [staging] START helmfile.d/services/thumbor: apply
- 08:56 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172 (T395241)', diff saved to https://phabricator.wikimedia.org/P76998 and previous config saved to /var/cache/conftool/dbconfig/20250604-085630-fceratto.json
- 08:53 jmm@deploy1003: helmfile [staging] DONE helmfile.d/services/thumbor: apply
- 08:53 jmm@deploy1003: helmfile [staging] START helmfile.d/services/thumbor: apply
- 08:51 jmm@cumin1003: START - Cookbook sre.hosts.reimage for host ganeti7001.magru.wmnet with OS bookworm
- 08:48 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1172 (T395241)', diff saved to https://phabricator.wikimedia.org/P76997 and previous config saved to /var/cache/conftool/dbconfig/20250604-084819-fceratto.json
- 08:48 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance
- 08:47 marostegui@cumin1002: dbctl commit (dc=all): 'db2224 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P76996 and previous config saved to /var/cache/conftool/dbconfig/20250604-084735-root.json
- 08:42 akosiaris: deploy changeprop-jobqueue to set the Host HTTP header for submission of all jobs. T395451
- 08:42 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance
- 08:42 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167 (T395241)', diff saved to https://phabricator.wikimedia.org/P76995 and previous config saved to /var/cache/conftool/dbconfig/20250604-084231-fceratto.json
- 08:42 akosiaris@deploy1003: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
- 08:41 akosiaris@deploy1003: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
- 08:38 moritzm: removing ganeti7001 from magru01 cluster T394263
- 08:38 marostegui: Change s6 eqiad dbmaint to SBR T383795
- 08:38 akosiaris: revoke and clean helm-charts.discovery.wmnet old cergen cert from puppetmaster1001
- 08:32 marostegui@cumin1002: dbctl commit (dc=all): 'db2224 (re)pooling @ 20%: Repooling', diff saved to https://phabricator.wikimedia.org/P76994 and previous config saved to /var/cache/conftool/dbconfig/20250604-083229-root.json
- 08:28 marostegui: Change s6 codfw dbmaint to SBR T383795
- 08:27 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P76993 and previous config saved to /var/cache/conftool/dbconfig/20250604-082725-fceratto.json
- 08:17 marostegui@cumin1002: dbctl commit (dc=all): 'db2224 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P76992 and previous config saved to /var/cache/conftool/dbconfig/20250604-081725-root.json
- 08:14 moritzm: removing atlas7001 from magru01 cluster T394263
- 08:12 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P76991 and previous config saved to /var/cache/conftool/dbconfig/20250604-081219-fceratto.json
- 08:11 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2224.codfw.wmnet with reason: Maintenance
- 08:11 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2224 T395989', diff saved to https://phabricator.wikimedia.org/P76990 and previous config saved to /var/cache/conftool/dbconfig/20250604-081058-marostegui.json
- 08:05 moritzm: installing gcc-12 bugfix updates from Bookworm point releases (includes various run time libraries)
- 08:05 marostegui@cumin1002: dbctl commit (dc=all): 'es1035 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P76989 and previous config saved to /var/cache/conftool/dbconfig/20250604-080546-root.json
- 08:03 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host apt2002.wikimedia.org
- 08:02 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host apt2002.wikimedia.org
- 07:57 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167 (T395241)', diff saved to https://phabricator.wikimedia.org/P76988 and previous config saved to /var/cache/conftool/dbconfig/20250604-075711-fceratto.json
- 07:50 marostegui@cumin1002: dbctl commit (dc=all): 'es1035 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P76987 and previous config saved to /var/cache/conftool/dbconfig/20250604-075041-root.json
- 07:48 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1167 (T395241)', diff saved to https://phabricator.wikimedia.org/P76986 and previous config saved to /var/cache/conftool/dbconfig/20250604-074850-fceratto.json
- 07:48 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
- 07:48 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance
- 07:39 marostegui@cumin1002: dbctl commit (dc=all): 'es2048 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P76985 and previous config saved to /var/cache/conftool/dbconfig/20250604-073921-root.json
- 07:37 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of netflow7001.magru.wmnet to plain
- 07:37 jmm@cumin1003: START - Cookbook sre.ganeti.changedisk for changing disk type of netflow7001.magru.wmnet to plain
- 07:35 marostegui@cumin1002: dbctl commit (dc=all): 'es1035 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P76984 and previous config saved to /var/cache/conftool/dbconfig/20250604-073535-root.json
- 07:34 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of durum7001.magru.wmnet to plain
- 07:31 jmm@cumin1003: START - Cookbook sre.ganeti.changedisk for changing disk type of durum7001.magru.wmnet to plain
- 07:24 marostegui@cumin1002: dbctl commit (dc=all): 'es2048 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P76983 and previous config saved to /var/cache/conftool/dbconfig/20250604-072416-root.json
- 07:23 Emperor: restart swift-object-replicator ms-be2066
- 07:20 marostegui@cumin1002: dbctl commit (dc=all): 'es1035 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P76982 and previous config saved to /var/cache/conftool/dbconfig/20250604-072030-root.json
- 07:19 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of doh7001.wikimedia.org to plain
- 07:18 jmm@cumin1003: START - Cookbook sre.ganeti.changedisk for changing disk type of doh7001.wikimedia.org to plain
- 07:16 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ncredir7001.magru.wmnet to plain
- 07:13 jmm@cumin1003: START - Cookbook sre.ganeti.changedisk for changing disk type of ncredir7001.magru.wmnet to plain
- 07:09 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of install7001.wikimedia.org to plain
- 07:09 marostegui@cumin1002: dbctl commit (dc=all): 'es2048 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P76981 and previous config saved to /var/cache/conftool/dbconfig/20250604-070910-root.json
- 07:08 jmm@cumin1003: START - Cookbook sre.ganeti.changedisk for changing disk type of install7001.wikimedia.org to plain
- 07:05 marostegui@cumin1002: dbctl commit (dc=all): 'es1035 (re)pooling @ 20%: Repooling', diff saved to https://phabricator.wikimedia.org/P76980 and previous config saved to /var/cache/conftool/dbconfig/20250604-070525-root.json
- 07:01 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti7001.magru.wmnet
- 06:57 jmm@cumin1003: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti7001.magru.wmnet
- 06:54 marostegui@cumin1002: dbctl commit (dc=all): 'es2048 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P76979 and previous config saved to /var/cache/conftool/dbconfig/20250604-065405-root.json
- 06:50 marostegui@cumin1002: dbctl commit (dc=all): 'es1035 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P76978 and previous config saved to /var/cache/conftool/dbconfig/20250604-065020-root.json
- 06:39 marostegui@cumin1002: dbctl commit (dc=all): 'es2048 (re)pooling @ 20%: Repooling', diff saved to https://phabricator.wikimedia.org/P76977 and previous config saved to /var/cache/conftool/dbconfig/20250604-063900-root.json
- 06:35 marostegui@cumin1002: dbctl commit (dc=all): 'es1035 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P76976 and previous config saved to /var/cache/conftool/dbconfig/20250604-063515-root.json
- 06:33 gmodena@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-content-history-reconcile-enrich-next: apply
- 06:33 gmodena@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-content-history-reconcile-enrich-next: apply
- 06:31 marostegui@deploy1003: Finished scap sync-world: Backport for Revert "db-production.php: Disable writes on es7" (duration: 09m 52s)
- 06:24 marostegui@deploy1003: marostegui: Continuing with sync
- 06:24 marostegui@deploy1003: marostegui: Backport for Revert "db-production.php: Disable writes on es7" synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 06:23 marostegui@cumin1002: dbctl commit (dc=all): 'es2048 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P76974 and previous config saved to /var/cache/conftool/dbconfig/20250604-062355-root.json
- 06:21 marostegui@deploy1003: Started scap sync-world: Backport for Revert "db-production.php: Disable writes on es7"
- 06:20 marostegui@cumin1002: dbctl commit (dc=all): 'es1035 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P76973 and previous config saved to /var/cache/conftool/dbconfig/20250604-062010-root.json
- 06:04 marostegui@cumin1002: dbctl commit (dc=all): 'Depool es1035 T395982', diff saved to https://phabricator.wikimedia.org/P76972 and previous config saved to /var/cache/conftool/dbconfig/20250604-060413-marostegui.json
- 06:03 marostegui@dns1006: END - running authdns-update
- 06:03 marostegui@dns1006: START - running authdns-update
- 06:02 marostegui@cumin1002: dbctl commit (dc=all): 'Promote es1039 to es7 primary T395982', diff saved to https://phabricator.wikimedia.org/P76971 and previous config saved to /var/cache/conftool/dbconfig/20250604-060246-marostegui.json
- 06:02 marostegui: Starting es7 eqiad failover from es1035 to es1039 - T395982
- 05:57 marostegui@cumin1002: dbctl commit (dc=all): 'Set es1039 with weight 0 T395982', diff saved to https://phabricator.wikimedia.org/P76970 and previous config saved to /var/cache/conftool/dbconfig/20250604-055744-marostegui.json
- 05:56 marostegui@deploy1003: Finished scap sync-world: Backport for db-production.php: Disable writes on es7 (T395982) (duration: 13m 00s)
- 05:49 marostegui@deploy1003: marostegui: Continuing with sync
- 05:45 marostegui@deploy1003: marostegui: Backport for db-production.php: Disable writes on es7 (T395982) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 05:43 marostegui@deploy1003: Started scap sync-world: Backport for db-production.php: Disable writes on es7 (T395982)
- 05:42 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 7 hosts with reason: Primary switchover es7 T395982
- 00:38 eileen: civicrm upgraded from 8eb67a94 to 22171c0b
2025-06-03
- 22:42 andrew@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 22:39 andrew@cumin1002: START - Cookbook sre.dns.netbox
- 22:10 eileen: civicrm upgraded from 3b59e784 to 8eb67a94
- 22:09 sbassett@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
- 21:59 sbassett@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply
- 21:53 tzatziki: removing 4 files for legal compliance
- 21:48 sbassett@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
- 21:41 tzatziki: removing 2 files for legal compliance
- 21:38 sbassett@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply
- 21:38 sbassett@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply
- 21:28 sbassett@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply
- 21:21 mstyles@deploy1003: Finished scap sync-world: Backport for Revert^2 "OATHAuth: Mark checkuser and suppress as requiring 2FA" (T150898) (duration: 11m 31s)
- 21:18 bvibber@deploy1003: helmfile [codfw] DONE helmfile.d/services/chart-renderer: apply
- 21:14 mstyles@deploy1003: mstyles, sbassett: Continuing with sync
- 21:11 mstyles@deploy1003: mstyles, sbassett: Backport for Revert^2 "OATHAuth: Mark checkuser and suppress as requiring 2FA" (T150898) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 21:09 mstyles@deploy1003: Started scap sync-world: Backport for Revert^2 "OATHAuth: Mark checkuser and suppress as requiring 2FA" (T150898)
- 21:03 cjming@deploy1003: Finished scap sync-world: Backport for Use default preference if no client preference in auth request (T395957) (duration: 09m 49s)
- 20:56 cjming@deploy1003: matmarex, cjming: Continuing with sync
- 20:55 cjming@deploy1003: matmarex, cjming: Backport for Use default preference if no client preference in auth request (T395957) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 20:53 cjming@deploy1003: Started scap sync-world: Backport for Use default preference if no client preference in auth request (T395957)
- 20:46 marostegui@cumin1002: END (FAIL) - Cookbook sre.mysql.clone (exit_code=99) of es2040.codfw.wmnet onto es2048.codfw.wmnet
- 20:46 marostegui@cumin1002: END (FAIL) - Cookbook sre.mysql.pool (exit_code=99) es2048 gradually with 4 steps - Pool es2048.codfw.wmnet in after cloning
- 20:37 cscott@deploy1003: Finished scap sync-world: Backport for Use ::getContentId() and ::clearContentId() from the Parsoid extension API (duration: 12m 41s)
- 20:31 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99) reloading wikidata_main on wdqs1022.eqiad.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/rdf_subgraphs/snapshot=20250526/wiki=wikidata/scope=wikidata_main/ using stat1009.eqiad.wmnet)
- 20:30 cscott@deploy1003: cscott: Continuing with sync
- 20:27 cscott@deploy1003: cscott: Backport for Use ::getContentId() and ::clearContentId() from the Parsoid extension API synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 20:25 cscott@deploy1003: Started scap sync-world: Backport for Use ::getContentId() and ::clearContentId() from the Parsoid extension API
- 20:18 cjming@deploy1003: Finished scap sync-world: Backport for Deploy survey to en at twenty percent (T389393) (duration: 11m 18s)
- 20:11 cjming@deploy1003: ksarabia, cjming: Continuing with sync
- 20:09 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99) reloading scholarly_articles on wdqs1023.eqiad.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/rdf_subgraphs/snapshot=20250526/wiki=wikidata/scope=scholarly_articles/ using stat1011.eqiad.wmnet)
- 20:08 cjming@deploy1003: ksarabia, cjming: Backport for Deploy survey to en at twenty percent (T389393) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 20:06 cjming@deploy1003: Started scap sync-world: Backport for Deploy survey to en at twenty percent (T389393)
- 19:37 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99) reloading wikidata_full on wdqs1020.eqiad.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/rdf/date=20250526/wiki=wikidata/ using stat1008.eqiad.wmnet)
- 19:37 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-reload reloading wikidata_full on wdqs1020.eqiad.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/rdf/date=20250526/wiki=wikidata/ using stat1008.eqiad.wmnet)
- 19:36 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99) reloading wikidata_full on wdqs1020.eqiad.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/rdf/date=20250526/wiki=wikidata/ using stat1010.eqiad.wmnet)
- 19:36 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-reload reloading wikidata_full on wdqs1020.eqiad.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/rdf/date=20250526/wiki=wikidata/ using stat1010.eqiad.wmnet)
- 19:35 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99) reloading wikidata on wdqs1020.eqiad.wmnet from DumpsSource.NFS (munging data to /srv/wdqs/munged, /srv/wdqs/lex-munged)
- 19:35 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-reload reloading wikidata on wdqs1020.eqiad.wmnet from DumpsSource.NFS (munging data to /srv/wdqs/munged, /srv/wdqs/lex-munged)
- 19:22 dduvall@deploy1003: rebuilt and synchronized wikiversions files: group0 to 1.45.0-wmf.4 refs T392174
- 19:11 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-reload reloading scholarly_articles on wdqs1023.eqiad.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/rdf_subgraphs/snapshot=20250526/wiki=wikidata/scope=scholarly_articles/ using stat1011.eqiad.wmnet)
- 19:08 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99) reloading scholarly_articles on wdqs1023.eqiad.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/rdf_subgraphs/snapshot=20250526/wiki=wikidata/scope=scholarly_articles/ using stat1009.eqiad.wmnet)
- 19:07 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-reload reloading scholarly_articles on wdqs1023.eqiad.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/rdf_subgraphs/snapshot=20250526/wiki=wikidata/scope=scholarly_articles/ using stat1009.eqiad.wmnet)
- 19:05 ebernhardson@deploy1003: Finished deploy [wdqs/wdqs@fea7794]: 0.3.157 (duration: 17m 57s)
- 19:00 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-reload reloading wikidata_main on wdqs1022.eqiad.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/rdf_subgraphs/snapshot=20250526/wiki=wikidata/scope=wikidata_main/ using stat1009.eqiad.wmnet)
- 18:47 ebernhardson@deploy1003: Started deploy [wdqs/wdqs@fea7794]: 0.3.157
- 18:08 aokoth@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host vrts1003.eqiad.wmnet
- 18:00 aokoth@cumin1002: START - Cookbook sre.hosts.reboot-single for host vrts1003.eqiad.wmnet
- 17:50 gmodena@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-content-history-reconcile-enrich-next: apply
- 17:50 gmodena@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-content-history-reconcile-enrich-next: apply
- 17:45 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
- 17:45 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
- 17:42 isaranto@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
- 17:41 isaranto@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
- 17:39 bvibber@deploy1003: helmfile [codfw] START helmfile.d/services/chart-renderer: apply
- 17:39 bvibber@deploy1003: helmfile [eqiad] DONE helmfile.d/services/chart-renderer: apply
- 17:39 isaranto@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
- 17:39 bvibber@deploy1003: helmfile [eqiad] START helmfile.d/services/chart-renderer: apply
- 17:38 bvibber@deploy1003: helmfile [staging] DONE helmfile.d/services/chart-renderer: apply
- 17:38 bvibber@deploy1003: helmfile [staging] START helmfile.d/services/chart-renderer: apply
- 17:36 swfrench@deploy1003: Finished scap sync-world: Scap test run after revert - T389786 (duration: 02m 10s)
- 17:35 fnegri@cumin1002: conftool action : set/pooled=yes; selector: name=clouddb1019.eqiad.wmnet,service=s4
- 17:34 swfrench@deploy1003: Started scap sync-world: Scap test run after revert - T389786
- 17:19 swfrench@deploy1003: Started scap sync-world: Scap run to test newly enabled dse-k8s-eqiad deployment - T388761 T389786
- 17:17 bvibber@deploy1003: helmfile [codfw] DONE helmfile.d/services/chart-renderer: apply
- 17:16 bvibber@deploy1003: helmfile [codfw] START helmfile.d/services/chart-renderer: apply
- 17:16 bvibber@deploy1003: helmfile [eqiad] DONE helmfile.d/services/chart-renderer: apply
- 17:15 cdanis@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
- 17:15 cdanis@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'.
- 17:14 bvibber@deploy1003: helmfile [eqiad] START helmfile.d/services/chart-renderer: apply
- 17:14 cdanis@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'.
- 17:13 bvibber@deploy1003: helmfile [staging] DONE helmfile.d/services/chart-renderer: apply
- 17:13 bvibber@deploy1003: helmfile [staging] START helmfile.d/services/chart-renderer: apply
- 17:12 cdanis@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'.
- 17:06 cdanis@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
- 17:06 cdanis@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'.
- 17:01 cdanis@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'.
- 16:59 cdanis@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'.
- 16:57 fnegri@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on clouddb1019.eqiad.wmnet with reason: Debugging stuck queries T390767
- 16:42 bvibber@deploy1003: Finished scap sync-world: Backport for Fixes: Charts embedded in template rendering in Parsoid (T395462), Fixes: Charts embedded in template rendering in Parsoid (T395462) (duration: 09m 54s)
- 16:38 andrew@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcontrol2010-dev.codfw.wmnet with OS bullseye
- 16:35 bvibber@deploy1003: bvibber: Continuing with sync
- 16:35 bvibber@deploy1003: bvibber: Backport for Fixes: Charts embedded in template rendering in Parsoid (T395462), Fixes: Charts embedded in template rendering in Parsoid (T395462) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 16:34 sukhe@dns1004: END - running authdns-update
- 16:33 sukhe: testing dummy authdns-update to ensure clean run after gc-authdns-git-repo.timer run
- 16:33 sukhe@dns1004: START - running authdns-update
- 16:32 bvibber@deploy1003: Started scap sync-world: Backport for Fixes: Charts embedded in template rendering in Parsoid (T395462), Fixes: Charts embedded in template rendering in Parsoid (T395462)
- 16:23 jiji@deploy1003: Finished scap sync-world: T276994: We merged a number of noop patches, sparing deployers the scary diffs (duration: 02m 58s)
- 16:22 cgoubert@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
- 16:21 cgoubert@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
- 16:20 jiji@deploy1003: Started scap sync-world: T276994: We merged a number of noop patches, sparing deployers the scary diffs
- 16:12 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
- 16:12 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
- 16:06 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'.
- 16:04 jiji@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'.
- 16:04 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
- 15:57 jiji@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'.
- 15:57 jiji@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
- 15:55 jiji@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
- 15:54 jiji@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
- 15:51 jiji@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
- 15:51 jiji@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
- 15:50 jiji@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
- 15:18 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcontrol2010-dev.codfw.wmnet with OS bullseye
- 15:18 moritzm: installing gcc-12 bugfix updates from Bookworm point releases (includes various run time libraries)
- 15:17 andrew@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcontrol2010-dev.codfw.wmnet with OS bullseye
- 15:16 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2198.codfw.wmnet with reason: Maintenance
- 15:15 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195 (T395241)', diff saved to https://phabricator.wikimedia.org/P76966 and previous config saved to /var/cache/conftool/dbconfig/20250603-151552-fceratto.json
- 15:10 jmm@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "add prometheus7002 - jmm@cumin1003"
- 15:10 jmm@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "add prometheus7002 - jmm@cumin1003"
- 15:10 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcontrol2010-dev.codfw.wmnet with OS bullseye
- 15:06 hashar: Restarted Gerrit due to issue with replication config | T395887
- 15:00 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195', diff saved to https://phabricator.wikimedia.org/P76964 and previous config saved to /var/cache/conftool/dbconfig/20250603-150045-fceratto.json
- 14:58 jmm@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "add prometheus7002 - jmm@cumin1003"
- 14:58 jmm@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "add prometheus7002 - jmm@cumin1003"
- 14:54 jmm@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "add prometheus7002 - jmm@cumin1003"
- 14:54 jmm@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "add prometheus7002 - jmm@cumin1003"
- 14:50 fnegri@cumin1002: conftool action : set/pooled=no; selector: name=clouddb1019.eqiad.wmnet,service=s4
- 14:46 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host prometheus7002.magru.wmnet with OS bookworm
- 14:45 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195', diff saved to https://phabricator.wikimedia.org/P76963 and previous config saved to /var/cache/conftool/dbconfig/20250603-144538-fceratto.json
- 14:30 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195 (T395241)', diff saved to https://phabricator.wikimedia.org/P76962 and previous config saved to /var/cache/conftool/dbconfig/20250603-143031-fceratto.json
- 14:27 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on prometheus7002.magru.wmnet with reason: host reimage
- 14:26 jmm@cumin1003: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: elastic1063.eqiad.wmnet
- 14:23 jmm@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on prometheus7002.magru.wmnet with reason: host reimage
- 14:23 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2195 (T395241)', diff saved to https://phabricator.wikimedia.org/P76961 and previous config saved to /var/cache/conftool/dbconfig/20250603-142314-fceratto.json
- 14:23 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2195.codfw.wmnet with reason: Maintenance
- 14:22 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181 (T395241)', diff saved to https://phabricator.wikimedia.org/P76960 and previous config saved to /var/cache/conftool/dbconfig/20250603-142248-fceratto.json
- 14:19 jmm@cumin1003: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: elastic1103.eqiad.wmnet
- 14:07 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P76959 and previous config saved to /var/cache/conftool/dbconfig/20250603-140740-fceratto.json
- 14:01 Amir1: dropping term store tables from s8 (T351820)
- 14:01 Amir1: dropping term store tables from s8 (T351802)
- 13:57 jmm@cumin1003: START - Cookbook sre.hosts.reimage for host prometheus7002.magru.wmnet with OS bookworm
- 13:52 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P76957 and previous config saved to /var/cache/conftool/dbconfig/20250603-135233-fceratto.json
- 13:49 marostegui@cumin1002: dbctl commit (dc=all): 'es1039 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P76956 and previous config saved to /var/cache/conftool/dbconfig/20250603-134935-root.json
- 13:44 phuedx: Disabled the SDS 2.4.11 Synthetic A/A Test in xLab
- 13:42 dbrant@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifeeds: apply
- 13:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1048.eqiad.wmnet with OS bullseye
- 13:38 dbrant@deploy1003: helmfile [codfw] START helmfile.d/services/wikifeeds: apply
- 13:37 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181 (T395241)', diff saved to https://phabricator.wikimedia.org/P76954 and previous config saved to /var/cache/conftool/dbconfig/20250603-133725-fceratto.json
- 13:34 dbrant@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifeeds: apply
- 13:34 marostegui@cumin1002: dbctl commit (dc=all): 'es1039 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P76953 and previous config saved to /var/cache/conftool/dbconfig/20250603-133429-root.json
- 13:33 dbrant@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifeeds: apply
- 13:32 bwojtowicz@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
- 13:32 bwojtowicz@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
- 13:31 bwojtowicz@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
- 13:28 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2181 (T395241)', diff saved to https://phabricator.wikimedia.org/P76952 and previous config saved to /var/cache/conftool/dbconfig/20250603-132802-fceratto.json
- 13:27 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2181.codfw.wmnet with reason: Maintenance
- 13:27 bwojtowicz@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
- 13:27 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167 (T395241)', diff saved to https://phabricator.wikimedia.org/P76951 and previous config saved to /var/cache/conftool/dbconfig/20250603-132735-fceratto.json
- 13:26 bwojtowicz@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
- 13:22 bwojtowicz@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
- 13:21 bwojtowicz@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
- 13:20 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2050.codfw.wmnet
- 13:20 bwojtowicz@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
- 13:20 marostegui@cumin1002: dbctl commit (dc=all): 'es2038 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P76950 and previous config saved to /var/cache/conftool/dbconfig/20250603-131959-root.json
- 13:19 marostegui@cumin1002: dbctl commit (dc=all): 'es1039 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P76949 and previous config saved to /var/cache/conftool/dbconfig/20250603-131923-root.json
- 13:18 bwojtowicz@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
- 13:17 bwojtowicz@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
- 13:16 moritzm: installing libavif security updates
- 13:16 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1048.eqiad.wmnet with reason: host reimage
- 13:15 bwojtowicz@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
- 13:14 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ganeti2050.codfw.wmnet
- 13:14 jgleeson: payments-wiki rolled back from def6c267 to 1a4ef678
- 13:12 bwojtowicz@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
- 13:12 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1048.eqiad.wmnet with reason: host reimage
- 13:12 dbrant@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifeeds: apply
- 13:12 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167', diff saved to https://phabricator.wikimedia.org/P76948 and previous config saved to /var/cache/conftool/dbconfig/20250603-131228-fceratto.json
- 13:11 dbrant@deploy1003: helmfile [staging] START helmfile.d/services/wikifeeds: apply
- 13:11 bwojtowicz@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
- 13:08 bwojtowicz@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
- 13:04 marostegui@cumin1002: dbctl commit (dc=all): 'es2038 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P76946 and previous config saved to /var/cache/conftool/dbconfig/20250603-130453-root.json
- 13:04 marostegui@cumin1002: dbctl commit (dc=all): 'es1039 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P76945 and previous config saved to /var/cache/conftool/dbconfig/20250603-130418-root.json
- 13:04 marostegui: Shutdown clouddb1016:x3 T390954
- 13:02 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb1016.eqiad.wmnet with reason: Setting up x3 T390954
- 12:58 moritzm: uploaded wmf-laptop 1.0.2 to apt.wikimedia.org
- 12:57 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167', diff saved to https://phabricator.wikimedia.org/P76943 and previous config saved to /var/cache/conftool/dbconfig/20250603-125721-fceratto.json
- 12:49 marostegui@cumin1002: dbctl commit (dc=all): 'es2038 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P76942 and previous config saved to /var/cache/conftool/dbconfig/20250603-124948-root.json
- 12:49 marostegui@cumin1002: dbctl commit (dc=all): 'es1039 (re)pooling @ 20%: Repooling', diff saved to https://phabricator.wikimedia.org/P76941 and previous config saved to /var/cache/conftool/dbconfig/20250603-124913-root.json
- 12:42 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167 (T395241)', diff saved to https://phabricator.wikimedia.org/P76940 and previous config saved to /var/cache/conftool/dbconfig/20250603-124214-fceratto.json
- 12:40 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2049.codfw.wmnet
- 12:35 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ganeti2049.codfw.wmnet
- 12:34 marostegui@cumin1002: dbctl commit (dc=all): 'es2038 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P76939 and previous config saved to /var/cache/conftool/dbconfig/20250603-123442-root.json
- 12:34 marostegui@cumin1002: dbctl commit (dc=all): 'es1039 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P76938 and previous config saved to /var/cache/conftool/dbconfig/20250603-123407-root.json
- 12:33 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2167 (T395241)', diff saved to https://phabricator.wikimedia.org/P76937 and previous config saved to /var/cache/conftool/dbconfig/20250603-123357-fceratto.json
- 12:33 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2167.codfw.wmnet with reason: Maintenance
- 12:33 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2166 (T395241)', diff saved to https://phabricator.wikimedia.org/P76936 and previous config saved to /var/cache/conftool/dbconfig/20250603-123331-fceratto.json
- 12:30 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1048.eqiad.wmnet with OS bullseye
- 12:25 marostegui@cumin1002: START - Cookbook sre.mysql.pool es2048 gradually with 4 steps - Pool es2048.codfw.wmnet in after cloning
- 12:19 marostegui@cumin1002: dbctl commit (dc=all): 'es2038 (re)pooling @ 20%: Repooling', diff saved to https://phabricator.wikimedia.org/P76935 and previous config saved to /var/cache/conftool/dbconfig/20250603-121937-root.json
- 12:19 marostegui@cumin1002: dbctl commit (dc=all): 'es1039 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P76934 and previous config saved to /var/cache/conftool/dbconfig/20250603-121902-root.json
- 12:18 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2166', diff saved to https://phabricator.wikimedia.org/P76933 and previous config saved to /var/cache/conftool/dbconfig/20250603-121824-fceratto.json
- 12:16 marostegui@deploy1003: Finished scap sync-world: Backport for Revert "db-production.php: Disable writes on es7" (duration: 09m 47s)
- 12:15 bwojtowicz@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revision-models' for release 'main' .
- 12:12 bwojtowicz@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revision-models' for release 'main' .
- 12:10 bwojtowicz@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'readability' for release 'main' .
- 12:09 marostegui@deploy1003: marostegui: Continuing with sync
- 12:09 marostegui@deploy1003: marostegui: Backport for Revert "db-production.php: Disable writes on es7" synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 12:08 bwojtowicz@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'readability' for release 'main' .
- 12:07 claime: Launching manual run of recount-categories cronjob - T395745
- 12:06 marostegui@deploy1003: Started scap sync-world: Backport for Revert "db-production.php: Disable writes on es7"
- 12:06 cgoubert@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
- 12:05 cgoubert@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
- 12:04 bwojtowicz@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'logo-detection' for release 'main' .
- 12:04 marostegui@cumin1002: dbctl commit (dc=all): 'es2038 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P76931 and previous config saved to /var/cache/conftool/dbconfig/20250603-120431-root.json
- 12:03 marostegui@cumin1002: dbctl commit (dc=all): 'es1039 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P76930 and previous config saved to /var/cache/conftool/dbconfig/20250603-120356-root.json
- 12:03 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2166', diff saved to https://phabricator.wikimedia.org/P76929 and previous config saved to /var/cache/conftool/dbconfig/20250603-120316-fceratto.json
- 12:02 bwojtowicz@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'logo-detection' for release 'main' .
- 12:00 bwojtowicz@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'llm' for release 'main' .
- 11:59 bwojtowicz@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'llm' for release 'main' .
- 11:56 bwojtowicz@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'edit-check' for release 'main' .
- 11:55 bwojtowicz@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'edit-check' for release 'main' .
- 11:54 bwojtowicz@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
- 11:52 bwojtowicz@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
- 11:50 marostegui@cumin1002: dbctl commit (dc=all): 'Depool es2038 T395785', diff saved to https://phabricator.wikimedia.org/P76927 and previous config saved to /var/cache/conftool/dbconfig/20250603-115026-marostegui.json
- 11:49 bwojtowicz@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'article-models' for release 'main' .
- 11:49 marostegui@cumin1002: dbctl commit (dc=all): 'Promote es2039 to es7 primary and set section read-write T395785', diff saved to https://phabricator.wikimedia.org/P76926 and previous config saved to /var/cache/conftool/dbconfig/20250603-114917-marostegui.json
- 11:48 marostegui: Starting es7 codfw failover from es2038 to es2039 - T395785
- 11:48 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2166 (T395241)', diff saved to https://phabricator.wikimedia.org/P76925 and previous config saved to /var/cache/conftool/dbconfig/20250603-114809-fceratto.json
- 11:46 marostegui@cumin1002: dbctl commit (dc=all): 'Set es2039 with weight 0 T395785', diff saved to https://phabricator.wikimedia.org/P76924 and previous config saved to /var/cache/conftool/dbconfig/20250603-114637-marostegui.json
- 11:46 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 7 hosts with reason: Primary switchover es7 T395785
- 11:46 bwojtowicz@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'article-models' for release 'main' .
- 11:41 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2048.codfw.wmnet
- 11:40 bwojtowicz@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'article-descriptions' for release 'main' .
- 11:39 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2166 (T395241)', diff saved to https://phabricator.wikimedia.org/P76923 and previous config saved to /var/cache/conftool/dbconfig/20250603-113952-fceratto.json
- 11:39 marostegui@cumin1002: dbctl commit (dc=all): 'Depool es1039 T395647', diff saved to https://phabricator.wikimedia.org/P76922 and previous config saved to /var/cache/conftool/dbconfig/20250603-113946-marostegui.json
- 11:39 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2166.codfw.wmnet with reason: Maintenance
- 11:39 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2165 (T395241)', diff saved to https://phabricator.wikimedia.org/P76921 and previous config saved to /var/cache/conftool/dbconfig/20250603-113924-fceratto.json
- 11:39 marostegui@deploy1003: Finished scap sync-world: Backport for db-production.php: Disable writes on es7 (T395647) (duration: 09m 56s)
- 11:36 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ganeti2048.codfw.wmnet
- 11:33 bwojtowicz@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'article-descriptions' for release 'main' .
- 11:32 marostegui@deploy1003: marostegui: Continuing with sync
- 11:31 marostegui@deploy1003: marostegui: Backport for db-production.php: Disable writes on es7 (T395647) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 11:29 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 7 hosts with reason: Primary switchover es7 T395785
- 11:29 marostegui@deploy1003: Started scap sync-world: Backport for db-production.php: Disable writes on es7 (T395647)
- 11:25 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) es2040 gradually with 4 steps - Pool es2040.codfw.wmnet in after cloning
- 11:22 taavi@cumin1002: conftool action : set/weight=100:pooled=yes; selector: name=clouddb1016.eqiad.wmnet,service=x3
- 11:21 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 7 hosts with reason: Primary switchover es7 T395785
- 11:18 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2165', diff saved to https://phabricator.wikimedia.org/P76919 and previous config saved to /var/cache/conftool/dbconfig/20250603-111822-fceratto.json
- 11:09 fnegri@cumin1002: END (FAIL) - Cookbook sre.wikireplicas.update-views (exit_code=99)
- 11:03 jgleeson: payments-wiki upgraded from 1a4ef678 to def6c267
- 11:03 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2165', diff saved to https://phabricator.wikimedia.org/P76917 and previous config saved to /var/cache/conftool/dbconfig/20250603-110315-fceratto.json
- 10:57 phuedx: Enabling the SDS 2.4.11 Synthetic A/A Test in xLab
- 10:57 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2047.codfw.wmnet
- 10:52 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ganeti2047.codfw.wmnet
- 10:49 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2046.codfw.wmnet
- 10:48 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2165 (T395241)', diff saved to https://phabricator.wikimedia.org/P76915 and previous config saved to /var/cache/conftool/dbconfig/20250603-104809-fceratto.json
- 10:44 fnegri@cumin1002: START - Cookbook sre.wikireplicas.update-views
- 10:44 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ganeti2046.codfw.wmnet
- 10:43 fnegri@cumin1002: END (FAIL) - Cookbook sre.wikireplicas.update-views (exit_code=99)
- 10:43 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2045.codfw.wmnet
- 10:40 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2165 (T395241)', diff saved to https://phabricator.wikimedia.org/P76914 and previous config saved to /var/cache/conftool/dbconfig/20250603-104056-fceratto.json
- 10:40 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2165.codfw.wmnet with reason: Maintenance
- 10:40 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2164 (T395241)', diff saved to https://phabricator.wikimedia.org/P76913 and previous config saved to /var/cache/conftool/dbconfig/20250603-104030-fceratto.json
- 10:40 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ganeti2045.codfw.wmnet
- 10:38 cmooney@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on 11 hosts with reason: silence alerts due to down BGP groups on cr2-codfw while PIC is reconfigured
- 10:25 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2164', diff saved to https://phabricator.wikimedia.org/P76912 and previous config saved to /var/cache/conftool/dbconfig/20250603-102523-fceratto.json
- 10:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Fix the repool dbctl commit', diff saved to https://phabricator.wikimedia.org/P76911 and previous config saved to /var/cache/conftool/dbconfig/20250603-102517-ladsgroup.json
- 10:18 fnegri@cumin1002: START - Cookbook sre.wikireplicas.update-views
- 10:13 topranks: drain cr2-codfw traffic to enable PIC port bw rebalance on slot 0 T387504
- 09:59 jmm@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host prometheus7002.magru.wmnet with OS bookworm
- 09:43 jelto@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host gitlab-runner1002.eqiad.wmnet with OS bookworm
- 09:25 jelto@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on gitlab-runner1002.eqiad.wmnet with reason: host reimage
- 09:22 jelto@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on gitlab-runner1002.eqiad.wmnet with reason: host reimage
- 09:22 elukey: puppet cert destroy {mobileapps,proton,recommendation-api}.discovery.wmnet on puppetmaster1001 - old certs not used anymore
- 09:18 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 09:18 marostegui@cumin1002: START - Cookbook sre.mysql.pool es2040 gradually with 4 steps - Pool es2040.codfw.wmnet in after cloning
- 09:18 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 09:16 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 09:16 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 09:15 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2164', diff saved to https://phabricator.wikimedia.org/P76909 and previous config saved to /var/cache/conftool/dbconfig/20250603-091521-fceratto.json
- 09:10 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 09:10 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 09:06 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host thanos-be2008.codfw.wmnet with OS bullseye
- 09:06 mvernon@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - mvernon@cumin1002"
- 09:00 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2164 (T395241)', diff saved to https://phabricator.wikimedia.org/P76907 and previous config saved to /var/cache/conftool/dbconfig/20250603-090013-fceratto.json
- 08:59 mvernon@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - mvernon@cumin1002"
- 08:51 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2164 (T395241)', diff saved to https://phabricator.wikimedia.org/P76906 and previous config saved to /var/cache/conftool/dbconfig/20250603-085148-fceratto.json
- 08:51 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2164.codfw.wmnet with reason: Maintenance
- 08:51 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2163 (T395241)', diff saved to https://phabricator.wikimedia.org/P76905 and previous config saved to /var/cache/conftool/dbconfig/20250603-085121-fceratto.json
- 08:45 bwojtowicz@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
- 08:43 bwojtowicz@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
- 08:42 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-be2008.codfw.wmnet with reason: host reimage
- 08:41 bwojtowicz@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
- 08:40 bwojtowicz@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
- 08:38 mvernon@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-be2008.codfw.wmnet with reason: host reimage
- 08:37 bwojtowicz@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
- 08:36 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2163', diff saved to https://phabricator.wikimedia.org/P76904 and previous config saved to /var/cache/conftool/dbconfig/20250603-083614-fceratto.json
- 08:36 bwojtowicz@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
- 08:34 jmm@cumin1003: START - Cookbook sre.hosts.reimage for host prometheus7002.magru.wmnet with OS bookworm
- 08:30 jelto@cumin1002: START - Cookbook sre.hosts.reimage for host gitlab-runner1002.eqiad.wmnet with OS bookworm
- 08:22 moritzm: rearm keyholder on cumin1003 following reboot
- 08:21 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2163', diff saved to https://phabricator.wikimedia.org/P76903 and previous config saved to /var/cache/conftool/dbconfig/20250603-082107-fceratto.json
- 08:16 mvernon@cumin1002: START - Cookbook sre.hosts.reimage for host thanos-be2008.codfw.wmnet with OS bullseye
- 08:14 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cumin1003.eqiad.wmnet
- 08:10 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host cumin1003.eqiad.wmnet
- 08:10 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host thanos-be2009.codfw.wmnet with OS bullseye
- 08:10 mvernon@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - mvernon@cumin1002"
- 08:06 mvernon@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - mvernon@cumin1002"
- 08:06 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2163 (T395241)', diff saved to https://phabricator.wikimedia.org/P76901 and previous config saved to /var/cache/conftool/dbconfig/20250603-080600-fceratto.json
- 08:04 phuedx: Disabling the SDS 2.4.11 Synthetic A/A Test in xLab
- 07:57 phuedx: Enabling the SDS 2.4.11 Synthetic A/A Test in xLab
- 07:56 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2163 (T395241)', diff saved to https://phabricator.wikimedia.org/P76900 and previous config saved to /var/cache/conftool/dbconfig/20250603-075638-fceratto.json
- 07:56 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2163.codfw.wmnet with reason: Maintenance
- 07:56 jmm@cumin1003: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host prometheus7002.magru.wmnet
- 07:56 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2154 (T395241)', diff saved to https://phabricator.wikimedia.org/P76899 and previous config saved to /var/cache/conftool/dbconfig/20250603-075622-fceratto.json
- 07:56 jmm@cumin1003: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
- 07:49 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-be2009.codfw.wmnet with reason: host reimage
- 07:49 jmm@cumin1003: END (PASS) - Cookbook sre.misc-clusters.restart-reboot-config-master (exit_code=0) rolling reboot on A:config-master
- 07:46 jmm@cumin1003: START - Cookbook sre.dns.netbox
- 07:46 jmm@cumin1003: START - Cookbook sre.ganeti.makevm for new host prometheus7002.magru.wmnet
- 07:46 mvernon@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-be2009.codfw.wmnet with reason: host reimage
- 07:44 jmm@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) config-master.discovery.wmnet. on all recursors
- 07:44 jmm@cumin1003: START - Cookbook sre.dns.wipe-cache config-master.discovery.wmnet. on all recursors
- 07:43 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host bast7002.wikimedia.org
- 07:43 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host bast7002.wikimedia.org with OS bookworm
- 07:41 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2154', diff saved to https://phabricator.wikimedia.org/P76898 and previous config saved to /var/cache/conftool/dbconfig/20250603-074113-fceratto.json
- 07:38 jmm@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) config-master.discovery.wmnet. on all recursors
- 07:38 jmm@cumin1003: START - Cookbook sre.dns.wipe-cache config-master.discovery.wmnet. on all recursors
- 07:37 jmm@cumin1003: START - Cookbook sre.misc-clusters.restart-reboot-config-master rolling reboot on A:config-master
- 07:27 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on bast7002.wikimedia.org with reason: host reimage
- 07:26 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2154', diff saved to https://phabricator.wikimedia.org/P76897 and previous config saved to /var/cache/conftool/dbconfig/20250603-072604-fceratto.json
- 07:25 tchanders@deploy1003: Finished scap sync-world: Backport for Assign IP auto-reveal rights to certain groups (T386492) (duration: 10m 39s)
- 07:23 mvernon@cumin1002: START - Cookbook sre.hosts.reimage for host thanos-be2009.codfw.wmnet with OS bullseye
- 07:23 jmm@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on bast7002.wikimedia.org with reason: host reimage
- 07:18 tchanders@deploy1003: tchanders: Continuing with sync
- 07:16 tchanders@deploy1003: tchanders: Backport for Assign IP auto-reveal rights to certain groups (T386492) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 07:14 tchanders@deploy1003: Started scap sync-world: Backport for Assign IP auto-reveal rights to certain groups (T386492)
- 07:10 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2154 (T395241)', diff saved to https://phabricator.wikimedia.org/P76896 and previous config saved to /var/cache/conftool/dbconfig/20250603-071057-fceratto.json
- 07:06 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 07:06 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 07:02 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 07:02 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 07:01 marostegui@cumin1002: dbctl commit (dc=all): 'es1037 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P76895 and previous config saved to /var/cache/conftool/dbconfig/20250603-070155-root.json
- 07:00 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2154 (T395241)', diff saved to https://phabricator.wikimedia.org/P76894 and previous config saved to /var/cache/conftool/dbconfig/20250603-070036-fceratto.json
- 07:00 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2154.codfw.wmnet with reason: Maintenance
- 07:00 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2152 (T395241)', diff saved to https://phabricator.wikimedia.org/P76893 and previous config saved to /var/cache/conftool/dbconfig/20250603-070021-fceratto.json
- 06:58 jmm@cumin1003: START - Cookbook sre.hosts.reimage for host bast7002.wikimedia.org with OS bookworm
- 06:54 jmm@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM bast7002.wikimedia.org - jmm@cumin1003"
- 06:54 jmm@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM bast7002.wikimedia.org - jmm@cumin1003"
- 06:54 jmm@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) bast7002.wikimedia.org on all recursors
- 06:54 jmm@cumin1003: START - Cookbook sre.dns.wipe-cache bast7002.wikimedia.org on all recursors
- 06:54 jmm@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 06:54 jmm@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM bast7002.wikimedia.org - jmm@cumin1003"
- 06:53 jmm@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM bast7002.wikimedia.org - jmm@cumin1003"
- 06:48 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host thanos-be2009.codfw.wmnet with OS bullseye
- 06:46 marostegui@cumin1002: dbctl commit (dc=all): 'es1037 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P76892 and previous config saved to /var/cache/conftool/dbconfig/20250603-064649-root.json
- 06:45 jmm@cumin1003: START - Cookbook sre.dns.netbox
- 06:45 jmm@cumin1003: START - Cookbook sre.ganeti.makevm for new host bast7002.wikimedia.org
- 06:45 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2152', diff saved to https://phabricator.wikimedia.org/P76891 and previous config saved to /var/cache/conftool/dbconfig/20250603-064513-fceratto.json
- 06:41 marostegui@cumin1002: dbctl commit (dc=all): 'es2037 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P76890 and previous config saved to /var/cache/conftool/dbconfig/20250603-064147-root.json
- 06:37 marostegui: Decrease buffer size on clouddb1016:s8 T390954
- 06:31 marostegui@cumin1002: dbctl commit (dc=all): 'es1037 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P76889 and previous config saved to /var/cache/conftool/dbconfig/20250603-063144-root.json
- 06:30 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1154.eqiad.wmnet with reason: Setting up x3 T390954
- 06:30 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2152', diff saved to https://phabricator.wikimedia.org/P76887 and previous config saved to /var/cache/conftool/dbconfig/20250603-063004-fceratto.json
- 06:29 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb1016.eqiad.wmnet with reason: Setting up x3 T390954
- 06:26 marostegui@cumin1002: dbctl commit (dc=all): 'es2037 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P76886 and previous config saved to /var/cache/conftool/dbconfig/20250603-062641-root.json
- 06:16 marostegui@cumin1002: dbctl commit (dc=all): 'es1037 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P76885 and previous config saved to /var/cache/conftool/dbconfig/20250603-061638-root.json
- 06:14 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2152 (T395241)', diff saved to https://phabricator.wikimedia.org/P76884 and previous config saved to /var/cache/conftool/dbconfig/20250603-061457-fceratto.json
- 06:11 marostegui@cumin1002: dbctl commit (dc=all): 'es2037 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P76883 and previous config saved to /var/cache/conftool/dbconfig/20250603-061134-root.json
- 06:07 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2152 (T395241)', diff saved to https://phabricator.wikimedia.org/P76882 and previous config saved to /var/cache/conftool/dbconfig/20250603-060719-fceratto.json
- 06:07 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2152.codfw.wmnet with reason: Maintenance
- 06:01 marostegui@cumin1002: dbctl commit (dc=all): 'es1037 (re)pooling @ 20%: Repooling', diff saved to https://phabricator.wikimedia.org/P76881 and previous config saved to /var/cache/conftool/dbconfig/20250603-060132-root.json
- 05:56 marostegui@cumin1002: dbctl commit (dc=all): 'es2037 (re)pooling @ 20%: Repooling', diff saved to https://phabricator.wikimedia.org/P76880 and previous config saved to /var/cache/conftool/dbconfig/20250603-055628-root.json
- 05:46 marostegui@cumin1002: dbctl commit (dc=all): 'es1037 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P76879 and previous config saved to /var/cache/conftool/dbconfig/20250603-054626-root.json
- 05:41 marostegui@cumin1002: dbctl commit (dc=all): 'es2037 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P76878 and previous config saved to /var/cache/conftool/dbconfig/20250603-054123-root.json
- 05:39 marostegui@deploy1003: Finished scap sync-world: Backport for Revert "db-production.php: Disable writes on es6" (duration: 09m 52s)
- 05:32 marostegui@deploy1003: marostegui: Continuing with sync
- 05:31 marostegui@cumin1002: dbctl commit (dc=all): 'Give some weight to es1038', diff saved to https://phabricator.wikimedia.org/P76877 and previous config saved to /var/cache/conftool/dbconfig/20250603-053151-marostegui.json
- 05:31 marostegui@deploy1003: marostegui: Backport for Revert "db-production.php: Disable writes on es6" synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 05:29 marostegui@deploy1003: Started scap sync-world: Backport for Revert "db-production.php: Disable writes on es6"
- 05:27 marostegui@cumin1002: dbctl commit (dc=all): 'Depool es1037 T395867', diff saved to https://phabricator.wikimedia.org/P76876 and previous config saved to /var/cache/conftool/dbconfig/20250603-052719-marostegui.json
- 05:27 marostegui@dns1006: END - running authdns-update
- 05:26 marostegui@dns1006: START - running authdns-update
- 05:26 marostegui@cumin1002: dbctl commit (dc=all): 'Promote es1038 to es6 primary T395867', diff saved to https://phabricator.wikimedia.org/P76875 and previous config saved to /var/cache/conftool/dbconfig/20250603-052614-marostegui.json
- 05:25 marostegui: Starting es6 eqiad failover from es1037 to es1038 - T395867
- 05:23 marostegui@cumin1002: dbctl commit (dc=all): 'Set es1038 with weight 0 T395867', diff saved to https://phabricator.wikimedia.org/P76874 and previous config saved to /var/cache/conftool/dbconfig/20250603-052353-marostegui.json
- 05:22 marostegui@deploy1003: Finished scap sync-world: Backport for db-production.php: Disable writes on es6 (T395867) (duration: 13m 39s)
- 05:15 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 7 hosts with reason: Primary switchover es6 T395867
- 05:14 marostegui@deploy1003: marostegui: Continuing with sync
- 05:13 marostegui@deploy1003: marostegui: Backport for db-production.php: Disable writes on es6 (T395867) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 05:09 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 7 hosts with reason: Primary switchover es6 T395867
- 05:09 marostegui@deploy1003: Started scap sync-world: Backport for db-production.php: Disable writes on es6 (T395867)
- 04:54 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on es2037.codfw.wmnet with reason: Primary switchover es6 T395420
- 04:52 marostegui@cumin1002: dbctl commit (dc=all): 'Depool es2037 T395420', diff saved to https://phabricator.wikimedia.org/P76873 and previous config saved to /var/cache/conftool/dbconfig/20250603-045251-marostegui.json
- 04:52 marostegui@cumin1002: dbctl commit (dc=all): 'Promote es2035 to es6 primary and set section read-write T395420', diff saved to https://phabricator.wikimedia.org/P76872 and previous config saved to /var/cache/conftool/dbconfig/20250603-045202-root.json
- 04:51 marostegui: Starting es6 codfw failover from es2037 to es2035 - T395420
- 04:49 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 7 hosts with reason: Primary switchover es6 T395420
- 04:48 marostegui@cumin1002: dbctl commit (dc=all): 'Set es2035 with weight 0 T395420', diff saved to https://phabricator.wikimedia.org/P76871 and previous config saved to /var/cache/conftool/dbconfig/20250603-044855-root.json
- 04:45 marostegui@cumin1002: dbctl commit (dc=all): 'Add es2048 to dbctl depooled T395771+', diff saved to https://phabricator.wikimedia.org/P76870 and previous config saved to /var/cache/conftool/dbconfig/20250603-044550-marostegui.json
- 04:40 marostegui@cumin1002: START - Cookbook sre.mysql.clone of es2040.codfw.wmnet onto es2048.codfw.wmnet
- 04:31 marostegui@cumin1002: dbctl commit (dc=all): 'Depool es2040 T395771', diff saved to https://phabricator.wikimedia.org/P76869 and previous config saved to /var/cache/conftool/dbconfig/20250603-043151-marostegui.json
- 04:29 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es2040.codfw.wmnet with reason: Maintenance
- 04:01 mwpresync@deploy1003: Pruned MediaWiki: 1.45.0-wmf.1 (duration: 01m 39s)
- 03:48 mwpresync@deploy1003: Finished scap sync-world: testwikis to 1.45.0-wmf.4 refs T392174 (duration: 45m 55s)
- 03:03 mwpresync@deploy1003: Started scap sync-world: testwikis to 1.45.0-wmf.4 refs T392174
- 00:47 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1048.eqiad.wmnet with reason: host reimage
- 00:45 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1048.eqiad.wmnet with reason: host reimage
2025-06-02
- 23:53 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-be2008.codfw.wmnet with reason: host reimage
- 23:50 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-be2008.codfw.wmnet with reason: host reimage
- 23:30 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host thanos-be2008.codfw.wmnet with OS bullseye
- 23:25 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1048.eqiad.wmnet with OS bullseye
- 23:24 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1051.eqiad.wmnet with OS bullseye
- 23:24 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 23:24 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 23:22 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1050.eqiad.wmnet with OS bullseye
- 23:22 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 23:22 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 23:10 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1049.eqiad.wmnet with OS bullseye
- 23:10 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 23:09 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 22:54 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1051.eqiad.wmnet with reason: host reimage
- 22:50 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1051.eqiad.wmnet with reason: host reimage
- 22:47 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1050.eqiad.wmnet with reason: host reimage
- 22:45 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1050.eqiad.wmnet with reason: host reimage
- 22:38 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1049.eqiad.wmnet with reason: host reimage
- 22:35 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1049.eqiad.wmnet with reason: host reimage
- 22:29 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1051.eqiad.wmnet with OS bullseye
- 22:16 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1048.eqiad.wmnet with OS bullseye
- 22:08 maryum: scap sync-world finished, deploying fixes for several security bugs and PrivateSettings.php changes
- 22:01 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1050.eqiad.wmnet with OS bullseye
- 21:53 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1049.eqiad.wmnet with OS bullseye
- 21:38 tgr@deploy1003: Unlocked for deployment [MediaWiki]: T395758 (duration: 22m 32s)
- 21:38 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1048.eqiad.wmnet with reason: host reimage
- 21:34 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1048.eqiad.wmnet with reason: host reimage
- 21:22 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Banning hosts: cirrussearch205*,cirrussearch2060* for T395855 - bking@cumin2002
- 21:22 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Banning hosts: cirrussearch205*,cirrussearch2060* for T395855 - bking@cumin2002
- 21:16 bking@cumin2002: conftool action : set/pooled=no; selector: name=cirrussearch2055.codfw.wmnet|cirrussearch2056.codfw.wmnet|cirrussearch2057.codfw.wmnet|cirrussearch2058.codfw.wmnet|cirrussearch2059.codfw.wmnet|cirrussearch2060.codfw.wmnet|cirrussearch2091.codfw.wmnet
- 21:16 tgr@deploy1003: Locking from deployment [MediaWiki]: T395758
- 21:13 bking@cumin2002: conftool action : set/pooled=yes:weight=10; selector: name=cirrussearch.*.codfw.wmnet
- 21:11 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1048.eqiad.wmnet with OS bullseye
- 21:06 cjming@deploy1003: Finished scap sync-world: Backport for Simple summaries survey for English (T389393) (duration: 11m 41s)
- 21:05 andrew@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1049.eqiad.wmnet with OS bullseye
- 21:04 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1049.eqiad.wmnet with OS bullseye
- 20:59 cjming@deploy1003: cjming, ksarabia: Continuing with sync
- 20:59 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1048.eqiad.wmnet with OS bullseye
- 20:59 andrew@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - andrew@cumin1002"
- 20:56 cjming@deploy1003: cjming, ksarabia: Backport for Simple summaries survey for English (T389393) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 20:56 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 20:55 cjming@deploy1003: Started scap sync-world: Backport for Simple summaries survey for English (T389393)
- 20:51 jsn@deploy1003: Finished scap sync-world: Backport for Undeploy first set of Patroller Tools surveys (T389401) (duration: 12m 55s)
- 20:45 jsn@deploy1003: jsn: Continuing with sync
- 20:41 jsn@deploy1003: jsn: Backport for Undeploy first set of Patroller Tools surveys (T389401) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 20:38 jsn@deploy1003: Started scap sync-world: Backport for Undeploy first set of Patroller Tools surveys (T389401)
- 20:36 arlolra@deploy1003: Finished scap sync-world: Backport for Remove wgParserEnableLegacyHeadingDOM option (T371756) (duration: 10m 37s)
- 20:29 arlolra@deploy1003: arlolra: Continuing with sync
- 20:27 kamila@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1028
- 20:27 arlolra@deploy1003: arlolra: Backport for Remove wgParserEnableLegacyHeadingDOM option (T371756) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 20:27 jhathaway@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on mx-in2001.wikimedia.org with reason: T395240
- 20:26 jhathaway@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on mx-in1001.wikimedia.org with reason: T395240
- 20:26 kamila@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1028
- 20:25 arlolra@deploy1003: Started scap sync-world: Backport for Remove wgParserEnableLegacyHeadingDOM option (T371756)
- 20:23 cjming@deploy1003: Finished scap sync-world: Backport for ext.xLab: Send limited copies of stream configs (T391988) (duration: 15m 51s)
- 20:22 jhathaway@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on mx-out1001.wikimedia.org with reason: T395240
- 20:18 andrew@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - andrew@cumin1002"
- 20:16 cjming@deploy1003: cjming, phuedx: Continuing with sync
- 20:16 jhathaway@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on mx-out2001.wikimedia.org with reason: T395240
- 20:10 cjming@deploy1003: cjming, phuedx: Backport for ext.xLab: Send limited copies of stream configs (T391988) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 20:07 cjming@deploy1003: Started scap sync-world: Backport for ext.xLab: Send limited copies of stream configs (T391988)
- 19:40 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 19:40 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 19:36 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1048.eqiad.wmnet with reason: host reimage
- 19:32 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1048.eqiad.wmnet with reason: host reimage
- 19:20 sukhe@cumin1002: END (PASS) - Cookbook sre.loadbalancer.upgrade (exit_code=0) restarting P{lvs3008.esams.wmnet} and A:liberica
- 19:20 sukhe@cumin1002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) pooling P{lvs3008.esams.wmnet} and A:liberica
- 19:20 sukhe@cumin1002: START - Cookbook sre.loadbalancer.admin pooling P{lvs3008.esams.wmnet} and A:liberica
- 19:20 sukhe@cumin1002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) depooling P{lvs3008.esams.wmnet} and A:liberica
- 19:20 sukhe@cumin1002: START - Cookbook sre.loadbalancer.admin depooling P{lvs3008.esams.wmnet} and A:liberica
- 19:19 sukhe@cumin1002: START - Cookbook sre.loadbalancer.upgrade restarting P{lvs3008.esams.wmnet} and A:liberica
- 19:15 sukhe@cumin1002: END (PASS) - Cookbook sre.loadbalancer.upgrade (exit_code=0) restarting P{lvs3009.esams.wmnet} and A:liberica
- 19:15 sukhe@cumin1002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) pooling P{lvs3009.esams.wmnet} and A:liberica
- 19:14 sukhe@cumin1002: START - Cookbook sre.loadbalancer.admin pooling P{lvs3009.esams.wmnet} and A:liberica
- 19:14 sukhe@cumin1002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) depooling P{lvs3009.esams.wmnet} and A:liberica
- 19:14 sukhe@cumin1002: START - Cookbook sre.loadbalancer.admin depooling P{lvs3009.esams.wmnet} and A:liberica
- 19:14 sukhe@cumin1002: START - Cookbook sre.loadbalancer.upgrade restarting P{lvs3009.esams.wmnet} and A:liberica
- 19:08 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1048.eqiad.wmnet with OS bullseye
- 19:06 sukhe@cumin1002: END (PASS) - Cookbook sre.loadbalancer.upgrade (exit_code=0) restarting P{lvs3010.esams.wmnet} and A:liberica
- 19:06 sukhe@cumin1002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) pooling P{lvs3010.esams.wmnet} and A:liberica
- 19:05 sukhe@cumin1002: START - Cookbook sre.loadbalancer.admin pooling P{lvs3010.esams.wmnet} and A:liberica
- 19:05 sukhe@cumin1002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) depooling P{lvs3010.esams.wmnet} and A:liberica
- 19:05 sukhe@cumin1002: START - Cookbook sre.loadbalancer.admin depooling P{lvs3010.esams.wmnet} and A:liberica
- 19:05 sukhe@cumin1002: START - Cookbook sre.loadbalancer.upgrade restarting P{lvs3010.esams.wmnet} and A:liberica
- 18:46 jasmine@cumin1002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts wikikube-worker1028.eqiad.wmnet
- 18:46 jasmine@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 18:44 jasmine@cumin1002: START - Cookbook sre.dns.netbox
- 18:34 jasmine@cumin1002: START - Cookbook sre.hosts.decommission for hosts wikikube-worker1028.eqiad.wmnet
- 18:33 sukhe@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp1104.eqiad.wmnet,service=(cdn|ats-be)
- 18:32 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1254 (T395241)', diff saved to https://phabricator.wikimedia.org/P76866 and previous config saved to /var/cache/conftool/dbconfig/20250602-183230-fceratto.json
- 18:24 brett: include libvmod-wmfuniq 0.2.0~deb11u1 in bullseye-wikimedia
- 18:23 brett: include libvmod-wmfuniq 0.2.0~deb12u1 in bookworm-wikimedia
- 18:21 ebernhardson@deploy1003: Finished deploy [airflow-dags/search@443d0ab]: bump glent to 0.3.6 (duration: 00m 29s)
- 18:21 ebernhardson@deploy1003: Started deploy [airflow-dags/search@443d0ab]: bump glent to 0.3.6
- 18:17 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1254', diff saved to https://phabricator.wikimedia.org/P76865 and previous config saved to /var/cache/conftool/dbconfig/20250602-181722-fceratto.json
- 18:10 jasmine@cumin1002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts wikikube-worker[1026-1028].eqiad.wmnet
- 18:10 jasmine@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 18:10 jasmine@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: wikikube-worker[1026-1028].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jasmine@cumin1002"
- 18:07 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1048.eqiad.wmnet with OS bullseye
- 18:06 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1048.eqiad.wmnet with OS bullseye
- 18:05 jasmine@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: wikikube-worker[1026-1028].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jasmine@cumin1002"
- 18:02 jasmine@cumin1002: START - Cookbook sre.dns.netbox
- 18:02 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1254', diff saved to https://phabricator.wikimedia.org/P76864 and previous config saved to /var/cache/conftool/dbconfig/20250602-180216-fceratto.json
- 17:50 jasmine@cumin1002: START - Cookbook sre.hosts.decommission for hosts wikikube-worker[1026-1028].eqiad.wmnet
- 17:47 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1254 (T395241)', diff saved to https://phabricator.wikimedia.org/P76863 and previous config saved to /var/cache/conftool/dbconfig/20250602-174708-fceratto.json
- 17:38 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1254 (T395241)', diff saved to https://phabricator.wikimedia.org/P76862 and previous config saved to /var/cache/conftool/dbconfig/20250602-173850-fceratto.json
- 17:38 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1254.eqiad.wmnet with reason: Maintenance
- 17:33 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1239.eqiad.wmnet with reason: Maintenance
- 17:33 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1233 (T395241)', diff saved to https://phabricator.wikimedia.org/P76861 and previous config saved to /var/cache/conftool/dbconfig/20250602-173316-fceratto.json
- 17:22 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1048.eqiad.wmnet with OS bullseye
- 17:18 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to Unable to send diff to phaste and previous config saved to /var/cache/conftool/dbconfig/20250602-171804-fceratto.json
- 17:05 jasmine@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker[1026-1028].eqiad.wmnet
- 17:04 jasmine@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker[1026-1028].eqiad.wmnet
- 17:02 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to https://phabricator.wikimedia.org/P76860 and previous config saved to /var/cache/conftool/dbconfig/20250602-170256-fceratto.json
- 16:50 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcontrol2010-dev.codfw.wmnet with OS bookworm
- 16:47 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1233 (T395241)', diff saved to https://phabricator.wikimedia.org/P76859 and previous config saved to /var/cache/conftool/dbconfig/20250602-164748-fceratto.json
- 16:44 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcontrol2010-dev.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 16:43 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host cloudcontrol2010-dev.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 16:40 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1233 (T395241)', diff saved to https://phabricator.wikimedia.org/P76857 and previous config saved to /var/cache/conftool/dbconfig/20250602-164030-fceratto.json
- 16:40 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1233.eqiad.wmnet with reason: Maintenance
- 16:40 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcontrol2010-dev.codfw.wmnet with OS bookworm
- 16:40 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1229 (T395241)', diff saved to https://phabricator.wikimedia.org/P76856 and previous config saved to /var/cache/conftool/dbconfig/20250602-164003-fceratto.json
- 16:36 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 16:36 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 16:30 marostegui@cumin1002: dbctl commit (dc=all): 'db1211 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P76855 and previous config saved to /var/cache/conftool/dbconfig/20250602-162957-root.json
- 16:24 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P76854 and previous config saved to /var/cache/conftool/dbconfig/20250602-162455-fceratto.json
- 16:22 sukhe: sudo cumin -b1 -s60 'A:cp and not P{cp7001*}' "depool cdn && sleep 10 && run-puppet-agent --enable 'merging CR 1091330' && systemctl restart trafficserver.service && sleep 10 && pool cdn"
- 16:14 marostegui@cumin1002: dbctl commit (dc=all): 'db1211 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P76853 and previous config saved to /var/cache/conftool/dbconfig/20250602-161452-root.json
- 16:09 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P76852 and previous config saved to /var/cache/conftool/dbconfig/20250602-160948-fceratto.json
- 16:03 sukhe@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp7001.magru.wmnet [reason: [end] testing CR 1091330]
- 15:59 marostegui@cumin1002: dbctl commit (dc=all): 'db1211 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P76851 and previous config saved to /var/cache/conftool/dbconfig/20250602-155946-root.json
- 15:55 sukhe: enable puppet and run agent on cp7001
- 15:54 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1229 (T395241)', diff saved to https://phabricator.wikimedia.org/P76850 and previous config saved to /var/cache/conftool/dbconfig/20250602-155441-fceratto.json
- 15:52 sukhe@puppetserver1001: conftool action : set/pooled=no; selector: name=cp7001.magru.wmnet [reason: testing CR 1091330]
- 15:50 sukhe: disable puppet on A:cp to merge CR: 1091330
- 15:49 phuedx@deploy1003: Finished scap sync-world: Backport for Enable MetricsPlatform's experimentation feature (duration: 14m 23s)
- 15:47 cjming@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply
- 15:47 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1229 (T395241)', diff saved to https://phabricator.wikimedia.org/P76849 and previous config saved to /var/cache/conftool/dbconfig/20250602-154734-fceratto.json
- 15:47 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1229.eqiad.wmnet with reason: Maintenance
- 15:47 cjming@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply
- 15:47 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1222 (T395241)', diff saved to https://phabricator.wikimedia.org/P76848 and previous config saved to /var/cache/conftool/dbconfig/20250602-154709-fceratto.json
- 15:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-be2009.codfw.wmnet with reason: host reimage
- 15:44 marostegui@cumin1002: dbctl commit (dc=all): 'db1211 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P76847 and previous config saved to /var/cache/conftool/dbconfig/20250602-154440-root.json
- 15:42 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-be2009.codfw.wmnet with reason: host reimage
- 15:42 phuedx@deploy1003: phuedx: Continuing with sync
- 15:38 phuedx@deploy1003: phuedx: Backport for Enable MetricsPlatform's experimentation feature synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 15:35 phuedx@deploy1003: Started scap sync-world: Backport for Enable MetricsPlatform's experimentation feature
- 15:32 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1222', diff saved to https://phabricator.wikimedia.org/P76846 and previous config saved to /var/cache/conftool/dbconfig/20250602-153201-fceratto.json
- 15:29 marostegui@cumin1002: dbctl commit (dc=all): 'db1211 (re)pooling @ 20%: Repooling', diff saved to https://phabricator.wikimedia.org/P76845 and previous config saved to /var/cache/conftool/dbconfig/20250602-152935-root.json
- 15:27 joal@deploy1003: Finished deploy [airflow-dags/analytics@03db055]: Regular analytics weekly train (with pull...) [airflow-dags/analytics_test@03db0552] (duration: 00m 42s)
- 15:26 joal@deploy1003: Started deploy [airflow-dags/analytics@03db055]: Regular analytics weekly train (with pull...) [airflow-dags/analytics_test@03db0552]
- 15:21 thcipriani: jouncebot nowandnext
- 15:19 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcontrol2010-dev.codfw.wmnet with OS bookworm
- 15:16 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1222', diff saved to https://phabricator.wikimedia.org/P76844 and previous config saved to /var/cache/conftool/dbconfig/20250602-151654-fceratto.json
- 15:15 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host thanos-be2009.codfw.wmnet with OS bullseye
- 15:14 marostegui@cumin1002: dbctl commit (dc=all): 'db1211 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P76843 and previous config saved to /var/cache/conftool/dbconfig/20250602-151429-root.json
- 15:03 joal@deploy1003: Finished deploy [airflow-dags/analytics_test@4ebb376]: Regular analytics weekly train [airflow-dags/analytics_test@4ebb376f] (duration: 00m 05s)
- 15:03 joal@deploy1003: Started deploy [airflow-dags/analytics_test@4ebb376]: Regular analytics weekly train [airflow-dags/analytics_test@4ebb376f]
- 15:03 joal@deploy1003: Finished deploy [airflow-dags/analytics@afad011]: Regular analytics weekly train [airflow-dags/main@afad011c] (duration: 00m 07s)
- 15:03 joal@deploy1003: Started deploy [airflow-dags/analytics@afad011]: Regular analytics weekly train [airflow-dags/main@afad011c]
- 15:01 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1222 (T395241)', diff saved to https://phabricator.wikimedia.org/P76842 and previous config saved to /var/cache/conftool/dbconfig/20250602-150146-fceratto.json
- 15:00 phuedx: Disabled the SDS 2.4.11 Synthetic A/A Test in xLab
- 14:54 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1222 (T395241)', diff saved to https://phabricator.wikimedia.org/P76841 and previous config saved to /var/cache/conftool/dbconfig/20250602-145443-fceratto.json
- 14:54 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1222.eqiad.wmnet with reason: Maintenance
- 14:54 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1197 (T395241)', diff saved to https://phabricator.wikimedia.org/P76840 and previous config saved to /var/cache/conftool/dbconfig/20250602-145418-fceratto.json
- 14:54 joal@deploy1003: Finished deploy [analytics/refinery@b1aa837] (hadoop-test): Regular analytics weekly train test [analytics/refinery@b1aa837f] (duration: 09m 27s)
- 14:44 joal@deploy1003: Started deploy [analytics/refinery@b1aa837] (hadoop-test): Regular analytics weekly train test [analytics/refinery@b1aa837f]
- 14:44 joal@deploy1003: Finished deploy [analytics/refinery@b1aa837] (thin): Regular analytics weekly train THIN [analytics/refinery@b1aa837f] (duration: 01m 06s)
- 14:43 joal@deploy1003: Started deploy [analytics/refinery@b1aa837] (thin): Regular analytics weekly train THIN [analytics/refinery@b1aa837f]
- 14:42 joal@deploy1003: Finished deploy [analytics/refinery@b1aa837]: Regular analytics weekly train [analytics/refinery@b1aa837f] (duration: 03m 08s)
- 14:39 joal@deploy1003: Started deploy [analytics/refinery@b1aa837]: Regular analytics weekly train [analytics/refinery@b1aa837f]
- 14:39 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P76835 and previous config saved to /var/cache/conftool/dbconfig/20250602-143910-fceratto.json
- 14:36 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db[1154,1211].eqiad.wmnet with reason: Maintenance
- 14:35 bking@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts elastic1067.eqiad.wmnet
- 14:35 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 14:35 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: elastic1067.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - bking@cumin2002"
- 14:35 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: elastic1067.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - bking@cumin2002"
- 14:24 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P76833 and previous config saved to /var/cache/conftool/dbconfig/20250602-142403-fceratto.json
- 14:08 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1197 (T395241)', diff saved to https://phabricator.wikimedia.org/P76832 and previous config saved to /var/cache/conftool/dbconfig/20250602-140854-fceratto.json
- 14:04 phuedx: Enabling the SDS 2.4.11 Synthetic A/A Test in xLab
- 13:59 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1197 (T395241)', diff saved to https://phabricator.wikimedia.org/P76831 and previous config saved to /var/cache/conftool/dbconfig/20250602-135945-fceratto.json
- 13:59 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1197.eqiad.wmnet with reason: Maintenance
- 13:59 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1188 (T395241)', diff saved to https://phabricator.wikimedia.org/P76830 and previous config saved to /var/cache/conftool/dbconfig/20250602-135920-fceratto.json
- 13:52 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host thanos-be2009.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 13:49 jmm@cumin1003: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host prometheus7002.magru.wmnet
- 13:49 jmm@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host prometheus7002.magru.wmnet with OS bookworm
- 13:44 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P76829 and previous config saved to /var/cache/conftool/dbconfig/20250602-134413-fceratto.json
- 13:43 bking@cumin2002: START - Cookbook sre.dns.netbox
- 13:41 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcephosd1048.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 13:41 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcephosd1051.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 13:41 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcephosd1050.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 13:41 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcephosd1049.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 13:38 bking@cumin2002: START - Cookbook sre.hosts.decommission for hosts elastic1067.eqiad.wmnet
- 13:37 bking@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cirrussearch[1064-1066].eqiad.wmnet
- 13:37 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 13:37 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cirrussearch[1064-1066].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - bking@cumin2002"
- 13:37 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cirrussearch[1064-1066].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - bking@cumin2002"
- 13:34 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host thanos-be2009.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 13:29 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P76828 and previous config saved to /var/cache/conftool/dbconfig/20250602-132906-fceratto.json
- 13:24 Lucas_WMDE: UTC afternoon backport+config window done
- 13:22 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1211.eqiad.wmnet with reason: Maintenance
- 13:22 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for core-Namespaces: Add Page, Author to default search ns in ruwikisource (T395632) (duration: 12m 00s)
- 13:21 bking@cumin2002: START - Cookbook sre.dns.netbox
- 13:20 jclark@cumin1002: START - Cookbook sre.hosts.provision for host cloudcephosd1051.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 13:19 jclark@cumin1002: START - Cookbook sre.hosts.provision for host cloudcephosd1048.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 13:19 jclark@cumin1002: START - Cookbook sre.hosts.provision for host cloudcephosd1050.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 13:19 jclark@cumin1002: START - Cookbook sre.hosts.provision for host cloudcephosd1049.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 13:19 bwojtowicz@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revision-models' for release 'main' .
- 13:17 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 13:17 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for cloudcephosd1048-51 - jclark@cumin1002"
- 13:16 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for cloudcephosd1048-51 - jclark@cumin1002"
- 13:15 lucaswerkmeister-wmde@deploy1003: bunnypranav, lucaswerkmeister-wmde: Continuing with sync
- 13:14 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1188 (T395241)', diff saved to https://phabricator.wikimedia.org/P76826 and previous config saved to /var/cache/conftool/dbconfig/20250602-131359-fceratto.json
- 13:13 lucaswerkmeister-wmde@deploy1003: bunnypranav, lucaswerkmeister-wmde: Backport for core-Namespaces: Add Page, Author to default search ns in ruwikisource (T395632) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 13:13 jclark@cumin1002: START - Cookbook sre.dns.netbox
- 13:10 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for core-Namespaces: Add Page, Author to default search ns in ruwikisource (T395632)
- 13:09 bking@cumin2002: START - Cookbook sre.hosts.decommission for hosts cirrussearch[1064-1066].eqiad.wmnet
- 13:05 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1188 (T395241)', diff saved to https://phabricator.wikimedia.org/P76825 and previous config saved to /var/cache/conftool/dbconfig/20250602-130548-fceratto.json
- 13:05 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1188.eqiad.wmnet with reason: Maintenance
- 13:05 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T395241)', diff saved to https://phabricator.wikimedia.org/P76824 and previous config saved to /var/cache/conftool/dbconfig/20250602-130523-fceratto.json
- 12:50 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P76823 and previous config saved to /var/cache/conftool/dbconfig/20250602-125016-fceratto.json
- 12:37 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of es2036.codfw.wmnet onto es2047.codfw.wmnet
- 12:37 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) es2047 gradually with 4 steps - Pool es2047.codfw.wmnet in after cloning
- 12:37 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host irc2003.wikimedia.org
- 12:35 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P76821 and previous config saved to /var/cache/conftool/dbconfig/20250602-123508-fceratto.json
- 12:33 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host irc2003.wikimedia.org
- 12:26 jmm@cumin1003: START - Cookbook sre.hosts.reimage for host prometheus7002.magru.wmnet with OS bookworm
- 12:25 jmm@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM prometheus7002.magru.wmnet - jmm@cumin1003"
- 12:25 jmm@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM prometheus7002.magru.wmnet - jmm@cumin1003"
- 12:24 jmm@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) prometheus7002.magru.wmnet on all recursors
- 12:24 jmm@cumin1003: START - Cookbook sre.dns.wipe-cache prometheus7002.magru.wmnet on all recursors
- 12:24 jmm@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 12:24 jmm@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM prometheus7002.magru.wmnet - jmm@cumin1003"
- 12:22 jmm@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM prometheus7002.magru.wmnet - jmm@cumin1003"
- 12:20 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T395241)', diff saved to https://phabricator.wikimedia.org/P76819 and previous config saved to /var/cache/conftool/dbconfig/20250602-122001-fceratto.json
- 12:17 jmm@cumin1003: START - Cookbook sre.dns.netbox
- 12:17 jmm@cumin1003: START - Cookbook sre.ganeti.makevm for new host prometheus7002.magru.wmnet
- 12:12 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host doh7003.wikimedia.org
- 12:12 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host doh7003.wikimedia.org with OS bookworm
- 12:11 claime: cgoubert@mwmaint1002:~$ sudo systemctl restart mediawiki_job_generatecaptcha.service - T388531
- 12:11 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host apt-staging2001.codfw.wmnet
- 12:10 mvernon@cumin1002: END (PASS) - Cookbook sre.swift.check-dbs (exit_code=0) Checking container DBs of global-data-captcha-render
- 12:10 mvernon@cumin1002: START - Cookbook sre.swift.check-dbs Checking container DBs of global-data-captcha-render
- 12:10 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1182 (T395241)', diff saved to https://phabricator.wikimedia.org/P76818 and previous config saved to /var/cache/conftool/dbconfig/20250602-121041-fceratto.json
- 12:10 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1182.eqiad.wmnet with reason: Maintenance
- 12:10 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T395241)', diff saved to https://phabricator.wikimedia.org/P76817 and previous config saved to /var/cache/conftool/dbconfig/20250602-121016-fceratto.json
- 12:07 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host apt-staging2001.codfw.wmnet
- 12:00 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host build2002.codfw.wmnet
- 11:59 bwojtowicz@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revision-models' for release 'main' .
- 11:57 bwojtowicz@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'readability' for release 'main' .
- 11:55 bwojtowicz@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'ores-legacy' for release 'main' .
- 11:55 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P76814 and previous config saved to /var/cache/conftool/dbconfig/20250602-115509-fceratto.json
- 11:54 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host build2002.codfw.wmnet
- 11:51 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on doh7003.wikimedia.org with reason: host reimage
- 11:49 jmm@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on doh7003.wikimedia.org with reason: host reimage
- 11:48 marostegui@cumin1002: START - Cookbook sre.mysql.pool es2047 gradually with 4 steps - Pool es2047.codfw.wmnet in after cloning
- 11:47 bwojtowicz@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'article-descriptions' for release 'main' .
- 11:46 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ldap-rw1001.wikimedia.org
- 11:42 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ldap-rw1001.wikimedia.org
- 11:40 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P76812 and previous config saved to /var/cache/conftool/dbconfig/20250602-114001-fceratto.json
- 11:39 claime: cgoubert@mwmaint1002:~$ sudo systemctl restart mediawiki_job_generatecaptcha.service - T388531
- 11:37 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ldap-rw2001.wikimedia.org
- 11:33 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ldap-rw2001.wikimedia.org
- 11:32 claime: Manual run of cronjobs/generatecaptcha on k8s - T388531
- 11:31 cgoubert@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
- 11:30 cgoubert@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
- 11:24 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T395241)', diff saved to https://phabricator.wikimedia.org/P76811 and previous config saved to /var/cache/conftool/dbconfig/20250602-112453-fceratto.json
- 11:23 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ldap-maint1001.eqiad.wmnet
- 11:19 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ldap-maint1001.eqiad.wmnet
- 11:18 jmm@cumin1003: START - Cookbook sre.hosts.reimage for host doh7003.wikimedia.org with OS bookworm
- 11:17 jmm@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM doh7003.wikimedia.org - jmm@cumin1003"
- 11:17 jmm@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM doh7003.wikimedia.org - jmm@cumin1003"
- 11:17 jmm@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) doh7003.wikimedia.org on all recursors
- 11:17 jmm@cumin1003: START - Cookbook sre.dns.wipe-cache doh7003.wikimedia.org on all recursors
- 11:17 jmm@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 11:17 jmm@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM doh7003.wikimedia.org - jmm@cumin1003"
- 11:16 jmm@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM doh7003.wikimedia.org - jmm@cumin1003"
- 11:15 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1156 (T395241)', diff saved to https://phabricator.wikimedia.org/P76810 and previous config saved to /var/cache/conftool/dbconfig/20250602-111519-fceratto.json
- 11:15 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1014,1018].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
- 11:14 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1156.eqiad.wmnet with reason: Maintenance
- 11:11 jmm@cumin1003: START - Cookbook sre.dns.netbox
- 11:11 jmm@cumin1003: START - Cookbook sre.ganeti.makevm for new host doh7003.wikimedia.org
- 11:10 marostegui@cumin1002: dbctl commit (dc=all): 'es2039 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P76809 and previous config saved to /var/cache/conftool/dbconfig/20250602-111044-root.json
- 10:58 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host durum7003.magru.wmnet
- 10:58 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host durum7003.magru.wmnet with OS bookworm
- 10:58 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ldap-maint2001.codfw.wmnet
- 10:55 marostegui@cumin1002: dbctl commit (dc=all): 'es2039 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P76808 and previous config saved to /var/cache/conftool/dbconfig/20250602-105539-root.json
- 10:54 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ldap-maint2001.codfw.wmnet
- 10:48 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) es2036 gradually with 4 steps - Pool es2036.codfw.wmnet in after cloning
- 10:41 kamila@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/aux-k8s-services/jaeger: apply
- 10:40 kamila@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/aux-k8s-services/jaeger: apply
- 10:40 marostegui@cumin1002: dbctl commit (dc=all): 'es2039 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P76806 and previous config saved to /var/cache/conftool/dbconfig/20250602-104032-root.json
- 10:40 kamila@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/aux-k8s-services/jaeger: apply
- 10:40 kamila@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/aux-k8s-services/jaeger: apply
- 10:37 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on durum7003.magru.wmnet with reason: host reimage
- 10:34 jmm@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on durum7003.magru.wmnet with reason: host reimage
- 10:31 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host debmonitor1003.eqiad.wmnet
- 10:27 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host debmonitor1003.eqiad.wmnet
- 10:25 marostegui@cumin1002: dbctl commit (dc=all): 'es2039 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P76804 and previous config saved to /var/cache/conftool/dbconfig/20250602-102526-root.json
- 10:22 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host debmonitor2003.codfw.wmnet
- 10:18 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host debmonitor2003.codfw.wmnet
- 10:12 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rpki1001.eqiad.wmnet
- 10:10 marostegui@cumin1002: dbctl commit (dc=all): 'es2039 (re)pooling @ 20%: Repooling', diff saved to https://phabricator.wikimedia.org/P76802 and previous config saved to /var/cache/conftool/dbconfig/20250602-101020-root.json
- 10:08 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host rpki1001.eqiad.wmnet
- 10:07 jmm@cumin1003: START - Cookbook sre.hosts.reimage for host durum7003.magru.wmnet with OS bookworm
- 10:06 jmm@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM durum7003.magru.wmnet - jmm@cumin1003"
- 10:06 jmm@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM durum7003.magru.wmnet - jmm@cumin1003"
- 10:06 jmm@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) durum7003.magru.wmnet on all recursors
- 10:06 jmm@cumin1003: START - Cookbook sre.dns.wipe-cache durum7003.magru.wmnet on all recursors
- 10:06 jmm@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 10:06 jmm@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM durum7003.magru.wmnet - jmm@cumin1003"
- 10:06 jmm@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM durum7003.magru.wmnet - jmm@cumin1003"
- 10:02 marostegui@cumin1002: START - Cookbook sre.mysql.pool es2036 gradually with 4 steps - Pool es2036.codfw.wmnet in after cloning
- 10:02 jmm@cumin1003: START - Cookbook sre.dns.netbox
- 10:02 jmm@cumin1003: START - Cookbook sre.ganeti.makevm for new host durum7003.magru.wmnet
- 09:59 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rpki2003.codfw.wmnet
- 09:59 jmm@cumin1003: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host ncredir7003.magru.wmnet
- 09:59 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ncredir7003.magru.wmnet with OS bookworm
- 09:55 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host rpki2003.codfw.wmnet
- 09:55 marostegui@cumin1002: dbctl commit (dc=all): 'es2039 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P76800 and previous config saved to /var/cache/conftool/dbconfig/20250602-095514-root.json
- 09:45 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on es2039.codfw.wmnet with reason: Maintenance
- 09:44 marostegui@cumin1002: dbctl commit (dc=all): 'Depool es2039 T395647', diff saved to https://phabricator.wikimedia.org/P76798 and previous config saved to /var/cache/conftool/dbconfig/20250602-094402-marostegui.json
- 09:40 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netmon1003.wikimedia.org
- 09:33 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host netmon1003.wikimedia.org
- 09:28 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ncredir7003.magru.wmnet with reason: host reimage
- 09:27 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netmon2002.wikimedia.org
- 09:24 jmm@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on ncredir7003.magru.wmnet with reason: host reimage
- 09:22 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host netmon2002.wikimedia.org
- 09:13 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host puppetboard1003.eqiad.wmnet
- 09:10 jelto: update gitlab-settings artifact retention to 6 months - T395014
- 09:09 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host puppetboard1003.eqiad.wmnet
- 09:02 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host puppetboard2003.codfw.wmnet
- 08:58 jmm@cumin1003: START - Cookbook sre.hosts.reimage for host ncredir7003.magru.wmnet with OS bookworm
- 08:58 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host puppetboard2003.codfw.wmnet
- 08:51 jmm@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM ncredir7003.magru.wmnet - jmm@cumin1003"
- 08:51 jmm@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM ncredir7003.magru.wmnet - jmm@cumin1003"
- 08:51 jmm@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ncredir7003.magru.wmnet on all recursors
- 08:51 jmm@cumin1003: START - Cookbook sre.dns.wipe-cache ncredir7003.magru.wmnet on all recursors
- 08:51 jmm@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 08:51 jmm@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM ncredir7003.magru.wmnet - jmm@cumin1003"
- 08:51 jmm@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM ncredir7003.magru.wmnet - jmm@cumin1003"
- 08:51 marostegui@cumin1002: dbctl commit (dc=all): 'es1040 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P76796 and previous config saved to /var/cache/conftool/dbconfig/20250602-085105-root.json
- 08:47 jmm@cumin1003: START - Cookbook sre.dns.netbox
- 08:47 jmm@cumin1003: START - Cookbook sre.ganeti.makevm for new host ncredir7003.magru.wmnet
- 08:45 jmm@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ncredir7003.magru.wmnet with OS bookworm
- 08:42 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ncredir7003.magru.wmnet
- 08:42 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 08:42 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ncredir7003.magru.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
- 08:41 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ncredir7003.magru.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
- 08:37 jmm@cumin2002: START - Cookbook sre.dns.netbox
- 08:36 marostegui@cumin1002: dbctl commit (dc=all): 'es1040 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P76795 and previous config saved to /var/cache/conftool/dbconfig/20250602-083559-root.json
- 08:33 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ncredir7003.magru.wmnet
- 08:20 marostegui@cumin1002: dbctl commit (dc=all): 'es1040 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P76794 and previous config saved to /var/cache/conftool/dbconfig/20250602-082053-root.json
- 08:11 jmm@cumin1003: START - Cookbook sre.hosts.reimage for host ncredir7003.magru.wmnet with OS bookworm
- 08:05 marostegui@cumin1002: dbctl commit (dc=all): 'es1040 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P76793 and previous config saved to /var/cache/conftool/dbconfig/20250602-080547-root.json
- 07:58 phuedx@deploy1003: Finished scap sync-world: Backport for Beta Cluster: Support A/B experiments (T393918) (duration: 35m 59s)
- 07:50 marostegui@cumin1002: dbctl commit (dc=all): 'es1040 (re)pooling @ 20%: Repooling', diff saved to https://phabricator.wikimedia.org/P76792 and previous config saved to /var/cache/conftool/dbconfig/20250602-075041-root.json
- 07:49 phuedx@deploy1003: phuedx, dr0ptp4kt: Continuing with sync
- 07:38 phuedx@deploy1003: phuedx, dr0ptp4kt: Backport for Beta Cluster: Support A/B experiments (T393918) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 07:36 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti7002.magru.wmnet
- 07:35 marostegui@cumin1002: dbctl commit (dc=all): 'es1040 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P76791 and previous config saved to /var/cache/conftool/dbconfig/20250602-073535-root.json
- 07:27 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host ganeti7002.magru.wmnet
- 07:22 phuedx@deploy1003: Started scap sync-world: Backport for Beta Cluster: Support A/B experiments (T393918)
- 07:20 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast5004.wikimedia.org
- 07:13 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 6 hosts with reason: Maintenance
- 07:13 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on es1040.eqiad.wmnet with reason: Maintenance
- 07:08 marostegui@cumin1002: dbctl commit (dc=all): 'db1169 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P76790 and previous config saved to /var/cache/conftool/dbconfig/20250602-070837-root.json
- 07:08 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast4005.wikimedia.org
- 07:06 marostegui@cumin1002: dbctl commit (dc=all): 'Depool es1040 T395647', diff saved to https://phabricator.wikimedia.org/P76789 and previous config saved to /var/cache/conftool/dbconfig/20250602-070602-marostegui.json
- 07:02 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host bast4005.wikimedia.org
- 06:59 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast2003.wikimedia.org
- 06:53 marostegui@cumin1002: dbctl commit (dc=all): 'db1169 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P76787 and previous config saved to /var/cache/conftool/dbconfig/20250602-065331-root.json
- 06:53 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host bast2003.wikimedia.org
- 06:52 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netbox-dev2003.codfw.wmnet
- 06:48 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host netbox-dev2003.codfw.wmnet
- 06:38 marostegui@cumin1002: dbctl commit (dc=all): 'db1169 (re)pooling @ 60%: Repooling', diff saved to https://phabricator.wikimedia.org/P76786 and previous config saved to /var/cache/conftool/dbconfig/20250602-063826-root.json
- 06:23 marostegui@cumin1002: dbctl commit (dc=all): 'db1169 (re)pooling @ 40%: Repooling', diff saved to https://phabricator.wikimedia.org/P76785 and previous config saved to /var/cache/conftool/dbconfig/20250602-062320-root.json
- 06:08 marostegui@cumin1002: dbctl commit (dc=all): 'db1169 (re)pooling @ 20%: Repooling', diff saved to https://phabricator.wikimedia.org/P76783 and previous config saved to /var/cache/conftool/dbconfig/20250602-060815-root.json
- 05:53 marostegui@cumin1002: dbctl commit (dc=all): 'db1169 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P76782 and previous config saved to /var/cache/conftool/dbconfig/20250602-055309-root.json
- 05:39 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1169 T395663', diff saved to https://phabricator.wikimedia.org/P76781 and previous config saved to /var/cache/conftool/dbconfig/20250602-053905-marostegui.json
- 05:38 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1169.eqiad.wmnet with reason: Maintenance
- 05:20 marostegui@cumin1002: dbctl commit (dc=all): 'Add es2047 to dbctl depooled T395771', diff saved to https://phabricator.wikimedia.org/P76780 and previous config saved to /var/cache/conftool/dbconfig/20250602-051957-marostegui.json
- 05:15 marostegui@cumin1002: START - Cookbook sre.mysql.clone of es2036.codfw.wmnet onto es2047.codfw.wmnet
- 05:02 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es2036.codfw.wmnet with reason: Maintenance
- 05:01 marostegui@cumin1002: dbctl commit (dc=all): 'Depool es2036 T395771', diff saved to https://phabricator.wikimedia.org/P76779 and previous config saved to /var/cache/conftool/dbconfig/20250602-050150-marostegui.json