Server Admin Log/Archive 87
Appearance
2024-11-30
- 11:59 joal@deploy2002: Finished deploy [airflow-dags/analytics@fe37cfe]: Hotfix airflow analytics deploy [airflow-dags/analytics@fe37cfec] (duration: 01m 21s)
- 11:58 joal@deploy2002: Started deploy [airflow-dags/analytics@fe37cfe]: Hotfix airflow analytics deploy [airflow-dags/analytics@fe37cfec]
2024-11-29
- 16:55 jayme: puppet ca destroy mwmaint.discovery.wmnet - T341859
- 16:22 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp2006.codfw.wmnet
- 16:21 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp1006.eqiad.wmnet
- 16:15 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc-gp2006.codfw.wmnet
- 16:15 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc-gp1006.eqiad.wmnet
- 15:41 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp2005.codfw.wmnet
- 15:40 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp1005.eqiad.wmnet
- 15:34 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc-gp1005.eqiad.wmnet
- 15:34 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc-gp2005.codfw.wmnet
- 15:16 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp2004.codfw.wmnet
- 15:16 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp1004.eqiad.wmnet
- 15:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1231 (T376905)', diff saved to https://phabricator.wikimedia.org/P71448 and previous config saved to /var/cache/conftool/dbconfig/20241129-151101-ladsgroup.json
- 15:10 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc-gp2004.codfw.wmnet
- 15:09 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc-gp1004.eqiad.wmnet
- 14:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1231', diff saved to https://phabricator.wikimedia.org/P71447 and previous config saved to /var/cache/conftool/dbconfig/20241129-145554-ladsgroup.json
- 14:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1231', diff saved to https://phabricator.wikimedia.org/P71446 and previous config saved to /var/cache/conftool/dbconfig/20241129-144047-ladsgroup.json
- 14:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1231 (T376905)', diff saved to https://phabricator.wikimedia.org/P71445 and previous config saved to /var/cache/conftool/dbconfig/20241129-142540-ladsgroup.json
- 14:19 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1231 (T376905)', diff saved to https://phabricator.wikimedia.org/P71444 and previous config saved to /var/cache/conftool/dbconfig/20241129-141931-ladsgroup.json
- 14:19 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1231.eqiad.wmnet with reason: Maintenance
- 14:19 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1231.eqiad.wmnet with reason: Maintenance
- 14:14 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1225.eqiad.wmnet with reason: Maintenance
- 14:14 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1225.eqiad.wmnet with reason: Maintenance
- 14:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1187 (T376905)', diff saved to https://phabricator.wikimedia.org/P71443 and previous config saved to /var/cache/conftool/dbconfig/20241129-141409-ladsgroup.json
- 13:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P71442 and previous config saved to /var/cache/conftool/dbconfig/20241129-135902-ladsgroup.json
- 13:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P71441 and previous config saved to /var/cache/conftool/dbconfig/20241129-134355-ladsgroup.json
- 13:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1187 (T376905)', diff saved to https://phabricator.wikimedia.org/P71440 and previous config saved to /var/cache/conftool/dbconfig/20241129-132848-ladsgroup.json
- 13:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti1021.eqiad.wmnet
- 13:23 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 13:22 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti1021.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
- 13:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1187 (T376905)', diff saved to https://phabricator.wikimedia.org/P71439 and previous config saved to /var/cache/conftool/dbconfig/20241129-132136-ladsgroup.json
- 13:21 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1187.eqiad.wmnet with reason: Maintenance
- 13:21 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1187.eqiad.wmnet with reason: Maintenance
- 13:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T376905)', diff saved to https://phabricator.wikimedia.org/P71438 and previous config saved to /var/cache/conftool/dbconfig/20241129-132111-ladsgroup.json
- 13:17 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti1021.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
- 13:13 jmm@cumin2002: START - Cookbook sre.dns.netbox
- 13:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P71437 and previous config saved to /var/cache/conftool/dbconfig/20241129-130604-ladsgroup.json
- 13:06 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti1021.eqiad.wmnet
- 12:57 klausman@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
- 12:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P71436 and previous config saved to /var/cache/conftool/dbconfig/20241129-125057-ladsgroup.json
- 12:42 klausman@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
- 12:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T376905)', diff saved to https://phabricator.wikimedia.org/P71434 and previous config saved to /var/cache/conftool/dbconfig/20241129-123549-ladsgroup.json
- 12:32 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti1015.eqiad.wmnet
- 12:32 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 12:32 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti1015.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
- 12:31 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti1015.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
- 12:27 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1180 (T376905)', diff saved to https://phabricator.wikimedia.org/P71433 and previous config saved to /var/cache/conftool/dbconfig/20241129-122735-ladsgroup.json
- 12:27 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1180.eqiad.wmnet with reason: Maintenance
- 12:27 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1180.eqiad.wmnet with reason: Maintenance
- 12:27 jmm@cumin2002: START - Cookbook sre.dns.netbox
- 12:21 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti1015.eqiad.wmnet
- 12:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1173 (T376905)', diff saved to https://phabricator.wikimedia.org/P71432 and previous config saved to /var/cache/conftool/dbconfig/20241129-121010-ladsgroup.json
- 12:06 jelto@cumin1002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab2002.wikimedia.org with reason: Upgrade GitLab to new version
- 12:04 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2085.codfw.wmnet with OS bullseye
- 11:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1173', diff saved to https://phabricator.wikimedia.org/P71431 and previous config saved to /var/cache/conftool/dbconfig/20241129-115501-ladsgroup.json
- 11:44 moritzm: imported mapnik_4.0.3+ds2~wmf12u1 to component/maps T216826
- 11:43 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2085.codfw.wmnet with reason: host reimage
- 11:40 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2085.codfw.wmnet with reason: host reimage
- 11:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1173', diff saved to https://phabricator.wikimedia.org/P71430 and previous config saved to /var/cache/conftool/dbconfig/20241129-113954-ladsgroup.json
- 11:31 Dreamy_Jazz: Started MediaModeration scanning scripts to scan all wikis
- 11:29 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2085.codfw.wmnet with OS bullseye
- 11:27 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2084.codfw.wmnet with OS bullseye
- 11:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1173 (T376905)', diff saved to https://phabricator.wikimedia.org/P71429 and previous config saved to /var/cache/conftool/dbconfig/20241129-112447-ladsgroup.json
- 11:19 stevemunene@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 11:18 stevemunene@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 11:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1173 (T376905)', diff saved to https://phabricator.wikimedia.org/P71428 and previous config saved to /var/cache/conftool/dbconfig/20241129-111554-ladsgroup.json
- 11:15 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1173.eqiad.wmnet with reason: Maintenance
- 11:15 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1173.eqiad.wmnet with reason: Maintenance
- 11:01 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2084.codfw.wmnet with reason: host reimage
- 11:00 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2214.codfw.wmnet with reason: Maintenance
- 11:00 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2214.codfw.wmnet with reason: Maintenance
- 10:57 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2084.codfw.wmnet with reason: host reimage
- 10:45 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2084.codfw.wmnet with OS bullseye
- 10:36 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2083.codfw.wmnet with OS bullseye
- 10:13 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage
- 10:10 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage
- 09:57 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye
- 09:43 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye
- 09:21 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage
- 09:18 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage
- 09:05 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye
- 09:02 jelto@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: Upgrade GitLab to new version
- 08:54 moritzm: imported mapbox-polylabel 2.0.1-1~wmf12u1 to component/maps T216826
- 08:51 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 08:51 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 08:16 moritzm: imported mapbox-geometry_2.0.3-1~wmf12u1 to component/maps T216826
- 07:19 marostegui@cumin2002: dbctl commit (dc=all): 'db1223 (re)pooling @ 100%: Repooling after corruption', diff saved to https://phabricator.wikimedia.org/P71427 and previous config saved to /var/cache/conftool/dbconfig/20241129-071905-root.json
- 07:10 aqu@deploy2002: Finished deploy [airflow-dags/analytics@656d6df]: Generate canary events faster in Airflow (duration: 03m 15s)
- 07:06 aqu@deploy2002: Started deploy [airflow-dags/analytics@656d6df]: Generate canary events faster in Airflow
- 07:03 marostegui@cumin2002: dbctl commit (dc=all): 'db1223 (re)pooling @ 75%: Repooling after corruption', diff saved to https://phabricator.wikimedia.org/P71426 and previous config saved to /var/cache/conftool/dbconfig/20241129-070333-root.json
- 06:48 marostegui@cumin2002: dbctl commit (dc=all): 'db1223 (re)pooling @ 50%: Repooling after corruption', diff saved to https://phabricator.wikimedia.org/P71425 and previous config saved to /var/cache/conftool/dbconfig/20241129-064801-root.json
- 06:28 marostegui@cumin2002: dbctl commit (dc=all): 'Repool', diff saved to https://phabricator.wikimedia.org/P71424 and previous config saved to /var/cache/conftool/dbconfig/20241129-062833-marostegui.json
- 06:27 marostegui@cumin2002: END (FAIL) - Cookbook sre.mysql.pool (exit_code=99) db1223 quickly with 2 steps - Fixed corruption
- 06:26 marostegui@cumin2002: START - Cookbook sre.mysql.pool db1223 quickly with 2 steps - Fixed corruption
- 05:52 taavi@cumin1002: dbctl commit (dc=all): 'depool db1223, replication broken', diff saved to https://phabricator.wikimedia.org/P71423 and previous config saved to /var/cache/conftool/dbconfig/20241129-055245-taavi.json
- 04:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2229 (T328817)', diff saved to https://phabricator.wikimedia.org/P71422 and previous config saved to /var/cache/conftool/dbconfig/20241129-045409-ladsgroup.json
- 04:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2229', diff saved to https://phabricator.wikimedia.org/P71421 and previous config saved to /var/cache/conftool/dbconfig/20241129-043902-ladsgroup.json
- 04:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2229', diff saved to https://phabricator.wikimedia.org/P71420 and previous config saved to /var/cache/conftool/dbconfig/20241129-042355-ladsgroup.json
- {{safesubst:SAL entry|1=04:20 tstarling@deploy2002: Finished scap sync-world: Backport for addWiki.php tweaks, Run dumpInterwiki.php locally with no changes, Prepare id.wikivoyage.org for installation (T380726 T352113), dumpInterwiki: read from preinstall.dblist (T352113), addWiki: Move DB_ADMIN to core, [[gerrit:1099064|addWiki: Add UpdateSearchIndexCon}}
- 04:12 tstarling@deploy2002: tstarling: Continuing with sync
- {{safesubst:SAL entry|1=04:12 tstarling@deploy2002: tstarling: Backport for addWiki.php tweaks, Run dumpInterwiki.php locally with no changes, Prepare id.wikivoyage.org for installation (T380726 T352113), dumpInterwiki: read from preinstall.dblist (T352113), addWiki: Move DB_ADMIN to core, addWiki: Add UpdateSearchIndexConfig, [[gerrit}}
- 04:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2229 (T328817)', diff saved to https://phabricator.wikimedia.org/P71419 and previous config saved to /var/cache/conftool/dbconfig/20241129-040846-ladsgroup.json
- 04:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2229 (T328817)', diff saved to https://phabricator.wikimedia.org/P71418 and previous config saved to /var/cache/conftool/dbconfig/20241129-040547-ladsgroup.json
- 04:05 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2229.codfw.wmnet with reason: Maintenance
- 04:05 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2229.codfw.wmnet with reason: Maintenance
- 04:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2224 (T328817)', diff saved to https://phabricator.wikimedia.org/P71417 and previous config saved to /var/cache/conftool/dbconfig/20241129-040523-ladsgroup.json
- {{safesubst:SAL entry|1=04:01 tstarling@deploy2002: Started scap sync-world: Backport for addWiki.php tweaks, Run dumpInterwiki.php locally with no changes, Prepare id.wikivoyage.org for installation (T380726 T352113), dumpInterwiki: read from preinstall.dblist (T352113), addWiki: Move DB_ADMIN to core, [[gerrit:1099064|addWiki: Add UpdateSearchIndexConf}}
- 03:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2224', diff saved to https://phabricator.wikimedia.org/P71416 and previous config saved to /var/cache/conftool/dbconfig/20241129-035016-ladsgroup.json
- 03:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2224', diff saved to https://phabricator.wikimedia.org/P71415 and previous config saved to /var/cache/conftool/dbconfig/20241129-033509-ladsgroup.json
- 03:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2224 (T328817)', diff saved to https://phabricator.wikimedia.org/P71414 and previous config saved to /var/cache/conftool/dbconfig/20241129-032002-ladsgroup.json
- 03:17 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2224 (T328817)', diff saved to https://phabricator.wikimedia.org/P71413 and previous config saved to /var/cache/conftool/dbconfig/20241129-031705-ladsgroup.json
- 03:16 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2224.codfw.wmnet with reason: Maintenance
- 03:16 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2224.codfw.wmnet with reason: Maintenance
- 03:16 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2217 (T328817)', diff saved to https://phabricator.wikimedia.org/P71412 and previous config saved to /var/cache/conftool/dbconfig/20241129-031642-ladsgroup.json
- 03:04 tstarling@deploy2002: scap failed: <KeyError> '1 dbs from /srv/mediawiki-staging/wikiversions.json are missing from /srv/mediawiki-staging/dblists/all.dblist: idwikivoyage' (scap version: 4.129.0) (duration: 00m 00s)
- 03:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2217', diff saved to https://phabricator.wikimedia.org/P71411 and previous config saved to /var/cache/conftool/dbconfig/20241129-030133-ladsgroup.json
- 02:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2217', diff saved to https://phabricator.wikimedia.org/P71410 and previous config saved to /var/cache/conftool/dbconfig/20241129-024625-ladsgroup.json
- 02:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2217 (T328817)', diff saved to https://phabricator.wikimedia.org/P71409 and previous config saved to /var/cache/conftool/dbconfig/20241129-023118-ladsgroup.json
- 02:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2217 (T328817)', diff saved to https://phabricator.wikimedia.org/P71408 and previous config saved to /var/cache/conftool/dbconfig/20241129-022822-ladsgroup.json
- 02:28 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2217.codfw.wmnet with reason: Maintenance
- 02:28 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2217.codfw.wmnet with reason: Maintenance
- 02:27 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2197.codfw.wmnet with reason: Maintenance
- 02:26 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2197.codfw.wmnet with reason: Maintenance
- 02:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193 (T328817)', diff saved to https://phabricator.wikimedia.org/P71407 and previous config saved to /var/cache/conftool/dbconfig/20241129-022645-ladsgroup.json
- 02:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193', diff saved to https://phabricator.wikimedia.org/P71406 and previous config saved to /var/cache/conftool/dbconfig/20241129-021138-ladsgroup.json
- 01:56 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193', diff saved to https://phabricator.wikimedia.org/P71405 and previous config saved to /var/cache/conftool/dbconfig/20241129-015631-ladsgroup.json
- 01:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193 (T328817)', diff saved to https://phabricator.wikimedia.org/P71404 and previous config saved to /var/cache/conftool/dbconfig/20241129-014124-ladsgroup.json
- 01:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2193 (T328817)', diff saved to https://phabricator.wikimedia.org/P71403 and previous config saved to /var/cache/conftool/dbconfig/20241129-013912-ladsgroup.json
- 01:39 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2193.codfw.wmnet with reason: Maintenance
- 01:38 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2193.codfw.wmnet with reason: Maintenance
- 01:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180 (T328817)', diff saved to https://phabricator.wikimedia.org/P71402 and previous config saved to /var/cache/conftool/dbconfig/20241129-013850-ladsgroup.json
- 01:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P71401 and previous config saved to /var/cache/conftool/dbconfig/20241129-012343-ladsgroup.json
- 01:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P71400 and previous config saved to /var/cache/conftool/dbconfig/20241129-010835-ladsgroup.json
- 00:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180 (T328817)', diff saved to https://phabricator.wikimedia.org/P71399 and previous config saved to /var/cache/conftool/dbconfig/20241129-005328-ladsgroup.json
- 00:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2180 (T328817)', diff saved to https://phabricator.wikimedia.org/P71398 and previous config saved to /var/cache/conftool/dbconfig/20241129-005117-ladsgroup.json
- 00:51 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2180.codfw.wmnet with reason: Maintenance
- 00:50 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2180.codfw.wmnet with reason: Maintenance
- 00:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2169 (T328817)', diff saved to https://phabricator.wikimedia.org/P71397 and previous config saved to /var/cache/conftool/dbconfig/20241129-005054-ladsgroup.json
- 00:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2169', diff saved to https://phabricator.wikimedia.org/P71396 and previous config saved to /var/cache/conftool/dbconfig/20241129-003547-ladsgroup.json
- 00:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2169', diff saved to https://phabricator.wikimedia.org/P71395 and previous config saved to /var/cache/conftool/dbconfig/20241129-002040-ladsgroup.json
- 00:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2169 (T328817)', diff saved to https://phabricator.wikimedia.org/P71394 and previous config saved to /var/cache/conftool/dbconfig/20241129-000533-ladsgroup.json
- 00:02 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2169 (T328817)', diff saved to https://phabricator.wikimedia.org/P71393 and previous config saved to /var/cache/conftool/dbconfig/20241129-000234-ladsgroup.json
- 00:02 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2169.codfw.wmnet with reason: Maintenance
- 00:02 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2169.codfw.wmnet with reason: Maintenance
- 00:02 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2158 (T328817)', diff saved to https://phabricator.wikimedia.org/P71392 and previous config saved to /var/cache/conftool/dbconfig/20241129-000211-ladsgroup.json
2024-11-28
- 23:47 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P71391 and previous config saved to /var/cache/conftool/dbconfig/20241128-234704-ladsgroup.json
- 23:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T376905)', diff saved to https://phabricator.wikimedia.org/P71390 and previous config saved to /var/cache/conftool/dbconfig/20241128-233426-ladsgroup.json
- 23:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P71389 and previous config saved to /var/cache/conftool/dbconfig/20241128-233157-ladsgroup.json
- 23:19 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P71388 and previous config saved to /var/cache/conftool/dbconfig/20241128-231919-ladsgroup.json
- 23:16 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2158 (T328817)', diff saved to https://phabricator.wikimedia.org/P71387 and previous config saved to /var/cache/conftool/dbconfig/20241128-231650-ladsgroup.json
- 23:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2158 (T328817)', diff saved to https://phabricator.wikimedia.org/P71386 and previous config saved to /var/cache/conftool/dbconfig/20241128-231350-ladsgroup.json
- 23:13 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
- 23:13 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
- 23:13 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2158.codfw.wmnet with reason: Maintenance
- 23:13 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2158.codfw.wmnet with reason: Maintenance
- 23:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151 (T328817)', diff saved to https://phabricator.wikimedia.org/P71385 and previous config saved to /var/cache/conftool/dbconfig/20241128-231312-ladsgroup.json
- 23:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P71384 and previous config saved to /var/cache/conftool/dbconfig/20241128-230412-ladsgroup.json
- 22:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151', diff saved to https://phabricator.wikimedia.org/P71383 and previous config saved to /var/cache/conftool/dbconfig/20241128-225805-ladsgroup.json
- 22:50 ladsgroup@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db1178 gradually with 4 steps - Maint over (T361627)
- 22:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T376905)', diff saved to https://phabricator.wikimedia.org/P71381 and previous config saved to /var/cache/conftool/dbconfig/20241128-224905-ladsgroup.json
- 22:42 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151', diff saved to https://phabricator.wikimedia.org/P71380 and previous config saved to /var/cache/conftool/dbconfig/20241128-224258-ladsgroup.json
- 22:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1165 (T376905)', diff saved to https://phabricator.wikimedia.org/P71379 and previous config saved to /var/cache/conftool/dbconfig/20241128-223959-ladsgroup.json
- 22:39 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1015,1019].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
- 22:39 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1015,1019].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
- 22:39 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1165.eqiad.wmnet with reason: Maintenance
- 22:39 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1165.eqiad.wmnet with reason: Maintenance
- 22:27 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151 (T328817)', diff saved to https://phabricator.wikimedia.org/P71377 and previous config saved to /var/cache/conftool/dbconfig/20241128-222751-ladsgroup.json
- 22:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2151 (T328817)', diff saved to https://phabricator.wikimedia.org/P71376 and previous config saved to /var/cache/conftool/dbconfig/20241128-222250-ladsgroup.json
- 22:22 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2151.codfw.wmnet with reason: Maintenance
- 22:22 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2151.codfw.wmnet with reason: Maintenance
- away: UTC late deploys done
- {{safesubst:SAL entry|1=22:17 tgr@deploy2002: Finished scap sync-world: Backport for Localisation updates (November 26) (T372175), extend account creation lookup service to cover forced creations by others (T378401), extend account creation backfill script to forced account creations by others (T378401), [[gerrit:1098929|ReportIncident: Setup $wgReportIncidentLocalLinks for ptwiki pilot depl}}
- 22:07 tgr@deploy2002: tgr, ariel, matmarex, mszabo: Continuing with sync
- 22:05 ladsgroup@cumin1002: START - Cookbook sre.mysql.pool db1178 gradually with 4 steps - Maint over (T361627)
- {{safesubst:SAL entry|1=21:53 tgr@deploy2002: tgr, ariel, matmarex, mszabo: Backport for Localisation updates (November 26) (T372175), extend account creation lookup service to cover forced creations by others (T378401), extend account creation backfill script to forced account creations by others (T378401), [[gerrit:1098929|ReportIncident: Setup $wgReportIncidentLocalLinks for ptwiki pilot}}
- 21:51 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1178.eqiad.wmnet with reason: Schema change (T361627)
- 21:51 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1178.eqiad.wmnet with reason: Schema change (T361627)
- 21:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1178 depool (T361627)', diff saved to https://phabricator.wikimedia.org/P71373 and previous config saved to /var/cache/conftool/dbconfig/20241128-215026-ladsgroup.json
- {{safesubst:SAL entry|1=21:39 tgr@deploy2002: Started scap sync-world: Backport for Localisation updates (November 26) (T372175), extend account creation lookup service to cover forced creations by others (T378401), extend account creation backfill script to forced account creations by others (T378401), [[gerrit:1098929|ReportIncident: Setup $wgReportIncidentLocalLinks for ptwiki pilot deplo}}
- 21:25 tgr@deploy2002: Finished scap sync-world: Backport for Reader Survey: Undeploy on enwiki (T378660), Reader Survey: Deploy on multiple wikis (T378660) (duration: 14m 43s)
- 21:18 tgr@deploy2002: tgr, dani: Continuing with sync
- 21:17 aqu@deploy2002: Finished deploy [airflow-dags/analytics@6d38940]: Generate canary events faster in Airflow (duration: 01m 39s)
- 21:16 tgr@deploy2002: tgr, dani: Backport for Reader Survey: Undeploy on enwiki (T378660), Reader Survey: Deploy on multiple wikis (T378660) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 21:15 aqu@deploy2002: Started deploy [airflow-dags/analytics@6d38940]: Generate canary events faster in Airflow
- 21:10 tgr@deploy2002: Started scap sync-world: Backport for Reader Survey: Undeploy on enwiki (T378660), Reader Survey: Deploy on multiple wikis (T378660)
- 20:30 kharlan@deploy2002: Finished scap sync-world: Backport for ReportIncident: Setup $wgReportIncidentLocalLinks for ptwiki pilot deploy (T380277) (duration: 13m 08s)
- 20:23 kharlan@deploy2002: kharlan, mszabo: Continuing with sync
- 20:23 kharlan@deploy2002: kharlan, mszabo: Backport for ReportIncident: Setup $wgReportIncidentLocalLinks for ptwiki pilot deploy (T380277) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 20:16 kharlan@deploy2002: Started scap sync-world: Backport for ReportIncident: Setup $wgReportIncidentLocalLinks for ptwiki pilot deploy (T380277)
- 19:50 kamila@cumin1002: END (PASS) - Cookbook sre.k8s.roll-reimage-nodes (exit_code=0) rolling reimage on P{wikikube-worker[1276-1277].eqiad.wmnet} and (A:wikikube-staging-master-codfw or A:wikikube-staging-worker-codfw or A:wikikube-staging-master-eqiad or A:wikikube-staging-worker-eqiad or A:wikikube-master-codfw or A:wikikube-worker-codfw or A:wikikube-master-eqiad or A:wikikube-worker-eqiad or A:ml-serve-master-eqiad or
- 19:50 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1277.eqiad.wmnet with OS bookworm
- 19:31 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1277.eqiad.wmnet with reason: host reimage
- 19:27 kamila@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1277.eqiad.wmnet with reason: host reimage
- 19:08 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1277.eqiad.wmnet with OS bookworm
- 18:28 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1276.eqiad.wmnet with OS bookworm
- 18:09 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1276.eqiad.wmnet with reason: host reimage
- 18:06 kamila@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1276.eqiad.wmnet with reason: host reimage
- 17:47 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1276.eqiad.wmnet with OS bookworm
- 17:45 kamila@cumin1002: START - Cookbook sre.k8s.roll-reimage-nodes rolling reimage on P{wikikube-worker[1276-1277].eqiad.wmnet} and (A:wikikube-staging-master-codfw or A:wikikube-staging-worker-codfw or A:wikikube-staging-master-eqiad or A:wikikube-staging-worker-eqiad or A:wikikube-master-codfw or A:wikikube-worker-codfw or A:wikikube-master-eqiad or A:wikikube-worker-eqiad or A:ml-serve-master-eqiad or A:ml-serve-worker-
- 17:06 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2081.codfw.wmnet with OS bullseye
- 16:51 Emperor: depool/restart swift/repool ms-fe2014
- 16:51 Emperor: depool/restart swift/repool ms-fe2009
- 16:44 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2081.codfw.wmnet with reason: host reimage
- 16:41 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2081.codfw.wmnet with reason: host reimage
- 16:39 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2209.codfw.wmnet with reason: Maintenance
- 16:39 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2209.codfw.wmnet with reason: Maintenance
- 16:37 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1189.eqiad.wmnet with reason: Maintenance
- 16:37 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1189.eqiad.wmnet with reason: Maintenance
- 16:28 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2081.codfw.wmnet with OS bullseye
- 16:27 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2227.codfw.wmnet with reason: Maintenance
- 16:27 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2227.codfw.wmnet with reason: Maintenance
- 16:24 gmodena@deploy2002: Finished deploy [airflow-dags/analytics@d7c0f58]: webrequest_frontend post deployment fixes (duration: 02m 22s)
- 16:24 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2088.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 16:24 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2088.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 16:23 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2087.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 16:23 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2087.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 16:23 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2086.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 16:22 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2086.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 16:22 gmodena@deploy2002: Started deploy [airflow-dags/analytics@d7c0f58]: webrequest_frontend post deployment fixes
- 16:22 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2085.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 16:22 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2085.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 16:21 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2084.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 16:21 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2084.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 16:21 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2083.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 16:20 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2083.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 16:20 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 16:19 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 16:19 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2081.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 16:19 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2081.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 16:17 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2205.codfw.wmnet with reason: Maintenance
- 16:17 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2205.codfw.wmnet with reason: Maintenance
- 16:08 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2194.codfw.wmnet with reason: Maintenance
- 16:07 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2194.codfw.wmnet with reason: Maintenance
- 16:01 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply
- 16:01 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply
- 15:58 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2190.codfw.wmnet with reason: Maintenance
- 15:58 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2190.codfw.wmnet with reason: Maintenance
- 15:48 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2177.codfw.wmnet with reason: Maintenance
- 15:48 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2177.codfw.wmnet with reason: Maintenance
- 15:46 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host idp-test2004.wikimedia.org
- 15:39 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2186.codfw.wmnet with reason: Maintenance
- 15:39 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2186.codfw.wmnet with reason: Maintenance
- 15:39 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2156.codfw.wmnet with reason: Maintenance
- 15:39 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2156.codfw.wmnet with reason: Maintenance
- 15:37 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host idp-test2004.wikimedia.org
- 15:36 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host idp-test2005.wikimedia.org
- 15:32 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host idp-test2005.wikimedia.org
- 15:32 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es2032 (T376905)', diff saved to https://phabricator.wikimedia.org/P71371 and previous config saved to /var/cache/conftool/dbconfig/20241128-153202-ladsgroup.json
- 15:30 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2149.codfw.wmnet with reason: Maintenance
- 15:29 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2149.codfw.wmnet with reason: Maintenance
- 15:27 gmodena@deploy2002: Finished deploy [analytics/refinery@ac87303] (hadoop-test): Gobblin config changes [analytics/refinery@ac873037] (duration: 00m 26s)
- 15:26 gmodena@deploy2002: Started deploy [analytics/refinery@ac87303] (hadoop-test): Gobblin config changes [analytics/refinery@ac873037]
- 15:25 gmodena@deploy2002: Finished deploy [analytics/refinery@ac87303] (thin): Gobblin config changes THIN [analytics/refinery@ac873037] (duration: 00m 30s)
- 15:25 gmodena@deploy2002: Started deploy [analytics/refinery@ac87303] (thin): Gobblin config changes THIN [analytics/refinery@ac873037]
- 15:21 moritzm: removing ganeti1018 from active Ganeti nodes T378921
- 15:20 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2139.codfw.wmnet with reason: Maintenance
- 15:20 elukey@deploy2002: helmfile [staging] DONE helmfile.d/services/tegola-vector-tiles: sync
- 15:20 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2139.codfw.wmnet with reason: Maintenance
- 15:19 gmodena@deploy2002: Finished deploy [analytics/refinery@ac87303]: Gobblin config changes [analytics/refinery@ac873037] (duration: 03m 05s)
- 15:19 elukey@deploy2002: helmfile [staging] START helmfile.d/services/tegola-vector-tiles: sync
- 15:18 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
- 15:18 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
- 15:16 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es2032', diff saved to https://phabricator.wikimedia.org/P71370 and previous config saved to /var/cache/conftool/dbconfig/20241128-151655-ladsgroup.json
- 15:16 gmodena@deploy2002: Started deploy [analytics/refinery@ac87303]: Gobblin config changes [analytics/refinery@ac873037]
- 15:15 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1018.eqiad.wmnet
- 15:15 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1240.eqiad.wmnet with reason: Maintenance
- 15:15 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1240.eqiad.wmnet with reason: Maintenance
- 15:13 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1223.eqiad.wmnet with reason: Maintenance
- 15:12 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1223.eqiad.wmnet with reason: Maintenance
- 15:10 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1013,1017].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
- 15:10 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1013,1017].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
- 15:10 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1212.eqiad.wmnet with reason: Maintenance
- 15:10 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1212.eqiad.wmnet with reason: Maintenance
- 15:08 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1198.eqiad.wmnet with reason: Maintenance
- 15:08 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1198.eqiad.wmnet with reason: Maintenance
- 15:06 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1175.eqiad.wmnet with reason: Maintenance
- 15:06 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1175.eqiad.wmnet with reason: Maintenance
- 15:04 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1166.eqiad.wmnet with reason: Maintenance
- 15:04 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1166.eqiad.wmnet with reason: Maintenance
- 15:02 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1157.eqiad.wmnet with reason: Maintenance
- 15:02 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1157.eqiad.wmnet with reason: Maintenance
- 15:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es2032', diff saved to https://phabricator.wikimedia.org/P71369 and previous config saved to /var/cache/conftool/dbconfig/20241128-150148-ladsgroup.json
- 15:00 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1150.eqiad.wmnet with reason: Maintenance
- 15:00 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1150.eqiad.wmnet with reason: Maintenance
- 15:00 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2214.codfw.wmnet with reason: Maintenance
- 14:59 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2214.codfw.wmnet with reason: Maintenance
- 14:59 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1201.eqiad.wmnet with reason: Maintenance
- 14:59 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1201.eqiad.wmnet with reason: Maintenance
- 14:59 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2229.codfw.wmnet with reason: Maintenance
- 14:58 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2229.codfw.wmnet with reason: Maintenance
- 14:58 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2224.codfw.wmnet with reason: Maintenance
- 14:58 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2224.codfw.wmnet with reason: Maintenance
- 14:58 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2217.codfw.wmnet with reason: Maintenance
- 14:57 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2217.codfw.wmnet with reason: Maintenance
- 14:57 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2197.codfw.wmnet with reason: Maintenance
- 14:57 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2197.codfw.wmnet with reason: Maintenance
- 14:57 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2193.codfw.wmnet with reason: Maintenance
- 14:57 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2193.codfw.wmnet with reason: Maintenance
- 14:56 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2180.codfw.wmnet with reason: Maintenance
- 14:56 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2180.codfw.wmnet with reason: Maintenance
- 14:56 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2169.codfw.wmnet with reason: Maintenance
- 14:56 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2169.codfw.wmnet with reason: Maintenance
- 14:55 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2187.codfw.wmnet with reason: Maintenance
- 14:55 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2187.codfw.wmnet with reason: Maintenance
- 14:55 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2158.codfw.wmnet with reason: Maintenance
- 14:55 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2158.codfw.wmnet with reason: Maintenance
- 14:55 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2151.codfw.wmnet with reason: Maintenance
- 14:54 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2151.codfw.wmnet with reason: Maintenance
- 14:54 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance
- 14:54 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance
- 14:54 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1231.eqiad.wmnet with reason: Maintenance
- {{safesubst:SAL entry|1=14:54 urbanecm@deploy2002: Finished scap sync-world: Backport for Use `useformat` query param for device detection or mobile domain (m.) (T380646 T375788), ReportIncident: Enable instrumentation on labs (T372823), Enable message group subscription feature for some wikis (T372386), [[gerrit:1098622|Use `useformat` query param for device detection or mobile domain (m.)}}
- 14:54 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1231.eqiad.wmnet with reason: Maintenance
- 14:53 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1225.eqiad.wmnet with reason: Maintenance
- 14:53 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1225.eqiad.wmnet with reason: Maintenance
- 14:53 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1187.eqiad.wmnet with reason: Maintenance
- 14:53 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1187.eqiad.wmnet with reason: Maintenance
- 14:53 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1180.eqiad.wmnet with reason: Maintenance
- 14:52 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1180.eqiad.wmnet with reason: Maintenance
- 14:52 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1173.eqiad.wmnet with reason: Maintenance
- 14:52 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1173.eqiad.wmnet with reason: Maintenance
- 14:52 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1168.eqiad.wmnet with reason: Maintenance
- 14:52 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1168.eqiad.wmnet with reason: Maintenance
- 14:51 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1015,1019].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
- 14:51 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1015,1019].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
- 14:51 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1165.eqiad.wmnet with reason: Maintenance
- 14:51 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1165.eqiad.wmnet with reason: Maintenance
- 14:47 urbanecm@deploy2002: urbanecm, tgr, abi, mszabo: Continuing with sync
- 14:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es2032 (T376905)', diff saved to https://phabricator.wikimedia.org/P71352 and previous config saved to /var/cache/conftool/dbconfig/20241128-144641-ladsgroup.json
- 14:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es2032 (T376905)', diff saved to https://phabricator.wikimedia.org/P71351 and previous config saved to /var/cache/conftool/dbconfig/20241128-144039-ladsgroup.json
- 14:40 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es2032.codfw.wmnet with reason: Maintenance
- 14:40 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es2032.codfw.wmnet with reason: Maintenance
- 14:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es2028 (T376905)', diff saved to https://phabricator.wikimedia.org/P71350 and previous config saved to /var/cache/conftool/dbconfig/20241128-144012-ladsgroup.json
- 14:39 urbanecm: [urbanecm@deploy2002 ~]$ while read wiki; do echo "== $wiki"; mwscript-k8s extensions/Flow/maintenance/FlowMoveBoardsToSubpages.php -- --wiki=$wiki; done < wikis.txt # wikis.txt is at P71349 # T378827
- 14:36 urbanecm: [urbanecm@deploy2002 ~]$ mwscript-k8s -f extensions/Flow/maintenance/FlowMoveBoardsToSubpages.php -- --wiki=bswiki # T378827
- 14:33 moritzm: installing node-es-module-lexer updates from Bookworm point release
- {{safesubst:SAL entry|1=14:28 urbanecm@deploy2002: urbanecm, tgr, abi, mszabo: Backport for Use `useformat` query param for device detection or mobile domain (m.) (T380646 T375788), ReportIncident: Enable instrumentation on labs (T372823), Enable message group subscription feature for some wikis (T372386), [[gerrit:1098622|Use `useformat` query param for device detection or mobile domain (m.}}
- 14:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es2028', diff saved to https://phabricator.wikimedia.org/P71347 and previous config saved to /var/cache/conftool/dbconfig/20241128-142505-ladsgroup.json
- 14:25 Dreamy_Jazz: Started MediaModeration scanning scripts to run again over all wikis
- {{safesubst:SAL entry|1=14:23 urbanecm@deploy2002: Started scap sync-world: Backport for Use `useformat` query param for device detection or mobile domain (m.) (T380646 T375788), ReportIncident: Enable instrumentation on labs (T372823), Enable message group subscription feature for some wikis (T372386), [[gerrit:1098622|Use `useformat` query param for device detection or mobile domain (m.) (}}
- 14:22 urbanecm@deploy2002: Finished scap sync-world: Backport for Allow IRS to record server-side interaction events (T380599), Revert^2 "Add contact form for U4C" (duration: 14m 07s)
- 14:22 Dreamy_Jazz: Restarted MediaModeration scanning script - https://wikitech.wikimedia.org/wiki/MediaModeration
- 14:15 urbanecm@deploy2002: nmw03, mszabo, urbanecm: Continuing with sync
- 14:14 moritzm: installing apr security updates
- 14:14 urbanecm@deploy2002: nmw03, mszabo, urbanecm: Backport for Allow IRS to record server-side interaction events (T380599), Revert^2 "Add contact form for U4C" synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 14:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es2028', diff saved to https://phabricator.wikimedia.org/P71346 and previous config saved to /var/cache/conftool/dbconfig/20241128-140958-ladsgroup.json
- 14:08 urbanecm@deploy2002: Started scap sync-world: Backport for Allow IRS to record server-side interaction events (T380599), Revert^2 "Add contact form for U4C"
- 14:06 klausman@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
- 13:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es2028 (T376905)', diff saved to https://phabricator.wikimedia.org/P71345 and previous config saved to /var/cache/conftool/dbconfig/20241128-135451-ladsgroup.json
- 13:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es2028 (T376905)', diff saved to https://phabricator.wikimedia.org/P71344 and previous config saved to /var/cache/conftool/dbconfig/20241128-134859-ladsgroup.json
- 13:48 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es2028.codfw.wmnet with reason: Maintenance
- 13:48 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es2028.codfw.wmnet with reason: Maintenance
- 12:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es2031 (T376905)', diff saved to https://phabricator.wikimedia.org/P71343 and previous config saved to /var/cache/conftool/dbconfig/20241128-124957-ladsgroup.json
- 12:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es2031', diff saved to https://phabricator.wikimedia.org/P71342 and previous config saved to /var/cache/conftool/dbconfig/20241128-123451-ladsgroup.json
- 12:23 klausman@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
- 12:19 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es2031', diff saved to https://phabricator.wikimedia.org/P71340 and previous config saved to /var/cache/conftool/dbconfig/20241128-121943-ladsgroup.json
- 12:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es2031 (T376905)', diff saved to https://phabricator.wikimedia.org/P71339 and previous config saved to /var/cache/conftool/dbconfig/20241128-120437-ladsgroup.json
- 12:04 ladsgroup@deploy2002: Finished scap sync-world: Backport for Bump ratio of new parsercache key spec to 2 (T373037) (duration: 12m 37s)
- 12:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1236 (T370903)', diff saved to https://phabricator.wikimedia.org/P71338 and previous config saved to /var/cache/conftool/dbconfig/20241128-120031-ladsgroup.json
- 11:57 ladsgroup@deploy2002: ladsgroup: Continuing with sync
- 11:57 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es2031 (T376905)', diff saved to https://phabricator.wikimedia.org/P71337 and previous config saved to /var/cache/conftool/dbconfig/20241128-115741-ladsgroup.json
- 11:57 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es2031.codfw.wmnet with reason: Maintenance
- 11:57 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es2031.codfw.wmnet with reason: Maintenance
- 11:57 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es2033 (T376905)', diff saved to https://phabricator.wikimedia.org/P71336 and previous config saved to /var/cache/conftool/dbconfig/20241128-115715-ladsgroup.json
- 11:57 ladsgroup@deploy2002: ladsgroup: Backport for Bump ratio of new parsercache key spec to 2 (T373037) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 11:51 ladsgroup@deploy2002: Started scap sync-world: Backport for Bump ratio of new parsercache key spec to 2 (T373037)
- 11:48 ladsgroup@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db2237 gradually with 4 steps - Maint over (T379813)
- 11:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1236', diff saved to https://phabricator.wikimedia.org/P71334 and previous config saved to /var/cache/conftool/dbconfig/20241128-114524-ladsgroup.json
- 11:42 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es2033', diff saved to https://phabricator.wikimedia.org/P71333 and previous config saved to /var/cache/conftool/dbconfig/20241128-114208-ladsgroup.json
- 11:31 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1018.eqiad.wmnet
- 11:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1236', diff saved to https://phabricator.wikimedia.org/P71330 and previous config saved to /var/cache/conftool/dbconfig/20241128-113017-ladsgroup.json
- 11:27 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es2033', diff saved to https://phabricator.wikimedia.org/P71329 and previous config saved to /var/cache/conftool/dbconfig/20241128-112701-ladsgroup.json
- 11:21 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1018.eqiad.wmnet
- 11:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1236 (T370903)', diff saved to https://phabricator.wikimedia.org/P71327 and previous config saved to /var/cache/conftool/dbconfig/20241128-111510-ladsgroup.json
- 11:14 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1018.eqiad.wmnet
- 11:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1236 (T370903)', diff saved to https://phabricator.wikimedia.org/P71326 and previous config saved to /var/cache/conftool/dbconfig/20241128-111300-ladsgroup.json
- 11:12 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1236.eqiad.wmnet with reason: Maintenance
- 11:12 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1236.eqiad.wmnet with reason: Maintenance
- 11:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es2033 (T376905)', diff saved to https://phabricator.wikimedia.org/P71325 and previous config saved to /var/cache/conftool/dbconfig/20241128-111154-ladsgroup.json
- 11:11 moritzm: removing ganeti1022 from active Ganeti nodes T378921
- 11:10 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1022.eqiad.wmnet
- 11:10 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1201.eqiad.wmnet with reason: Maintenance
- 11:10 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1201.eqiad.wmnet with reason: Maintenance
- 11:08 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2204.codfw.wmnet with reason: Maintenance
- 11:08 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2204.codfw.wmnet with reason: Maintenance
- 11:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es2033 (T376905)', diff saved to https://phabricator.wikimedia.org/P71324 and previous config saved to /var/cache/conftool/dbconfig/20241128-110457-ladsgroup.json
- 11:04 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es2033.codfw.wmnet with reason: Maintenance
- 11:04 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es2033.codfw.wmnet with reason: Maintenance
- 11:03 ladsgroup@cumin1002: START - Cookbook sre.mysql.pool db2237 gradually with 4 steps - Maint over (T379813)
- 10:51 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Fix commit bug - oblivian@cumin1002"
- 10:51 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Fix commit bug - oblivian@cumin1002
- 10:51 oblivian@cumin1002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Fix commit bug - oblivian@cumin1002
- 10:51 oblivian@cumin1002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Fix commit bug - oblivian@cumin1002"
- 10:32 isaranto@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
- 10:27 isaranto@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
- 09:36 jnuche@deploy2002: Finished deploy [releng/jenkins-deploy@34b35a5] (releasing): Update Jenkins version on releases1003.eqiad.wmnet (duration: 01m 22s)
- 09:35 jnuche@deploy2002: Started deploy [releng/jenkins-deploy@34b35a5] (releasing): Update Jenkins version on releases1003.eqiad.wmnet
- 09:31 jnuche@deploy2002: Finished deploy [releng/jenkins-deploy@34b35a5] (releasing): Update Jenkins version on releases2003.codfw.wmnet (duration: 01m 27s)
- 09:30 jnuche@deploy2002: Started deploy [releng/jenkins-deploy@34b35a5] (releasing): Update Jenkins version on releases2003.codfw.wmnet
- 09:23 isaranto@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
- 09:22 hashar@deploy2002: rebuilt and synchronized wikiversions files: group2 to 1.44.0-wmf.5 refs T375664
- 09:14 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2200.codfw.wmnet with reason: Maintenance
- 09:13 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2200.codfw.wmnet with reason: Maintenance
- 09:09 isaranto@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
- 09:06 isaranto@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
- 09:00 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2198.codfw.wmnet with reason: Maintenance
- 09:00 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2198.codfw.wmnet with reason: Maintenance
- 09:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195 (T370903)', diff saved to https://phabricator.wikimedia.org/P71319 and previous config saved to /var/cache/conftool/dbconfig/20241128-090035-ladsgroup.json
- 08:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195', diff saved to https://phabricator.wikimedia.org/P71318 and previous config saved to /var/cache/conftool/dbconfig/20241128-084528-ladsgroup.json
- 08:43 isaranto@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
- 08:41 isaranto@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
- 08:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195', diff saved to https://phabricator.wikimedia.org/P71317 and previous config saved to /var/cache/conftool/dbconfig/20241128-083021-ladsgroup.json
- 08:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195 (T370903)', diff saved to https://phabricator.wikimedia.org/P71316 and previous config saved to /var/cache/conftool/dbconfig/20241128-081514-ladsgroup.json
- 08:02 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2195 (T370903)', diff saved to https://phabricator.wikimedia.org/P71315 and previous config saved to /var/cache/conftool/dbconfig/20241128-080244-ladsgroup.json
- 08:02 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2195.codfw.wmnet with reason: Maintenance
- 08:02 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2195.codfw.wmnet with reason: Maintenance
- 08:02 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181 (T370903)', diff saved to https://phabricator.wikimedia.org/P71314 and previous config saved to /var/cache/conftool/dbconfig/20241128-080221-ladsgroup.json
- 07:56 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1022.eqiad.wmnet
- 07:47 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P71313 and previous config saved to /var/cache/conftool/dbconfig/20241128-074714-ladsgroup.json
- 07:32 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P71312 and previous config saved to /var/cache/conftool/dbconfig/20241128-073207-ladsgroup.json
- 07:23 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "CSRF token support - oblivian@cumin1002"
- 07:23 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: CSRF token support - oblivian@cumin1002
- 07:23 oblivian@cumin1002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: CSRF token support - oblivian@cumin1002
- 07:22 oblivian@cumin1002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "CSRF token support - oblivian@cumin1002"
- 07:17 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181 (T370903)', diff saved to https://phabricator.wikimedia.org/P71310 and previous config saved to /var/cache/conftool/dbconfig/20241128-071700-ladsgroup.json
- 07:02 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2181 (T370903)', diff saved to https://phabricator.wikimedia.org/P71309 and previous config saved to /var/cache/conftool/dbconfig/20241128-070231-ladsgroup.json
- 07:02 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2181.codfw.wmnet with reason: Maintenance
- 07:02 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2181.codfw.wmnet with reason: Maintenance
- 07:02 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167 (T370903)', diff saved to https://phabricator.wikimedia.org/P71308 and previous config saved to /var/cache/conftool/dbconfig/20241128-070209-ladsgroup.json
- 07:02 kartik@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
- 06:47 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167', diff saved to https://phabricator.wikimedia.org/P71307 and previous config saved to /var/cache/conftool/dbconfig/20241128-064702-ladsgroup.json
- 06:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167', diff saved to https://phabricator.wikimedia.org/P71306 and previous config saved to /var/cache/conftool/dbconfig/20241128-063155-ladsgroup.json
- 06:16 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167 (T370903)', diff saved to https://phabricator.wikimedia.org/P71305 and previous config saved to /var/cache/conftool/dbconfig/20241128-061647-ladsgroup.json
- 06:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2167 (T370903)', diff saved to https://phabricator.wikimedia.org/P71304 and previous config saved to /var/cache/conftool/dbconfig/20241128-060418-ladsgroup.json
- 06:04 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2167.codfw.wmnet with reason: Maintenance
- 06:04 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2167.codfw.wmnet with reason: Maintenance
- 06:03 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2166 (T370903)', diff saved to https://phabricator.wikimedia.org/P71303 and previous config saved to /var/cache/conftool/dbconfig/20241128-060355-ladsgroup.json
- 05:48 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2166', diff saved to https://phabricator.wikimedia.org/P71302 and previous config saved to /var/cache/conftool/dbconfig/20241128-054847-ladsgroup.json
- 05:48 kartik@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
- 05:33 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2166', diff saved to https://phabricator.wikimedia.org/P71301 and previous config saved to /var/cache/conftool/dbconfig/20241128-053340-ladsgroup.json
- 05:29 tstarling@deploy2002: Finished scap sync-world: Backport for Add frwiki on labs for new addWiki.php test (duration: 13m 41s)
- 05:23 tstarling@deploy2002: tstarling: Continuing with sync
- 05:22 tstarling@deploy2002: tstarling: Backport for Add frwiki on labs for new addWiki.php test synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 05:18 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2166 (T370903)', diff saved to https://phabricator.wikimedia.org/P71300 and previous config saved to /var/cache/conftool/dbconfig/20241128-051833-ladsgroup.json
- 05:16 tstarling@deploy2002: Started scap sync-world: Backport for Add frwiki on labs for new addWiki.php test
- 05:06 kartik@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
- 05:03 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2166 (T370903)', diff saved to https://phabricator.wikimedia.org/P71299 and previous config saved to /var/cache/conftool/dbconfig/20241128-050352-ladsgroup.json
- 05:03 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2166.codfw.wmnet with reason: Maintenance
- 05:03 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2166.codfw.wmnet with reason: Maintenance
- 05:03 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2164 (T370903)', diff saved to https://phabricator.wikimedia.org/P71298 and previous config saved to /var/cache/conftool/dbconfig/20241128-050329-ladsgroup.json
- 04:48 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2164', diff saved to https://phabricator.wikimedia.org/P71297 and previous config saved to /var/cache/conftool/dbconfig/20241128-044822-ladsgroup.json
- 04:41 eileen: civicrm upgraded from ed67a1b2 to be7e5d33
- 04:36 eileen: * civicrm upgraded from 40f4f1a3 to ed67a1b2
- 04:33 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2164', diff saved to https://phabricator.wikimedia.org/P71296 and previous config saved to /var/cache/conftool/dbconfig/20241128-043314-ladsgroup.json
- 04:26 eileen: * civicrm upgraded from 7ade5fd7 to 40f4f1a3
- 04:18 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2164 (T370903)', diff saved to https://phabricator.wikimedia.org/P71294 and previous config saved to /var/cache/conftool/dbconfig/20241128-041807-ladsgroup.json
- 04:03 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2164 (T370903)', diff saved to https://phabricator.wikimedia.org/P71292 and previous config saved to /var/cache/conftool/dbconfig/20241128-040326-ladsgroup.json
- 04:03 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on db2186.codfw.wmnet with reason: Maintenance
- 04:03 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 16:00:00 on db2186.codfw.wmnet with reason: Maintenance
- 04:03 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2164.codfw.wmnet with reason: Maintenance
- 04:02 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2164.codfw.wmnet with reason: Maintenance
- 04:02 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2163 (T370903)', diff saved to https://phabricator.wikimedia.org/P71291 and previous config saved to /var/cache/conftool/dbconfig/20241128-040248-ladsgroup.json
- 03:47 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2163', diff saved to https://phabricator.wikimedia.org/P71290 and previous config saved to /var/cache/conftool/dbconfig/20241128-034741-ladsgroup.json
- 03:32 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2163', diff saved to https://phabricator.wikimedia.org/P71289 and previous config saved to /var/cache/conftool/dbconfig/20241128-033234-ladsgroup.json
- 03:22 eileen: config revision changed from f284fd46 to a3175f86 (like for real this time)
- 03:17 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2163 (T370903)', diff saved to https://phabricator.wikimedia.org/P71288 and previous config saved to /var/cache/conftool/dbconfig/20241128-031726-ladsgroup.json
- 03:14 eileen: onfig revision changed from f284fd46 to a3175f86
- 03:02 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2163 (T370903)', diff saved to https://phabricator.wikimedia.org/P71287 and previous config saved to /var/cache/conftool/dbconfig/20241128-030213-ladsgroup.json
- 03:02 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2163.codfw.wmnet with reason: Maintenance
- 03:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2163.codfw.wmnet with reason: Maintenance
- 03:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2162 (T370903)', diff saved to https://phabricator.wikimedia.org/P71286 and previous config saved to /var/cache/conftool/dbconfig/20241128-030151-ladsgroup.json
- 02:53 eileen: civicrm upgraded from c8c461b9 to 7ade5fd7
- 02:46 eileen: * civicrm upgraded from 80f03357 to c8c461b9
- 02:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2162', diff saved to https://phabricator.wikimedia.org/P71285 and previous config saved to /var/cache/conftool/dbconfig/20241128-024644-ladsgroup.json
- 02:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2162', diff saved to https://phabricator.wikimedia.org/P71284 and previous config saved to /var/cache/conftool/dbconfig/20241128-023136-ladsgroup.json
- 02:16 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2162 (T370903)', diff saved to https://phabricator.wikimedia.org/P71283 and previous config saved to /var/cache/conftool/dbconfig/20241128-021629-ladsgroup.json
- 02:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2162 (T370903)', diff saved to https://phabricator.wikimedia.org/P71282 and previous config saved to /var/cache/conftool/dbconfig/20241128-020143-ladsgroup.json
- 02:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2162.codfw.wmnet with reason: Maintenance
- 02:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2162.codfw.wmnet with reason: Maintenance
- 02:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2161 (T370903)', diff saved to https://phabricator.wikimedia.org/P71281 and previous config saved to /var/cache/conftool/dbconfig/20241128-020120-ladsgroup.json
- 01:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2161', diff saved to https://phabricator.wikimedia.org/P71280 and previous config saved to /var/cache/conftool/dbconfig/20241128-014613-ladsgroup.json
- 01:38 eileen: civicrm upgraded from 3b1ed162 to 80f03357
- 01:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2161', diff saved to https://phabricator.wikimedia.org/P71279 and previous config saved to /var/cache/conftool/dbconfig/20241128-013106-ladsgroup.json
- 01:16 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2161 (T370903)', diff saved to https://phabricator.wikimedia.org/P71278 and previous config saved to /var/cache/conftool/dbconfig/20241128-011559-ladsgroup.json
- 01:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2161 (T370903)', diff saved to https://phabricator.wikimedia.org/P71277 and previous config saved to /var/cache/conftool/dbconfig/20241128-010112-ladsgroup.json
- 01:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2161.codfw.wmnet with reason: Maintenance
- 01:00 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2161.codfw.wmnet with reason: Maintenance
- 01:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2154 (T370903)', diff saved to https://phabricator.wikimedia.org/P71276 and previous config saved to /var/cache/conftool/dbconfig/20241128-010049-ladsgroup.json
- 00:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2154', diff saved to https://phabricator.wikimedia.org/P71275 and previous config saved to /var/cache/conftool/dbconfig/20241128-004542-ladsgroup.json
- 00:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2154', diff saved to https://phabricator.wikimedia.org/P71274 and previous config saved to /var/cache/conftool/dbconfig/20241128-003035-ladsgroup.json
- 00:16 tstarling@deploy2002: Finished scap sync-world: Backport for Move default main page text for new wikis to config (T352113), Introduce preinstall.dblist for wikis that haven't been installed yet (T352113) (duration: 14m 42s)
- 00:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2154 (T370903)', diff saved to https://phabricator.wikimedia.org/P71273 and previous config saved to /var/cache/conftool/dbconfig/20241128-001528-ladsgroup.json
- 00:09 tstarling@deploy2002: tstarling: Continuing with sync
- 00:07 tstarling@deploy2002: tstarling: Backport for Move default main page text for new wikis to config (T352113), Introduce preinstall.dblist for wikis that haven't been installed yet (T352113) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 00:01 tstarling@deploy2002: Started scap sync-world: Backport for Move default main page text for new wikis to config (T352113), Introduce preinstall.dblist for wikis that haven't been installed yet (T352113)
- 00:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2154 (T370903)', diff saved to https://phabricator.wikimedia.org/P71272 and previous config saved to /var/cache/conftool/dbconfig/20241128-000046-ladsgroup.json
- 00:00 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2154.codfw.wmnet with reason: Maintenance
- 00:00 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2154.codfw.wmnet with reason: Maintenance
- 00:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2152 (T370903)', diff saved to https://phabricator.wikimedia.org/P71271 and previous config saved to /var/cache/conftool/dbconfig/20241128-000023-ladsgroup.json
2024-11-27
- 23:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2152', diff saved to https://phabricator.wikimedia.org/P71270 and previous config saved to /var/cache/conftool/dbconfig/20241127-234518-ladsgroup.json
- 23:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2152', diff saved to https://phabricator.wikimedia.org/P71269 and previous config saved to /var/cache/conftool/dbconfig/20241127-233011-ladsgroup.json
- 23:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2152 (T370903)', diff saved to https://phabricator.wikimedia.org/P71267 and previous config saved to /var/cache/conftool/dbconfig/20241127-231504-ladsgroup.json
- 23:09 tgr@deploy2002: Finished scap sync-world: Backport for Fix mobile domain logic for login.wikimedia.org (T380646) (duration: 18m 07s)
- 23:02 tgr@deploy2002: tgr: Continuing with sync
- 23:02 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2152 (T370903)', diff saved to https://phabricator.wikimedia.org/P71264 and previous config saved to /var/cache/conftool/dbconfig/20241127-230159-ladsgroup.json
- 23:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2152.codfw.wmnet with reason: Maintenance
- 23:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2152.codfw.wmnet with reason: Maintenance
- 22:56 tgr@deploy2002: tgr: Backport for Fix mobile domain logic for login.wikimedia.org (T380646) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 22:52 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance
- 22:52 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance
- 22:52 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226 (T370903)', diff saved to https://phabricator.wikimedia.org/P71263 and previous config saved to /var/cache/conftool/dbconfig/20241127-225159-ladsgroup.json
- 22:51 tgr@deploy2002: Started scap sync-world: Backport for Fix mobile domain logic for login.wikimedia.org (T380646)
- 22:46 cjming: end of UTC late backport window
- 22:44 cjming@deploy2002: Finished scap sync-world: Backport for Turn on Parsoid Read views on jawikivoyage (T380769) (duration: 15m 22s)
- 22:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P71262 and previous config saved to /var/cache/conftool/dbconfig/20241127-223652-ladsgroup.json
- 22:35 cjming@deploy2002: cscott, cjming: Continuing with sync
- 22:35 cjming@deploy2002: cscott, cjming: Backport for Turn on Parsoid Read views on jawikivoyage (T380769) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 22:29 cjming@deploy2002: Started scap sync-world: Backport for Turn on Parsoid Read views on jawikivoyage (T380769)
- 22:27 cjming@deploy2002: Finished scap sync-world: Backport for Bump wikimedia/parsoid to 0.21.0-a9 (T373035 T380664), Bump wikimedia/parsoid to 0.21.0-a9 (T380664) (duration: 42m 38s)
- 22:26 bking@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - bking@cumin1002"
- 22:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P71261 and previous config saved to /var/cache/conftool/dbconfig/20241127-222145-ladsgroup.json
- 22:13 cjming@deploy2002: arlolra, cjming: Continuing with sync
- 22:11 bking@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1027.eqiad.wmnet with reason: host reimage
- 22:09 cjming@deploy2002: arlolra, cjming: Backport for Bump wikimedia/parsoid to 0.21.0-a9 (T373035 T380664), Bump wikimedia/parsoid to 0.21.0-a9 (T380664) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 22:07 bking@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1027.eqiad.wmnet with reason: host reimage
- 22:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226 (T370903)', diff saved to https://phabricator.wikimedia.org/P71260 and previous config saved to /var/cache/conftool/dbconfig/20241127-220638-ladsgroup.json
- 21:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1226 (T370903)', diff saved to https://phabricator.wikimedia.org/P71259 and previous config saved to /var/cache/conftool/dbconfig/20241127-215407-ladsgroup.json
- 21:54 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1226.eqiad.wmnet with reason: Maintenance
- 21:53 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1226.eqiad.wmnet with reason: Maintenance
- 21:45 cjming@deploy2002: Started scap sync-world: Backport for Bump wikimedia/parsoid to 0.21.0-a9 (T373035 T380664), Bump wikimedia/parsoid to 0.21.0-a9 (T380664)
- 21:43 cjming@deploy2002: Finished scap sync-world: Backport for Revert "Normalize ref html before comparison" (T380977) (duration: 12m 49s)
- 21:40 bking@cumin1002: START - Cookbook sre.hosts.reimage for host wdqs1027.eqiad.wmnet with OS bullseye
- 21:38 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1216.eqiad.wmnet with reason: Maintenance
- 21:38 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1216.eqiad.wmnet with reason: Maintenance
- 21:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214 (T370903)', diff saved to https://phabricator.wikimedia.org/P71258 and previous config saved to /var/cache/conftool/dbconfig/20241127-213759-ladsgroup.json
- 21:37 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs1026.eqiad.wmnet with OS bullseye
- 21:37 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - bking@cumin2002"
- 21:37 cjming@deploy2002: cjming, cscott: Continuing with sync
- 21:37 cjming@deploy2002: cjming, cscott: Backport for Revert "Normalize ref html before comparison" (T380977) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 21:31 cjming@deploy2002: Started scap sync-world: Backport for Revert "Normalize ref html before comparison" (T380977)
- 21:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P71257 and previous config saved to /var/cache/conftool/dbconfig/20241127-212252-ladsgroup.json
- 21:17 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es2029 (T376905)', diff saved to https://phabricator.wikimedia.org/P71256 and previous config saved to /var/cache/conftool/dbconfig/20241127-211704-ladsgroup.json
- 21:07 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P71255 and previous config saved to /var/cache/conftool/dbconfig/20241127-210745-ladsgroup.json
- 21:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es2029', diff saved to https://phabricator.wikimedia.org/P71254 and previous config saved to /var/cache/conftool/dbconfig/20241127-210157-ladsgroup.json
- 20:52 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214 (T370903)', diff saved to https://phabricator.wikimedia.org/P71253 and previous config saved to /var/cache/conftool/dbconfig/20241127-205238-ladsgroup.json
- 20:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es2029', diff saved to https://phabricator.wikimedia.org/P71252 and previous config saved to /var/cache/conftool/dbconfig/20241127-204650-ladsgroup.json
- 20:45 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2237.codfw.wmnet with reason: Optimize (T379813)
- 20:45 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2237.codfw.wmnet with reason: Optimize (T379813)
- 20:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'db2237 depool (T379813)', diff saved to https://phabricator.wikimedia.org/P71251 and previous config saved to /var/cache/conftool/dbconfig/20241127-204450-ladsgroup.json
- 20:38 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - bking@cumin2002"
- 20:37 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1214 (T370903)', diff saved to https://phabricator.wikimedia.org/P71250 and previous config saved to /var/cache/conftool/dbconfig/20241127-203724-ladsgroup.json
- 20:37 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1214.eqiad.wmnet with reason: Maintenance
- 20:36 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1214.eqiad.wmnet with reason: Maintenance
- 20:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211 (T370903)', diff saved to https://phabricator.wikimedia.org/P71249 and previous config saved to /var/cache/conftool/dbconfig/20241127-203650-ladsgroup.json
- 20:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es2029 (T376905)', diff saved to https://phabricator.wikimedia.org/P71248 and previous config saved to /var/cache/conftool/dbconfig/20241127-203143-ladsgroup.json
- 20:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es2029 (T376905)', diff saved to https://phabricator.wikimedia.org/P71247 and previous config saved to /var/cache/conftool/dbconfig/20241127-202446-ladsgroup.json
- 20:24 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es2029.codfw.wmnet with reason: Maintenance
- 20:24 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es2029.codfw.wmnet with reason: Maintenance
- 20:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es2034 (T376905)', diff saved to https://phabricator.wikimedia.org/P71246 and previous config saved to /var/cache/conftool/dbconfig/20241127-202420-ladsgroup.json
- 20:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211', diff saved to https://phabricator.wikimedia.org/P71245 and previous config saved to /var/cache/conftool/dbconfig/20241127-202143-ladsgroup.json
- 20:20 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1026.eqiad.wmnet with reason: host reimage
- 20:18 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1026.eqiad.wmnet with reason: host reimage
- 20:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es2034', diff saved to https://phabricator.wikimedia.org/P71244 and previous config saved to /var/cache/conftool/dbconfig/20241127-200913-ladsgroup.json
- 20:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211', diff saved to https://phabricator.wikimedia.org/P71243 and previous config saved to /var/cache/conftool/dbconfig/20241127-200636-ladsgroup.json
- 19:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es2034', diff saved to https://phabricator.wikimedia.org/P71242 and previous config saved to /var/cache/conftool/dbconfig/20241127-195406-ladsgroup.json
- 19:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211 (T370903)', diff saved to https://phabricator.wikimedia.org/P71241 and previous config saved to /var/cache/conftool/dbconfig/20241127-195129-ladsgroup.json
- 19:50 bking@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs1026.eqiad.wmnet with OS bullseye
- 19:50 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host wdqs1025.eqiad.wmnet with OS bullseye
- 19:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es2034 (T376905)', diff saved to https://phabricator.wikimedia.org/P71240 and previous config saved to /var/cache/conftool/dbconfig/20241127-193858-ladsgroup.json
- 19:36 moritzm: imported jenkins 2.479.2 to thirdparty/ci for bullseye-wikimedia
- 19:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1211 (T370903)', diff saved to https://phabricator.wikimedia.org/P71239 and previous config saved to /var/cache/conftool/dbconfig/20241127-193529-ladsgroup.json
- 19:35 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1211.eqiad.wmnet with reason: Maintenance
- 19:35 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1211.eqiad.wmnet with reason: Maintenance
- 19:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209 (T370903)', diff saved to https://phabricator.wikimedia.org/P71238 and previous config saved to /var/cache/conftool/dbconfig/20241127-193507-ladsgroup.json
- 19:34 bking@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs1025.eqiad.wmnet with OS bullseye
- 19:32 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host wdqs1025.eqiad.wmnet with OS bullseye
- 19:32 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es2034 (T376905)', diff saved to https://phabricator.wikimedia.org/P71237 and previous config saved to /var/cache/conftool/dbconfig/20241127-193202-ladsgroup.json
- 19:31 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es2034.codfw.wmnet with reason: Maintenance
- 19:31 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es2034.codfw.wmnet with reason: Maintenance
- 19:25 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs1027.eqiad.wmnet with OS bullseye
- 19:24 bking@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs1025.eqiad.wmnet with OS bullseye
- 19:23 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wdqs1025.eqiad.wmnet with OS bullseye
- 19:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P71236 and previous config saved to /var/cache/conftool/dbconfig/20241127-192000-ladsgroup.json
- 19:18 brett@cumin2002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool site magru [reason: repool magru, T376737]
- 19:18 brett@cumin2002: START - Cookbook sre.dns.admin DNS admin: pool site magru [reason: repool magru, T376737]
- 19:17 mforns@deploy2002: Finished deploy [airflow-dags/analytics@99032bf]: regular weekly train (duration: 03m 10s)
- 19:14 mforns@deploy2002: Started deploy [airflow-dags/analytics@99032bf]: regular weekly train
- 19:13 mutante: disabled puppet on R:scap::target (180 hosts) for a short time - deploying gerrit:1092841
- 19:09 brett@puppetserver1001: conftool action : set/pooled=yes; selector: dc=magru,service=cdn
- 19:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P71235 and previous config saved to /var/cache/conftool/dbconfig/20241127-190453-ladsgroup.json
- 19:02 bking@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs1025.eqiad.wmnet with OS bullseye
- 18:56 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wdqs1025.eqiad.wmnet with OS bullseye
- 18:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209 (T370903)', diff saved to https://phabricator.wikimedia.org/P71233 and previous config saved to /var/cache/conftool/dbconfig/20241127-184946-ladsgroup.json
- 18:47 fabfur@cumin1002: conftool action : set/pooled=yes; selector: cluster=dnsbox,dc=magru
- 18:38 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 16 hosts
- 18:38 fabfur@cumin1002: START - Cookbook sre.hosts.remove-downtime for 16 hosts
- 18:38 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs7003.magru.wmnet
- 18:38 fabfur@cumin1002: START - Cookbook sre.hosts.remove-downtime for lvs7003.magru.wmnet
- 18:38 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs7002.magru.wmnet
- 18:38 fabfur@cumin1002: START - Cookbook sre.hosts.remove-downtime for lvs7002.magru.wmnet
- 18:38 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs7001.magru.wmnet
- 18:38 fabfur@cumin1002: START - Cookbook sre.hosts.remove-downtime for lvs7001.magru.wmnet
- 18:38 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for dns7002.wikimedia.org
- 18:38 fabfur@cumin1002: START - Cookbook sre.hosts.remove-downtime for dns7002.wikimedia.org
- 18:37 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for dns7001.wikimedia.org
- 18:37 fabfur@cumin1002: START - Cookbook sre.hosts.remove-downtime for dns7001.wikimedia.org
- 18:37 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on dns7001.wikimedia.org with reason: T380307
- 18:37 fabfur@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on dns7001.wikimedia.org with reason: T380307
- 18:36 bking@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs1025.eqiad.wmnet with OS bullseye
- 18:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1209 (T370903)', diff saved to https://phabricator.wikimedia.org/P71232 and previous config saved to /var/cache/conftool/dbconfig/20241127-183455-ladsgroup.json
- 18:34 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1209.eqiad.wmnet with reason: Maintenance
- 18:34 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1209.eqiad.wmnet with reason: Maintenance
- 18:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203 (T370903)', diff saved to https://phabricator.wikimedia.org/P71231 and previous config saved to /var/cache/conftool/dbconfig/20241127-183432-ladsgroup.json
- 18:19 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P71230 and previous config saved to /var/cache/conftool/dbconfig/20241127-181925-ladsgroup.json
- 18:05 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wdqs1027.eqiad.wmnet with OS bullseye
- 18:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P71229 and previous config saved to /var/cache/conftool/dbconfig/20241127-180418-ladsgroup.json
- 17:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203 (T370903)', diff saved to https://phabricator.wikimedia.org/P71228 and previous config saved to /var/cache/conftool/dbconfig/20241127-174911-ladsgroup.json
- 17:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1203 (T370903)', diff saved to https://phabricator.wikimedia.org/P71227 and previous config saved to /var/cache/conftool/dbconfig/20241127-173426-ladsgroup.json
- 17:34 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1203.eqiad.wmnet with reason: Maintenance
- 17:34 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1203.eqiad.wmnet with reason: Maintenance
- 17:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192 (T370903)', diff saved to https://phabricator.wikimedia.org/P71226 and previous config saved to /var/cache/conftool/dbconfig/20241127-173403-ladsgroup.json
- 17:33 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: apply
- 17:33 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
- 17:33 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventstreams: apply
- 17:32 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
- 17:32 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-main: apply
- 17:32 jiji@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-main: apply
- 17:32 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/eventstreams: apply
- 17:31 jiji@deploy2002: helmfile [staging] DONE helmfile.d/services/eventstreams: apply
- 17:31 jiji@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-main: apply
- 17:31 jiji@deploy2002: helmfile [staging] START helmfile.d/services/eventstreams: apply
- 17:28 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply
- 17:27 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop: apply
- 17:27 jiji@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop: apply
- 17:27 jiji@deploy2002: helmfile [staging] START helmfile.d/services/changeprop: apply
- 17:25 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
- 17:24 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
- 17:24 jiji@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
- 17:23 jiji@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
- 17:20 jiji@deploy2002: helmfile [staging] DONE helmfile.d/services/benthos-cache-invalidator: apply
- 17:19 jiji@deploy2002: helmfile [staging] START helmfile.d/services/benthos-cache-invalidator: apply
- 17:18 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P71225 and previous config saved to /var/cache/conftool/dbconfig/20241127-171857-ladsgroup.json
- 17:17 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp7010.magru.wmnet with OS bullseye
- 17:16 jiji@deploy2002: helmfile [staging] DONE helmfile.d/services/benthos-cache-invalidator: apply
- 17:16 jiji@deploy2002: helmfile [staging] START helmfile.d/services/benthos-cache-invalidator: apply
- 17:14 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for kafka-main1007.eqiad.wmnet
- 17:14 jiji@cumin1002: START - Cookbook sre.hosts.remove-downtime for kafka-main1007.eqiad.wmnet
- 17:03 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P71224 and previous config saved to /var/cache/conftool/dbconfig/20241127-170350-ladsgroup.json
- 16:56 jiji@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 16:55 jiji@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
- 16:55 jiji@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 16:55 jiji@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
- 16:55 jiji@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
- 16:54 jiji@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
- 16:54 jiji@deploy2002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
- 16:54 jiji@deploy2002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
- 16:54 jiji@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
- 16:53 jiji@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
- 16:53 jiji@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
- 16:53 jiji@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
- 16:53 jiji@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
- 16:52 jiji@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
- 16:52 jiji@deploy2002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
- 16:52 jiji@deploy2002: helmfile [codfw] START helmfile.d/admin 'apply'.
- 16:52 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
- 16:52 jiji@deploy2002: helmfile [eqiad] START helmfile.d/admin 'apply'.
- 16:51 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp7010.magru.wmnet with reason: host reimage
- 16:48 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192 (T370903)', diff saved to https://phabricator.wikimedia.org/P71222 and previous config saved to /var/cache/conftool/dbconfig/20241127-164843-ladsgroup.json
- 16:47 fabfur@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp7010.magru.wmnet with reason: host reimage
- 16:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1192 (T370903)', diff saved to https://phabricator.wikimedia.org/P71221 and previous config saved to /var/cache/conftool/dbconfig/20241127-163407-ladsgroup.json
- 16:34 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1192.eqiad.wmnet with reason: Maintenance
- 16:33 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1192.eqiad.wmnet with reason: Maintenance
- 16:33 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178 (T370903)', diff saved to https://phabricator.wikimedia.org/P71220 and previous config saved to /var/cache/conftool/dbconfig/20241127-163344-ladsgroup.json
- 16:27 jiji@cumin1002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-main-eqiad
- 16:26 fabfur@cumin1002: START - Cookbook sre.hosts.reimage for host cp7010.magru.wmnet with OS bullseye
- 16:18 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P71218 and previous config saved to /var/cache/conftool/dbconfig/20241127-161837-ladsgroup.json
- 16:16 jiji@cumin1002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-main-eqiad
- 16:12 effie: roll restarting kafka-main brokers - T363214
- 16:11 moritzm: installing distro-info-data updates from bookworm point release
- 16:11 fabfur@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 16:11 fabfur@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cp70101 - fabfur@cumin1002"
- 16:11 fabfur@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cp70101 - fabfur@cumin1002"
- 16:05 fabfur@cumin1002: START - Cookbook sre.dns.netbox
- 16:03 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P71217 and previous config saved to /var/cache/conftool/dbconfig/20241127-160330-ladsgroup.json
- 15:48 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178 (T370903)', diff saved to https://phabricator.wikimedia.org/P71216 and previous config saved to /var/cache/conftool/dbconfig/20241127-154823-ladsgroup.json
- 15:41 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs1027.eqiad.wmnet with OS bullseye
- 15:41 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs1026.eqiad.wmnet with OS bullseye
- 15:33 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1178 (T370903)', diff saved to https://phabricator.wikimedia.org/P71215 and previous config saved to /var/cache/conftool/dbconfig/20241127-153316-ladsgroup.json
- 15:33 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1178.eqiad.wmnet with reason: Maintenance
- 15:32 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1178.eqiad.wmnet with reason: Maintenance
- 15:32 ecarg@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
- 15:31 ecarg@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
- 15:30 ecarg@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
- 15:30 ecarg@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
- 15:28 ecarg@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
- 15:27 ecarg@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
- 15:22 ecarg@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
- 15:22 ecarg@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
- 15:21 ecarg@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
- 15:20 ecarg@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
- 15:09 ecarg@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
- 15:08 ecarg@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
- 15:08 Krinkle: krinkle@webperf2003: `sudo apt-get install kafkacat` (matching webperf1003, for ad-hoc debugging)
- 15:05 kart_: Updated recommendation-api to 2024-11-27-142924-production (T380838, T379036, T380699)
- 15:04 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1003.eqiad.wmnet to plain
- 15:03 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1003.eqiad.wmnet to plain
- 15:02 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1022.eqiad.wmnet
- 15:02 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1022.eqiad.wmnet
- 15:01 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1003.eqiad.wmnet to drbd
- 14:59 kartik@deploy2002: helmfile [ml-serve-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
- 14:58 kartik@deploy2002: helmfile [ml-serve-eqiad] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
- 14:51 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1003.eqiad.wmnet to drbd
- 14:48 kartik@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
- 14:43 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1022.eqiad.wmnet
- 14:39 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1022.eqiad.wmnet
- 14:35 moritzm: rebalance magru01 following switch of VMs back to DRBD T376737
- 14:33 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on doh[7001-7002].wikimedia.org with reason: site is depooled, maintenance
- 14:33 sukhe@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on doh[7001-7002].wikimedia.org with reason: site is depooled, maintenance
- 14:33 urbanecm@deploy2002: Finished scap sync-world: Backport for [GrowthExperiments] Undefine wgGEDatabaseCluster (T354939) (duration: 12m 21s)
- 14:30 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of doh7001.wikimedia.org to drbd
- 14:26 urbanecm@deploy2002: urbanecm: Continuing with sync
- 14:26 urbanecm@deploy2002: urbanecm: Backport for [GrowthExperiments] Undefine wgGEDatabaseCluster (T354939) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 14:25 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on cloudvirt1061.eqiad.wmnet with reason: cloudvirt1061 needs maintenance T380673
- 14:25 fnegri@cumin1002: START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on cloudvirt1061.eqiad.wmnet with reason: cloudvirt1061 needs maintenance T380673
- 14:24 urbanecm: Purge https://en.wikipedia.org/static/images/mobile/copyright/wikiquote-wordmark-az.svg (T380974)
- 14:21 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wdqs1027.eqiad.wmnet with OS bullseye
- 14:21 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wdqs1026.eqiad.wmnet with OS bullseye
- 14:21 urbanecm@deploy2002: Started scap sync-world: Backport for [GrowthExperiments] Undefine wgGEDatabaseCluster (T354939)
- 14:20 urbanecm@deploy2002: Finished scap sync-world: Backport for Enable ParserMigration compact indicator on all wikis (T363484), Deploy Parsoid Read Views to de/ru wikivoyage and dagwiki (T375394 T380401), Updated wordmark for Azerbaijani Wikiquote (T380974) (duration: 17m 20s)
- 14:20 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of doh7001.wikimedia.org to drbd
- 14:19 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of durum7001.magru.wmnet to drbd
- 14:13 urbanecm@deploy2002: urbanecm, cscott, nmw03: Continuing with sync
- 14:09 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of durum7001.magru.wmnet to drbd
- 14:08 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ncredir7001.magru.wmnet to drbd
- 14:08 urbanecm@deploy2002: urbanecm, cscott, nmw03: Backport for Enable ParserMigration compact indicator on all wikis (T363484), Deploy Parsoid Read Views to de/ru wikivoyage and dagwiki (T375394 T380401), Updated wordmark for Azerbaijani Wikiquote (T380974) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 14:03 urbanecm@deploy2002: Started scap sync-world: Backport for Enable ParserMigration compact indicator on all wikis (T363484), Deploy Parsoid Read Views to de/ru wikivoyage and dagwiki (T375394 T380401), Updated wordmark for Azerbaijani Wikiquote (T380974)
- 13:59 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ncredir7001.magru.wmnet to drbd
- 13:56 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of install7001.wikimedia.org to drbd
- 13:45 moritzm: installing php8.2 security updates
- 13:40 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of install7001.wikimedia.org to drbd
- 13:39 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of netflow7001.magru.wmnet to drbd
- 13:38 mszabo@deploy2002: Finished scap sync-world: Backport for private: Add stub for wgReportIncidentZendeskSubjectLine (T380868), Configure IRS Zendesk integration (T380908), Configure instrument for the Incident Reporting System (T372823) (duration: 13m 53s)
- 13:31 mszabo@deploy2002: mszabo: Continuing with sync
- 13:30 mszabo@deploy2002: mszabo: Backport for private: Add stub for wgReportIncidentZendeskSubjectLine (T380868), Configure IRS Zendesk integration (T380908), Configure instrument for the Incident Reporting System (T372823) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 13:28 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of netflow7001.magru.wmnet to drbd
- 13:27 moritzm: rebalance magru02 following switch of VMs back to DRBD T376737
- 13:26 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of durum7002.magru.wmnet to drbd
- 13:24 mszabo@deploy2002: Started scap sync-world: Backport for private: Add stub for wgReportIncidentZendeskSubjectLine (T380868), Configure IRS Zendesk integration (T380908), Configure instrument for the Incident Reporting System (T372823)
- 13:20 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wdqs1026.eqiad.wmnet with OS bullseye
- 13:20 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wdqs1025.eqiad.wmnet with OS bullseye
- 13:20 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wdqs1027.eqiad.wmnet with OS bullseye
- 13:16 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of durum7002.magru.wmnet to drbd
- 13:15 kartik@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
- 13:15 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of doh7002.wikimedia.org to drbd
- 13:05 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of doh7002.wikimedia.org to drbd
- 13:05 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of prometheus7001.magru.wmnet to drbd
- 12:56 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on kafka-main[1002,1007].eqiad.wmnet with reason: Hardware refresh
- 12:56 jiji@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on kafka-main[1002,1007].eqiad.wmnet with reason: Hardware refresh
- 12:50 moritzm: installing ghostscript security updates
- 12:39 kartik@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
- 12:38 effie: start replacing kafka-main1002 with kafka-main1007 - T363214
- 12:24 mvolz@deploy2002: helmfile [staging] DONE helmfile.d/services/citoid: apply
- 12:24 mvolz@deploy2002: helmfile [staging] START helmfile.d/services/citoid: apply
- 12:24 kart_: Updated cxserver to 2024-11-20-121713-production (T377966, T357950)
- 12:22 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
- 12:22 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
- 12:20 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
- 12:20 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/cxserver: apply
- 12:18 moritzm: installing python-cryptography security updates
- 12:14 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
- 12:13 kartik@deploy2002: helmfile [staging] START helmfile.d/services/cxserver: apply
- 12:12 moritzm: installing openssl security updates
- 12:08 mvolz@deploy2002: helmfile [eqiad] DONE helmfile.d/services/zotero: apply
- 12:07 mvolz@deploy2002: helmfile [eqiad] START helmfile.d/services/zotero: apply
- 12:06 mvolz@deploy2002: helmfile [codfw] DONE helmfile.d/services/zotero: apply
- 12:06 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of prometheus7001.magru.wmnet to drbd
- 12:06 mvolz@deploy2002: helmfile [codfw] START helmfile.d/services/zotero: apply
- 12:05 mvolz@deploy2002: helmfile [staging] DONE helmfile.d/services/zotero: apply
- 12:05 mvolz@deploy2002: helmfile [staging] START helmfile.d/services/zotero: apply
- 12:05 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
- 12:05 kartik@deploy2002: helmfile [staging] START helmfile.d/services/cxserver: apply
- 12:03 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on ganeti2042.codfw.wmnet with reason: broken CPU
- 12:03 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on ganeti2042.codfw.wmnet with reason: broken CPU
- 11:54 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of bast7001.wikimedia.org to drbd
- 11:45 ladsgroup@deploy2002: Finished scap sync-world: Backport for Bump ratio of new parsercache key spec to 3 (T373037) (duration: 12m 51s)
- 11:38 ladsgroup@deploy2002: ladsgroup: Continuing with sync
- 11:38 ladsgroup@deploy2002: ladsgroup: Backport for Bump ratio of new parsercache key spec to 3 (T373037) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 11:34 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of bast7001.wikimedia.org to drbd
- 11:32 ladsgroup@deploy2002: Started scap sync-world: Backport for Bump ratio of new parsercache key spec to 3 (T373037)
- 11:29 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ncredir7002.magru.wmnet to drbd
- 11:21 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on dns7002.wikimedia.org with reason: T376737
- 11:21 fabfur@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on dns7002.wikimedia.org with reason: T376737
- 11:21 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on dns7001.wikimedia.org with reason: T376737
- 11:20 fabfur@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on dns7001.wikimedia.org with reason: T376737
- 11:19 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on lvs[7001-7003].magru.wmnet with reason: T376737
- 11:19 fabfur@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on lvs[7001-7003].magru.wmnet with reason: T376737
- 11:19 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 16 hosts with reason: T376737
- 11:19 fabfur@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on 16 hosts with reason: T376737
- 11:18 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ncredir7002.magru.wmnet to drbd
- 11:16 xSavitar: T380875 Ran mwscript-k8s --comment="T380875" -f -- extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=metawiki --logwiki=metawiki 'EMBakeryEquipment' 'Janapanna'
- 11:15 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti7002.magru.wmnet to cluster magru02 and group B4
- 11:13 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti7002.magru.wmnet to cluster magru02 and group B4
- 11:13 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on lvs7001.magru.wmnet with reason: T376737
- 11:13 fabfur@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on lvs7001.magru.wmnet with reason: T376737
- 11:04 kevinbazira@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'article-models' for release 'main' .
- 11:03 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti7001.magru.wmnet to cluster magru01 and group B3
- 11:02 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti7001.magru.wmnet to cluster magru01 and group B3
- 10:29 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti7002.magru.wmnet
- 10:19 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti7002.magru.wmnet
- 10:18 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti7001.magru.wmnet
- 10:10 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp7008.mgmt.magru.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
- 10:08 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti7001.magru.wmnet
- 10:01 fabfur@cumin1002: START - Cookbook sre.hosts.provision for host cp7008.mgmt.magru.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
- 10:00 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti7002
- 09:59 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti7002
- 09:59 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti7001
- 09:58 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti7001
- 09:55 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp7006.mgmt.magru.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
- 09:48 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "readded ganeti nodes in magru - jmm@cumin2002 - T376737"
- 09:48 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "readded ganeti nodes in magru - jmm@cumin2002 - T376737"
- 09:46 fabfur@cumin1002: START - Cookbook sre.hosts.provision for host cp7006.mgmt.magru.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
- 09:45 hashar@deploy2002: rebuilt and synchronized wikiversions files: group1 to 1.44.0-wmf.5 refs T375664
- 09:31 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti7001.magru.wmnet with OS bookworm
- 09:31 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jmm@cumin2002"
- 09:30 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jmm@cumin2002"
- 09:09 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti7001.magru.wmnet with reason: host reimage
- 09:06 kartik@deploy2002: Finished scap sync-world: Backport for ext.uls.inputsettings: Use arrow functions (T380431) (duration: 16m 06s)
- 09:05 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti7001.magru.wmnet with reason: host reimage
- 08:59 kartik@deploy2002: abi, kartik: Continuing with sync
- 08:55 kartik@deploy2002: abi, kartik: Backport for ext.uls.inputsettings: Use arrow functions (T380431) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 08:50 kartik@deploy2002: Started scap sync-world: Backport for ext.uls.inputsettings: Use arrow functions (T380431)
- 08:45 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti7001.magru.wmnet with OS bookworm
- 08:38 kartik@deploy2002: Finished scap sync-world: Backport for Fix illegal access of typed property. (T380724) (duration: 21m 02s)
- 08:31 kartik@deploy2002: kartik, abi: Continuing with sync
- 08:24 kartik@deploy2002: kartik, abi: Backport for Fix illegal access of typed property. (T380724) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 08:24 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti7002.magru.wmnet with OS bookworm
- 08:24 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jmm@cumin2002"
- 08:23 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jmm@cumin2002"
- 08:17 kartik@deploy2002: Started scap sync-world: Backport for Fix illegal access of typed property. (T380724)
- 08:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti7002.magru.wmnet with reason: host reimage
- 07:57 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti7002.magru.wmnet with reason: host reimage
- 07:34 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti7002.magru.wmnet with OS bookworm
- 07:24 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 07:23 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
2024-11-26
- 23:29 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns7002.wikimedia.org with OS bookworm
- 23:29 brett@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - brett@cumin2002"
- 23:28 brett@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - brett@cumin2002"
- 23:28 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs7001.magru.wmnet with OS bullseye
- 23:28 brett@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - brett@cumin2002"
- 23:23 brett@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - brett@cumin2002"
- 23:13 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp7010.magru.wmnet with OS bullseye
- 23:13 brett@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - brett@cumin2002"
- 23:12 brett@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - brett@cumin2002"
- 23:03 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs7001.magru.wmnet with reason: host reimage
- 23:00 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs7001.magru.wmnet with reason: host reimage
- 22:54 reedy@deploy2002: Finished scap sync-world: Backport for Add CodeMirror to BetaFeaturesAllowList (T376735) (duration: 31m 35s)
- 22:51 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns7002.wikimedia.org with reason: host reimage
- 22:48 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dns7002.wikimedia.org with reason: host reimage
- 22:48 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp7010.magru.wmnet with reason: host reimage
- 22:45 reedy@deploy2002: musikanimal, reedy: Continuing with sync
- 22:44 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp7010.magru.wmnet with reason: host reimage
- 22:40 brett@cumin2002: START - Cookbook sre.hosts.reimage for host lvs7001.magru.wmnet with OS bullseye
- 22:37 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp7002.magru.wmnet with OS bullseye
- 22:37 brett@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - brett@cumin2002"
- 22:32 brett@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - brett@cumin2002"
- 22:28 reedy@deploy2002: musikanimal, reedy: Backport for Add CodeMirror to BetaFeaturesAllowList (T376735) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 22:26 cmooney@cumin1002: END (FAIL) - Cookbook sre.network.peering (exit_code=99) with action 'configure' for AS: 4800
- 22:25 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp7010.magru.wmnet with OS bullseye
- 22:24 brett@cumin2002: START - Cookbook sre.hosts.reimage for host dns7002.wikimedia.org with OS bookworm
- 22:24 brett@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dns7002.wikimedia.org with OS bullseye
- 22:24 cmooney@cumin1002: START - Cookbook sre.network.peering with action 'configure' for AS: 4800
- 22:22 reedy@deploy2002: Started scap sync-world: Backport for Add CodeMirror to BetaFeaturesAllowList (T376735)
- 22:21 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns7002.wikimedia.org with reason: host reimage
- 22:21 reedy@deploy2002: Finished scap sync-world: Backport for Nov 26 2024: Vector 2022 Deployments (T379799) (duration: 19m 52s)
- 22:18 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dns7002.wikimedia.org with reason: host reimage
- 22:15 cmooney@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 262979
- 22:14 cmooney@cumin1002: START - Cookbook sre.network.peering with action 'configure' for AS: 262979
- 22:11 reedy@deploy2002: jdlrobson, reedy: Continuing with sync
- 22:08 reedy@deploy2002: jdlrobson, reedy: Backport for Nov 26 2024: Vector 2022 Deployments (T379799) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 22:01 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp7002.magru.wmnet with reason: host reimage
- 22:01 reedy@deploy2002: Started scap sync-world: Backport for Nov 26 2024: Vector 2022 Deployments (T379799)
- 22:00 reedy@deploy2002: Finished scap sync-world: Backport for Add BetaFeature for CodeMirror 6 (T376735) (duration: 40m 05s)
- 21:58 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp7004.magru.wmnet with OS bullseye
- 21:58 brett@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - brett@cumin2002"
- 21:58 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp7002.magru.wmnet with reason: host reimage
- 21:57 brett@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - brett@cumin2002"
- 21:56 brett@cumin2002: START - Cookbook sre.hosts.reimage for host dns7002.wikimedia.org with OS bullseye
- 21:46 reedy@deploy2002: musikanimal, reedy: Continuing with sync
- 21:44 reedy@deploy2002: musikanimal, reedy: Backport for Add BetaFeature for CodeMirror 6 (T376735) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 21:38 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp7002.magru.wmnet with OS bullseye
- 21:35 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp7002.magru.wmnet with OS bullseye
- 21:35 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dns7002.wikimedia.org with OS bookworm
- 21:35 brett@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cp7002.magru.wmnet dns7002.magru.wmnet on all recursors
- 21:35 brett@cumin2002: START - Cookbook sre.dns.wipe-cache cp7002.magru.wmnet dns7002.magru.wmnet on all recursors
- 21:35 robh@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs7001.mgmt.magru.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 21:34 robh@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp7010.mgmt.magru.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 21:32 robh@cumin2002: START - Cookbook sre.hosts.provision for host lvs7001.mgmt.magru.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 21:32 robh@cumin2002: START - Cookbook sre.hosts.provision for host cp7010.mgmt.magru.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 21:32 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 21:32 robh@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: magru shuffle - robh@cumin2002"
- 21:32 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp7004.magru.wmnet with reason: host reimage
- 21:32 robh@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: magru shuffle - robh@cumin2002"
- 21:30 robh@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host lvs7001
- 21:30 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp7003.magru.wmnet with OS bullseye
- 21:30 brett@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - brett@cumin2002"
- 21:30 robh@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host lvs7001
- 21:30 robh@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp7010
- 21:30 robh@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cp7010
- 21:29 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp7004.magru.wmnet with reason: host reimage
- 21:28 robh@cumin2002: START - Cookbook sre.dns.netbox
- 21:28 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 21:26 brett@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - brett@cumin2002"
- 21:25 robh@cumin2002: START - Cookbook sre.dns.netbox
- 21:24 damilare: civicrm upgraded from 59d340cd to 3b1ed162
- 21:23 damilare: SmashPig upgraded from 131e92a5 to 79b463b4
- 21:22 robh@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts lvs7001.magru.wmnet
- 21:22 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 21:21 brett@cumin2002: START - Cookbook sre.hosts.reimage for host dns7002.wikimedia.org with OS bookworm
- 21:20 reedy@deploy2002: Started scap sync-world: Backport for Add BetaFeature for CodeMirror 6 (T376735)
- 21:20 robh@cumin2002: START - Cookbook sre.dns.netbox
- 21:20 robh@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts cp7010.magru.wmnet
- 21:20 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 21:20 robh@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cp7010.magru.wmnet decommissioned, removing all IPs except the asset tag one - robh@cumin2002"
- 21:19 robh@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cp7010.magru.wmnet decommissioned, removing all IPs except the asset tag one - robh@cumin2002"
- 21:17 reedy@deploy2002: Synchronized wmf-config/core-Permissions.php: T380753 (duration: 11m 23s)
- 21:16 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp7002.magru.wmnet with OS bullseye
- 21:15 robh@cumin2002: START - Cookbook sre.dns.netbox
- 21:10 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp7004.magru.wmnet with OS bullseye
- 21:08 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp7004.magru.wmnet with OS bullseye
- 21:08 robh@cumin2002: START - Cookbook sre.hosts.decommission for hosts cp7010.magru.wmnet
- 21:08 robh@cumin2002: START - Cookbook sre.hosts.decommission for hosts lvs7001.magru.wmnet
- 21:04 robh@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dns7002.mgmt.magru.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 21:04 robh@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp7002.mgmt.magru.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 21:02 brett@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cp7003.magru.wmnet cp7004.magru.wmnet on all recursors
- 21:02 brett@cumin2002: START - Cookbook sre.dns.wipe-cache cp7003.magru.wmnet cp7004.magru.wmnet on all recursors
- 21:02 robh@cumin2002: START - Cookbook sre.hosts.provision for host dns7002.mgmt.magru.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 21:02 robh@cumin2002: START - Cookbook sre.hosts.provision for host cp7002.mgmt.magru.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 21:01 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 21:01 robh@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: magru shuffle - robh@cumin2002"
- 21:01 robh@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dns7002
- 21:01 robh@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: magru shuffle - robh@cumin2002"
- 21:01 robh@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host dns7002
- 21:01 robh@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp7002
- 21:01 robh@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cp7002
- 20:58 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp7003.magru.wmnet with reason: host reimage
- 20:54 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp7003.magru.wmnet with reason: host reimage
- 20:54 robh@cumin2002: START - Cookbook sre.dns.netbox
- 20:50 robh@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts cp7002.magru.wmnet
- 20:50 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 20:47 robh@cumin2002: START - Cookbook sre.dns.netbox
- 20:47 robh@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts dns7002.wikimedia.org
- 20:47 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 20:47 robh@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: dns7002.wikimedia.org decommissioned, removing all IPs except the asset tag one - robh@cumin2002"
- 20:47 robh@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: dns7002.wikimedia.org decommissioned, removing all IPs except the asset tag one - robh@cumin2002"
- 20:44 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp7004.magru.wmnet with OS bullseye
- 20:43 robh@cumin2002: START - Cookbook sre.dns.netbox
- 20:39 dzahn@cumin2002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1004.wikimedia.org with reason: security release 20241126
- 20:37 robh@cumin2002: START - Cookbook sre.hosts.decommission for hosts cp7002.magru.wmnet
- 20:37 robh@cumin2002: START - Cookbook sre.hosts.decommission for hosts dns7002.wikimedia.org
- 20:35 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp7003.magru.wmnet with OS bullseye
- 20:34 robh@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp7004.mgmt.magru.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 20:32 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs1027.eqiad.wmnet with OS bullseye
- 20:32 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs1026.eqiad.wmnet with OS bullseye
- 20:32 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs1025.eqiad.wmnet with OS bullseye
- 20:32 robh@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti7002.mgmt.magru.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 20:31 swfrench@deploy2002: Finished scap sync-world: Backport for debug.json: add support for mwdebug-next (T372605) (duration: 14m 21s)
- 20:26 robh@cumin2002: START - Cookbook sre.hosts.provision for host cp7004.mgmt.magru.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 20:26 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 20:26 robh@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: magru shuffle - robh@cumin2002"
- 20:26 robh@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: magru shuffle - robh@cumin2002"
- 20:25 robh@cumin2002: START - Cookbook sre.hosts.provision for host ganeti7002.mgmt.magru.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 20:25 robh@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti7002
- 20:25 robh@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti7002
- 20:24 swfrench@deploy2002: swfrench: Continuing with sync
- 20:23 robh@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host ganeti7002
- 20:23 robh@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti7002
- 20:23 robh@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti7002.mgmt.magru.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 20:23 swfrench@deploy2002: swfrench: Backport for debug.json: add support for mwdebug-next (T372605) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 20:22 robh@cumin2002: START - Cookbook sre.hosts.provision for host ganeti7002.mgmt.magru.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 20:21 robh@cumin2002: START - Cookbook sre.dns.netbox
- 20:21 dzahn@cumin2002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: security release 20241126
- 20:17 swfrench@deploy2002: Started scap sync-world: Backport for debug.json: add support for mwdebug-next (T372605)
- 20:16 robh@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts cp7004.magru.wmnet
- 20:16 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 20:16 aokoth@cumin1002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1003.wikimedia.org with reason: Security Update
- 20:14 robh@cumin2002: START - Cookbook sre.dns.netbox
- 20:13 robh@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts ganeti7002.magru.wmnet
- 20:13 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 20:13 robh@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti7002.magru.wmnet decommissioned, removing all IPs except the asset tag one - robh@cumin2002"
- 20:13 robh@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti7002.magru.wmnet decommissioned, removing all IPs except the asset tag one - robh@cumin2002"
- 20:11 hashar@deploy2002: Finished scap sync-world: Backport for Avoid exception on mTemplateIds/mTemplate array discrepancy (T380862) (duration: 15m 23s)
- 20:09 robh@cumin2002: START - Cookbook sre.dns.netbox
- 20:08 aokoth@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: Security Update
- 20:07 aokoth@cumin1002: END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99) on GitLab host gitlab1003.wikimedia.org with reason: Security Update
- 20:07 aokoth@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: Security Update
- 20:04 robh@cumin2002: START - Cookbook sre.hosts.decommission for hosts cp7004.magru.wmnet
- 20:04 robh@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti7002.magru.wmnet
- 20:02 hashar@deploy2002: hashar: Continuing with sync
- 20:02 hashar@deploy2002: hashar: Backport for Avoid exception on mTemplateIds/mTemplate array discrepancy (T380862) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 20:00 robh@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp7003.mgmt.magru.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 19:59 robh@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti7001.mgmt.magru.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 19:55 hashar@deploy2002: Started scap sync-world: Backport for Avoid exception on mTemplateIds/mTemplate array discrepancy (T380862)
- 19:52 robh@cumin2002: START - Cookbook sre.hosts.provision for host cp7003.mgmt.magru.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 19:52 robh@cumin2002: START - Cookbook sre.hosts.provision for host ganeti7001.mgmt.magru.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 19:51 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 19:51 robh@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: magru shuffle - robh@cumin2002"
- 19:50 robh@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: magru shuffle - robh@cumin2002"
- 19:49 robh@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti7001
- 19:49 robh@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti7001
- 19:49 robh@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp7003
- 19:49 robh@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cp7003
- 19:46 robh@cumin2002: START - Cookbook sre.dns.netbox
- 19:43 robh@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts cp7003.magru.wmnet
- 19:43 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 19:43 robh@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cp7003.magru.wmnet decommissioned, removing all IPs except the asset tag one - robh@cumin2002"
- 19:42 robh@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cp7003.magru.wmnet decommissioned, removing all IPs except the asset tag one - robh@cumin2002"
- 19:33 robh@cumin2002: START - Cookbook sre.dns.netbox
- 19:27 robh@cumin2002: START - Cookbook sre.hosts.decommission for hosts cp7003.magru.wmnet
- 19:27 urbanecm: [urbanecm@mwmaint2002 ~]$ foreachwiki userOptions.php --delete-defaults growthexperiments-homepage-variant # T379146, logging to /home/urbanecm/T379146.log
- 19:26 urbanecm: mwscript-k8s -f userOptions.php -- --wiki=enwiki --old=oldimpact --delete 'growthexperiments-homepage-variant' # T379146
- 19:23 robh@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti7001.magru.wmnet
- 19:23 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 19:23 robh@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti7001.magru.wmnet decommissioned, removing all IPs except the asset tag one - robh@cumin2002"
- 19:23 robh@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti7001.magru.wmnet decommissioned, removing all IPs except the asset tag one - robh@cumin2002"
- 19:22 eileen: civicrm upgraded from eec961a3 to 59d340cd
- 19:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'db2215 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P71191 and previous config saved to /var/cache/conftool/dbconfig/20241126-192112-ladsgroup.json
- 19:12 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wdqs1027.eqiad.wmnet with OS bullseye
- 19:12 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wdqs1026.eqiad.wmnet with OS bullseye
- 19:12 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wdqs1025.eqiad.wmnet with OS bullseye
- 19:11 robh@cumin2002: START - Cookbook sre.dns.netbox
- 19:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'db2215 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P71190 and previous config saved to /var/cache/conftool/dbconfig/20241126-190607-ladsgroup.json
- 18:55 robh@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti7001.magru.wmnet
- 18:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'db2215 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P71189 and previous config saved to /var/cache/conftool/dbconfig/20241126-185101-ladsgroup.json
- 18:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'db2215 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P71188 and previous config saved to /var/cache/conftool/dbconfig/20241126-183556-ladsgroup.json
- 18:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'db2215 repool', diff saved to https://phabricator.wikimedia.org/P71187 and previous config saved to /var/cache/conftool/dbconfig/20241126-183547-ladsgroup.json
- 18:34 ladsgroup@cumin1002: END (ERROR) - Cookbook sre.mysql.pool (exit_code=97) db2215 gradually with 4 steps - Maint over
- 18:33 ladsgroup@cumin1002: START - Cookbook sre.mysql.pool db2215 gradually with 4 steps - Maint over
- 18:25 jayme@cumin2002: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on D{wikikube-ctrl200[1-3].codfw.wmnet} and (A:wikikube-worker-codfw or A:wikikube-master-codfw)
- 18:10 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2215.codfw.wmnet with reason: Maintenance
- 18:10 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2215.codfw.wmnet with reason: Maintenance
- 17:58 jayme@cumin2002: START - Cookbook sre.k8s.reboot-nodes rolling reboot on D{wikikube-ctrl200[1-3].codfw.wmnet} and (A:wikikube-worker-codfw or A:wikikube-master-codfw)
- 17:47 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker[1313-1327].eqiad.wmnet
- 17:47 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker[1313-1327].eqiad.wmnet
- 17:35 claime: homer 'cr*eqiad*' commit 'T380350'
- 17:35 claime: homer 'lsw1-e7-eqiad*' commit 'T380350'
- 17:34 claime: homer 'lsw1-f6-eqiad*' commit 'T380350'
- 17:34 claime: homer 'lsw1-f5-eqiad*' commit 'T380350'
- 17:33 claime: homer 'lsw1-e5-eqiad*' commit 'T380350'
- 17:32 claime: homer 'lsw1-e6-eqiad*' commit 'T380350'
- 17:31 claime: homer 'lsw1-f7-eqiad*' commit 'T380350'
- 17:28 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1321.eqiad.wmnet with OS bookworm
- 17:25 jayme@cumin2002: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on D{wikikube-ctrl100[1-3].eqiad.wmnet} and (A:wikikube-worker-eqiad or A:wikikube-master-eqiad)
- 17:23 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1325.eqiad.wmnet with OS bookworm
- 17:20 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1326.eqiad.wmnet with OS bookworm
- 17:17 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1324.eqiad.wmnet with OS bookworm
- 17:13 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1327.eqiad.wmnet with OS bookworm
- 17:10 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1322.eqiad.wmnet with OS bookworm
- 17:09 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1321.eqiad.wmnet with reason: host reimage
- 17:06 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1321.eqiad.wmnet with reason: host reimage
- 17:05 ladsgroup@cumin1002: conftool action : set/pooled=yes; selector: name=clouddb1020.eqiad.wmnet,service=s8
- 17:04 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1325.eqiad.wmnet with reason: host reimage
- 17:03 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1323.eqiad.wmnet with OS bookworm
- 17:01 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1326.eqiad.wmnet with reason: host reimage
- 16:59 jayme@cumin2002: START - Cookbook sre.k8s.reboot-nodes rolling reboot on D{wikikube-ctrl100[1-3].eqiad.wmnet} and (A:wikikube-worker-eqiad or A:wikikube-master-eqiad)
- 16:58 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1324.eqiad.wmnet with reason: host reimage
- 16:54 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1327.eqiad.wmnet with reason: host reimage
- 16:51 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1322.eqiad.wmnet with reason: host reimage
- 16:49 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1327.eqiad.wmnet with reason: host reimage
- 16:49 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1326.eqiad.wmnet with reason: host reimage
- 16:49 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1325.eqiad.wmnet with reason: host reimage
- 16:48 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1324.eqiad.wmnet with reason: host reimage
- 16:48 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1322.eqiad.wmnet with reason: host reimage
- 16:46 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1321.eqiad.wmnet with OS bookworm
- 16:45 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1323.eqiad.wmnet with reason: host reimage
- 16:45 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 16:42 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker1321.eqiad.wmnet with OS bookworm
- 16:42 pt1979@cumin2002: START - Cookbook sre.dns.netbox
- 16:41 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1323.eqiad.wmnet with reason: host reimage
- 16:40 urbanecm: `mwscript-k8s -f userOptions.php -- --wiki=enwiki --old=control --delete 'growthexperiments-homepage-variant'` # T379146, T377631
- 16:30 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1327.eqiad.wmnet with OS bookworm
- 16:30 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1326.eqiad.wmnet with OS bookworm
- 16:30 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1325.eqiad.wmnet with OS bookworm
- 16:29 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1324.eqiad.wmnet with OS bookworm
- 16:29 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1322.eqiad.wmnet with OS bookworm
- 16:29 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1321.eqiad.wmnet with OS bookworm
- 16:28 cgoubert@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wikikube-worker1321.eqiad.wmnet with OS bookworm
- 16:28 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker1327.eqiad.wmnet with OS bookworm
- 16:27 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker1326.eqiad.wmnet with OS bookworm
- 16:27 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker1325.eqiad.wmnet with OS bookworm
- 16:27 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker1324.eqiad.wmnet with OS bookworm
- 16:26 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker1322.eqiad.wmnet with OS bookworm
- 16:22 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1323.eqiad.wmnet with OS bookworm
- 16:20 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker1323.eqiad.wmnet with OS bookworm
- 15:52 moritzm: installing intel-microcode security updates
- 15:49 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1327.eqiad.wmnet with OS bookworm
- 15:49 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1326.eqiad.wmnet with OS bookworm
- 15:48 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1325.eqiad.wmnet with OS bookworm
- 15:48 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1324.eqiad.wmnet with OS bookworm
- 15:47 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1323.eqiad.wmnet with OS bookworm
- 15:46 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1322.eqiad.wmnet with OS bookworm
- 15:45 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1321.eqiad.wmnet with OS bookworm
- 15:42 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns7001.wikimedia.org with OS bookworm
- 15:41 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2042.codfw.wmnet
- 15:39 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs1027.eqiad.wmnet with OS bullseye
- 15:39 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs1025.eqiad.wmnet with OS bullseye
- 15:39 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs1026.eqiad.wmnet with OS bullseye
- 15:35 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2042.codfw.wmnet
- 15:34 moritzm: installing wireshark security updates
- 15:33 ladsgroup@cumin1002: END (FAIL) - Cookbook sre.mysql.pool (exit_code=99) db2215 gradually with 4 steps - Maint over
- 15:33 ladsgroup@cumin1002: START - Cookbook sre.mysql.pool db2215 gradually with 4 steps - Maint over
- 15:27 dcausse@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rdf-streaming-updater: apply
- 15:25 dcausse@deploy2002: helmfile [eqiad] START helmfile.d/services/rdf-streaming-updater: apply
- 15:24 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1020.eqiad.wmnet
- 15:22 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1020.eqiad.wmnet
- 15:21 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1020.eqiad.wmnet
- 15:19 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1020.eqiad.wmnet
- 15:16 dcausse@deploy2002: helmfile [codfw] DONE helmfile.d/services/rdf-streaming-updater: apply
- 15:16 dcausse@deploy2002: helmfile [codfw] START helmfile.d/services/rdf-streaming-updater: apply
- 15:11 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns7001.wikimedia.org with reason: host reimage
- away: UTC afternoon deploys done
- 15:08 dcausse@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
- 15:08 dcausse@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
- 15:07 fabfur@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on dns7001.wikimedia.org with reason: host reimage
- 15:05 tgr@deploy2002: Finished scap sync-world: Backport for Allow simulating the SUL3 shared domain settings via env var (T380575) (duration: 26m 23s)
- 14:58 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on doh[7001-7002].wikimedia.org,durum[7001-7002].magru.wmnet with reason: site is depooled, maintenance
- 14:58 sukhe@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on doh[7001-7002].wikimedia.org,durum[7001-7002].magru.wmnet with reason: site is depooled, maintenance
- 14:58 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on doh[7001-7002].wikimedia.org,durum[7001-7002].magru.wmnet with reason: site is depooled, maintenance
- 14:58 sukhe@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on doh[7001-7002].wikimedia.org,durum[7001-7002].magru.wmnet with reason: site is depooled, maintenance
- 14:56 tgr@deploy2002: tgr: Continuing with sync
- 14:44 tgr@deploy2002: tgr: Backport for Allow simulating the SUL3 shared domain settings via env var (T380575) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 14:43 fabfur@cumin1002: START - Cookbook sre.hosts.reimage for host dns7001.wikimedia.org with OS bookworm
- 14:43 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns7001.wikimedia.org with OS bullseye
- 14:43 fabfur@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fabfur@cumin1002"
- 14:40 fabfur@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fabfur@cumin1002"
- 14:39 tgr@deploy2002: Started scap sync-world: Backport for Allow simulating the SUL3 shared domain settings via env var (T380575)
- 14:31 mlitn@deploy2002: Finished scap sync-world: Backport for Fix incorrect 'this' (duration: 12m 36s)
- 14:25 mlitn@deploy2002: mlitn: Continuing with sync
- 14:25 mlitn@deploy2002: mlitn: Backport for Fix incorrect 'this' synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 14:19 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wdqs1027.eqiad.wmnet with OS bullseye
- 14:19 mlitn@deploy2002: Started scap sync-world: Backport for Fix incorrect 'this'
- 14:19 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wdqs1025.eqiad.wmnet with OS bullseye
- 14:19 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wdqs1026.eqiad.wmnet with OS bullseye
- 14:19 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-search: apply
- 14:18 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-search: apply
- 14:14 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Add grid view - oblivian@cumin1002"
- 14:14 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Add grid view - oblivian@cumin1002
- 14:14 oblivian@cumin1002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Add grid view - oblivian@cumin1002
- 14:13 oblivian@cumin1002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Add grid view - oblivian@cumin1002"
- 14:09 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-search: apply
- 14:09 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns7001.wikimedia.org with reason: host reimage
- 14:08 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-search: apply
- 14:08 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-search: apply
- 14:08 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-search: apply
- 14:07 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-search: apply
- 14:05 fabfur@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on dns7001.wikimedia.org with reason: host reimage
- 14:05 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-search: apply
- 14:01 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp7015.magru.wmnet with OS bullseye
- 14:01 fabfur@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fabfur@cumin1002"
- 14:01 fabfur@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fabfur@cumin1002"
- 13:54 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on an-redacteddb1001.eqiad.wmnet with reason: Reclone (T379724)
- 13:54 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on an-redacteddb1001.eqiad.wmnet with reason: Reclone (T379724)
- 13:54 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb1020.eqiad.wmnet with reason: Reclone (T379724)
- 13:54 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on clouddb1020.eqiad.wmnet with reason: Reclone (T379724)
- 13:49 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs7003.magru.wmnet with OS bullseye
- 13:49 fabfur@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fabfur@cumin1002"
- 13:49 ladsgroup@cumin1002: conftool action : set/pooled=no; selector: name=clouddb1020.eqiad.wmnet,service=s8
- 13:46 fabfur@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fabfur@cumin1002"
- 13:43 fabfur@cumin1002: START - Cookbook sre.hosts.reimage for host dns7001.wikimedia.org with OS bullseye
- 13:40 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.dhcp (exit_code=0) for host dns7001.wikimedia.org
- 13:38 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on cp7004.magru.wmnet with reason: T376737
- 13:38 fabfur@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on cp7004.magru.wmnet with reason: T376737
- 13:35 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp7015.magru.wmnet with reason: host reimage
- 13:34 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on cp7010.magru.wmnet with reason: T376737
- 13:34 fabfur@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on cp7010.magru.wmnet with reason: T376737
- 13:34 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on cp7003.magru.wmnet with reason: T376737
- 13:34 fabfur@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on cp7003.magru.wmnet with reason: T376737
- 13:32 fabfur@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp7015.magru.wmnet with reason: host reimage
- 13:30 Emperor: swift delete wikipedia-commons-local-public.bf b/bf/Schuur_-_Nieuwerbrug_-_20164513_-_RCE.jpg T380738
- 13:29 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on cp7003.magru.wmnet with reason: T376737
- 13:28 fabfur@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on cp7003.magru.wmnet with reason: T376737
- 13:28 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on cp7002.magru.wmnet with reason: T376737
- 13:28 fabfur@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on cp7002.magru.wmnet with reason: T376737
- 13:28 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on cp7002.magru.wmnet with reason: T376737
- 13:28 fabfur@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on cp7002.magru.wmnet with reason: T376737
- 13:27 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on cp7008.magru.wmnet with reason: T376737
- 13:27 fabfur@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on cp7008.magru.wmnet with reason: T376737
- 13:27 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on cp7006.magru.wmnet with reason: T376737
- 13:26 fabfur@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on cp7006.magru.wmnet with reason: T376737
- 13:26 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on cp7001.magru.wmnet with reason: T376737
- 13:26 fabfur@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on cp7001.magru.wmnet with reason: T376737
- 13:21 cmooney@cumin1002: START - Cookbook sre.hosts.dhcp for host dns7001.wikimedia.org
- 13:20 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs7003.magru.wmnet with reason: host reimage
- 13:18 fabfur@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs7003.magru.wmnet with reason: host reimage
- 13:15 dcaro@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephmon1004.eqiad.wmnet with OS bullseye
- 13:11 fabfur@cumin1002: START - Cookbook sre.hosts.reimage for host cp7015.magru.wmnet with OS bullseye
- 13:11 arnaudb@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 100%: repool', diff saved to https://phabricator.wikimedia.org/P71185 and previous config saved to /var/cache/conftool/dbconfig/20241126-131120-arnaudb.json
- 13:07 sukhe@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dns7001.wikimedia.org with OS bookworm
- 13:03 sukhe@cumin1002: START - Cookbook sre.hosts.reimage for host dns7001.wikimedia.org with OS bookworm
- 12:58 fabfur@cumin1002: START - Cookbook sre.hosts.reimage for host lvs7003.magru.wmnet with OS bullseye
- 12:57 dcaro@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephmon1004.eqiad.wmnet with reason: host reimage
- 12:56 arnaudb@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 75%: repool', diff saved to https://phabricator.wikimedia.org/P71183 and previous config saved to /var/cache/conftool/dbconfig/20241126-125614-arnaudb.json
- 12:53 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-ats (exit_code=0) Rolling upgrade/restart of Apache Traffic Server on A:cp-text_esams and A:cp for 9.2.6-1wm2
- 12:53 dcaro@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephmon1004.eqiad.wmnet with reason: host reimage
- 12:51 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-ats (exit_code=0) Rolling upgrade/restart of Apache Traffic Server on A:cp-upload_esams and A:cp for 9.2.6-1wm2
- 12:48 robh@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp7015.magru.wmnet with OS bullseye
- 12:41 arnaudb@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 50%: repool', diff saved to https://phabricator.wikimedia.org/P71182 and previous config saved to /var/cache/conftool/dbconfig/20241126-124109-arnaudb.json
- 12:30 robh@cumin2002: START - Cookbook sre.hosts.reimage for host cp7015.magru.wmnet with OS bullseye
- 12:26 arnaudb@cumin1002: dbctl commit (dc=all): 'db1233 (re)pooling @ 100%: repool', diff saved to https://phabricator.wikimedia.org/P71181 and previous config saved to /var/cache/conftool/dbconfig/20241126-122622-arnaudb.json
- 12:26 hashar@deploy2002: rebuilt and synchronized wikiversions files: group0 to 1.44.0-wmf.5 refs T375664
- 12:26 arnaudb@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 25%: repool', diff saved to https://phabricator.wikimedia.org/P71180 and previous config saved to /var/cache/conftool/dbconfig/20241126-122603-arnaudb.json
- 12:23 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 12:20 robh@cumin2002: START - Cookbook sre.dns.netbox
- 12:20 robh@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host lvs7003
- 12:20 moritzm: failover Ganeti master in magru02 to ganeti7004
- 12:20 robh@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host lvs7003
- 12:19 robh@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp7015
- 12:19 robh@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cp7015
- 12:16 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of prometheus7001.magru.wmnet to plain
- 12:15 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of prometheus7001.magru.wmnet to plain
- 12:13 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of doh7002.wikimedia.org to plain
- 12:11 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of doh7002.wikimedia.org to plain
- 12:11 arnaudb@cumin1002: dbctl commit (dc=all): 'db1233 (re)pooling @ 75%: repool', diff saved to https://phabricator.wikimedia.org/P71179 and previous config saved to /var/cache/conftool/dbconfig/20241126-121117-arnaudb.json
- 12:10 arnaudb@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 20%: repool', diff saved to https://phabricator.wikimedia.org/P71178 and previous config saved to /var/cache/conftool/dbconfig/20241126-121058-arnaudb.json
- 12:10 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of durum7002.magru.wmnet to plain
- 12:10 ladsgroup@deploy2002: Finished scap sync-world: Backport for Bump ratio of new parsercache key spec to 4 (T373037) (duration: 15m 21s)
- 12:09 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of durum7002.magru.wmnet to plain
- 12:07 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of bast7001.wikimedia.org to plain
- 12:07 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of bast7001.wikimedia.org to plain
- 12:05 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ncredir7002.magru.wmnet to plain
- 12:05 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ncredir7002.magru.wmnet to plain
- 12:02 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of prometheus7001.magru.wmnet to drbd
- 12:02 ladsgroup@deploy2002: ladsgroup: Continuing with sync
- 12:01 ladsgroup@deploy2002: ladsgroup: Backport for Bump ratio of new parsercache key spec to 4 (T373037) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 11:56 arnaudb@cumin1002: dbctl commit (dc=all): 'db1233 (re)pooling @ 50%: repool', diff saved to https://phabricator.wikimedia.org/P71177 and previous config saved to /var/cache/conftool/dbconfig/20241126-115612-arnaudb.json
- 11:55 arnaudb@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 15%: repool', diff saved to https://phabricator.wikimedia.org/P71176 and previous config saved to /var/cache/conftool/dbconfig/20241126-115552-arnaudb.json
- 11:55 ladsgroup@deploy2002: Started scap sync-world: Backport for Bump ratio of new parsercache key spec to 4 (T373037)
- 11:54 hashar@deploy2002: Finished scap sync-world: testwikis to 1.44.0-wmf.5 refs T375664 (duration: 25m 52s)
- 11:53 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-ats Rolling upgrade/restart of Apache Traffic Server on A:cp-upload_esams and A:cp for 9.2.6-1wm2
- 11:53 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-ats Rolling upgrade/restart of Apache Traffic Server on A:cp-text_esams and A:cp for 9.2.6-1wm2
- 11:41 arnaudb@cumin1002: dbctl commit (dc=all): 'db1233 (re)pooling @ 25%: repool', diff saved to https://phabricator.wikimedia.org/P71175 and previous config saved to /var/cache/conftool/dbconfig/20241126-114106-arnaudb.json
- 11:40 arnaudb@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 10%: repool', diff saved to https://phabricator.wikimedia.org/P71174 and previous config saved to /var/cache/conftool/dbconfig/20241126-114047-arnaudb.json
- 11:38 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-ats (exit_code=0) Rolling upgrade/restart of Apache Traffic Server on A:cp-text_eqiad and A:cp for 9.2.6-1wm2
- 11:38 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-ats (exit_code=0) Rolling upgrade/restart of Apache Traffic Server on A:cp-upload_eqiad and A:cp for 9.2.6-1wm2
- 11:31 moritzm: remove ganeti7001 from active Ganeti nodes in magru01
- 11:28 dcaro@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephmon1004.eqiad.wmnet with OS bullseye
- 11:28 hashar@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.5 refs T375664
- 11:28 moritzm: failover Ganeti master in magru01 to ganeti7003
- 11:26 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of install7001.wikimedia.org to plain
- 11:26 arnaudb@cumin1002: dbctl commit (dc=all): 'db1233 (re)pooling @ 15%: repool', diff saved to https://phabricator.wikimedia.org/P71173 and previous config saved to /var/cache/conftool/dbconfig/20241126-112601-arnaudb.json
- 11:25 arnaudb@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 5%: repool', diff saved to https://phabricator.wikimedia.org/P71172 and previous config saved to /var/cache/conftool/dbconfig/20241126-112542-arnaudb.json
- 11:25 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of install7001.wikimedia.org to plain
- 11:25 hashar@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.5 refs T375664
- 11:25 dcaro@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephmon1004.eqiad.wmnet with OS bullseye
- 11:24 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of doh7001.wikimedia.org to plain
- 11:23 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of doh7001.wikimedia.org to plain
- 11:21 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of durum7001.magru.wmnet to plain
- 11:20 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of durum7001.magru.wmnet to plain
- 11:16 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ncredir7001.magru.wmnet to plain
- 11:12 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ncredir7001.magru.wmnet to plain
- 11:10 arnaudb@cumin1002: dbctl commit (dc=all): 'db1233 (re)pooling @ 10%: repool', diff saved to https://phabricator.wikimedia.org/P71171 and previous config saved to /var/cache/conftool/dbconfig/20241126-111056-arnaudb.json
- 11:10 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on lvs7003.magru.wmnet with reason: T376737
- 11:10 fabfur@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on lvs7003.magru.wmnet with reason: T376737
- 11:10 arnaudb@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 2%: repool', diff saved to https://phabricator.wikimedia.org/P71170 and previous config saved to /var/cache/conftool/dbconfig/20241126-111036-arnaudb.json
- 11:10 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on lvs7002.magru.wmnet with reason: T376737
- 11:10 fabfur@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on lvs7002.magru.wmnet with reason: T376737
- 11:10 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on lvs7001.magru.wmnet with reason: T376737
- 11:10 fabfur@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on lvs7001.magru.wmnet with reason: T376737
- 11:09 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of netflow7001.magru.wmnet to plain
- 11:09 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of netflow7001.magru.wmnet to plain
- 11:06 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of netflow7001.magru.wmnet to drbd
- 11:05 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply
- 11:03 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of prometheus7001.magru.wmnet to drbd
- 11:02 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of bast7001.wikimedia.org to drbd
- 10:56 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of netflow7001.magru.wmnet to drbd
- 10:55 arnaudb@cumin1002: dbctl commit (dc=all): 'db1233 (re)pooling @ 5%: repool', diff saved to https://phabricator.wikimedia.org/P71169 and previous config saved to /var/cache/conftool/dbconfig/20241126-105550-arnaudb.json
- 10:55 arnaudb@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 1%: repool', diff saved to https://phabricator.wikimedia.org/P71168 and previous config saved to /var/cache/conftool/dbconfig/20241126-105531-arnaudb.json
- 10:52 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ncredir7001.magru.wmnet to drbd
- 10:47 hashar@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.5 refs T375664
- 10:46 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply
- 10:43 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of bast7001.wikimedia.org to drbd
- 10:42 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ncredir7001.magru.wmnet to drbd
- 10:42 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-ats Rolling upgrade/restart of Apache Traffic Server on A:cp-upload_eqiad and A:cp for 9.2.6-1wm2
- 10:42 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-ats Rolling upgrade/restart of Apache Traffic Server on A:cp-text_eqiad and A:cp for 9.2.6-1wm2
- 10:38 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of doh7002.wikimedia.org to drbd
- 10:38 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of durum7001.magru.wmnet to drbd
- 10:31 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db1233.eqiad.wmnet onto db1246.eqiad.wmnet
- 10:28 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of doh7002.wikimedia.org to drbd
- 10:27 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of durum7001.magru.wmnet to drbd
- 10:26 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of durum7002.magru.wmnet to drbd
- 10:26 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of doh7001.wikimedia.org to drbd
- 10:15 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of durum7002.magru.wmnet to drbd
- 10:15 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of doh7001.wikimedia.org to drbd
- 10:13 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ncredir7002.magru.wmnet to drbd
- 10:12 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of install7001.wikimedia.org to drbd
- 10:02 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ncredir7002.magru.wmnet to drbd
- 09:57 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of install7001.wikimedia.org to drbd
- 09:56 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti7004.magru.wmnet to cluster magru02 and group B4
- 09:54 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti7004.magru.wmnet to cluster magru02 and group B4
- 09:53 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti7003.magru.wmnet to cluster magru01 and group B3
- 09:52 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti7003.magru.wmnet to cluster magru01 and group B3
- 09:31 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti7004.mgmt.magru.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
- 09:25 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti7003.magru.wmnet to cluster magru01 and group B3
- 09:25 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti7003.magru.wmnet to cluster magru01 and group B3
- 09:23 jmm@cumin2002: START - Cookbook sre.hosts.provision for host ganeti7004.mgmt.magru.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
- 09:21 jayme@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts kubernetes[2005-2006,2015-2016].codfw.wmnet,kubernetes[1005-1006,1015-1016].eqiad.wmnet
- 09:21 jayme@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 09:21 jayme@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: kubernetes[2005-2006,2015-2016].codfw.wmnet,kubernetes[1005-1006,1015-1016].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jayme@cumin2002"
- 09:21 jayme@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: kubernetes[2005-2006,2015-2016].codfw.wmnet,kubernetes[1005-1006,1015-1016].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jayme@cumin2002"
- 09:19 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti7003.mgmt.magru.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
- 09:11 jayme@cumin2002: START - Cookbook sre.dns.netbox
- 09:11 jmm@cumin2002: START - Cookbook sre.hosts.provision for host ganeti7003.mgmt.magru.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
- 09:03 arnaudb@cumin1002: START - Cookbook sre.mysql.clone of db1233.eqiad.wmnet onto db1246.eqiad.wmnet
- 08:52 jayme@cumin2002: START - Cookbook sre.hosts.decommission for hosts kubernetes[2005-2006,2015-2016].codfw.wmnet,kubernetes[1005-1006,1015-1016].eqiad.wmnet
- 08:52 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti7004.magru.wmnet to cluster magru02 and group B4
- 08:52 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti7004.magru.wmnet to cluster magru02 and group B4
- 08:49 dcausse@deploy2002: Finished deploy [airflow-dags/search@f969d75]: search: swift_upload.py moved to refinery/bin/ (duration: 00m 27s)
- 08:49 jayme@cumin2002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host kubernetes[1005-1006,1015-1016].eqiad.wmnet
- 08:48 jayme@cumin2002: START - Cookbook sre.k8s.pool-depool-node depool for host kubernetes[1005-1006,1015-1016].eqiad.wmnet
- 08:48 dcausse@deploy2002: Started deploy [airflow-dags/search@f969d75]: search: swift_upload.py moved to refinery/bin/
- 08:47 jayme@cumin2002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host kubernetes[2005-2006,2015-2016].codfw.wmnet
- 08:46 jayme@cumin2002: START - Cookbook sre.k8s.pool-depool-node depool for host kubernetes[2005-2006,2015-2016].codfw.wmnet
- 08:28 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti7004.magru.wmnet
- 08:25 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti7003.magru.wmnet to cluster magru01 and group B3
- 08:25 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti7003.magru.wmnet to cluster magru01 and group B3
- 08:17 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti7003.magru.wmnet
- 08:16 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti7004.magru.wmnet
- 08:07 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti7003.magru.wmnet
- 08:06 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti7004
- 08:06 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti7004
- 08:06 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti7003
- 08:05 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti7003
- 07:55 arnaudb@cumin1002: dbctl commit (dc=all): 'manual depool commit', diff saved to https://phabricator.wikimedia.org/P71164 and previous config saved to /var/cache/conftool/dbconfig/20241126-075433-arnaudb.json
- 07:55 arnaudb@cumin1002: END (FAIL) - Cookbook sre.mysql.depool (exit_code=99) db1233 - clone on db1246
- 07:54 arnaudb@cumin1002: START - Cookbook sre.mysql.depool db1233 - clone on db1246
- 07:36 joal@deploy2002: Finished deploy [analytics/refinery@f48b8de] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@f48b8de2] (duration: 00m 29s)
- 07:35 joal@deploy2002: Started deploy [analytics/refinery@f48b8de] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@f48b8de2]
- 07:35 joal@deploy2002: Finished deploy [analytics/refinery@f48b8de] (thin): Regular analytics weekly train THIN [analytics/refinery@f48b8de2] (duration: 00m 35s)
- 07:34 joal@deploy2002: Started deploy [analytics/refinery@f48b8de] (thin): Regular analytics weekly train THIN [analytics/refinery@f48b8de2]
- 07:34 joal@deploy2002: Finished deploy [analytics/refinery@f48b8de]: Regular analytics weekly train [analytics/refinery@f48b8de2] (duration: 02m 03s)
- 07:33 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "UI bugfixes - oblivian@cumin1002"
- 07:33 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: UI bugfixes - oblivian@cumin1002
- 07:33 oblivian@cumin1002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: UI bugfixes - oblivian@cumin1002
- 07:33 oblivian@cumin1002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "UI bugfixes - oblivian@cumin1002"
- 07:32 joal@deploy2002: Started deploy [analytics/refinery@f48b8de]: Regular analytics weekly train [analytics/refinery@f48b8de2]
- 03:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2215 (T380449)', diff saved to https://phabricator.wikimedia.org/P71163 and previous config saved to /var/cache/conftool/dbconfig/20241126-034040-ladsgroup.json
- 03:40 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2215.codfw.wmnet with reason: Maintenance
- 03:40 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2215.codfw.wmnet with reason: Maintenance
- 03:12 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 20:00:00 on wdqs[2018-2020,2026-2027].codfw.wmnet with reason: T376150 non-prod hosts
- 03:12 ryankemper@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 20:00:00 on wdqs[2018-2020,2026-2027].codfw.wmnet with reason: T376150 non-prod hosts
- 03:11 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T376150, initialize wdqs internal scholarly tier) xfer scholarly_articles from wdqs2024.codfw.wmnet -> wdqs2026.codfw.wmnet, repooling neither afterwards
- 03:10 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer (T376150, initialize wdqs internal scholarly tier) xfer scholarly_articles from wdqs2024.codfw.wmnet -> wdqs2026.codfw.wmnet, repooling neither afterwards
- 03:09 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T376150, initialize wdqs internal main tier) xfer wikidata_main from wdqs2021.codfw.wmnet -> wdqs2019.codfw.wmnet, repooling neither afterwards
- 03:07 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer (T376150, initialize wdqs internal main tier) xfer wikidata_main from wdqs2021.codfw.wmnet -> wdqs2019.codfw.wmnet, repooling neither afterwards
- 02:42 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2201.codfw.wmnet with reason: Maintenance
- 02:41 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2201.codfw.wmnet with reason: Maintenance
- 02:34 brett: Import libvmod-netmapper 1.9.1-1 into varnish-staging apt component
- 02:31 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on wdqs[2026-2027].codfw.wmnet with reason: T376150
- 02:30 bking@cumin2002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on wdqs[2026-2027].codfw.wmnet with reason: T376150
- 02:29 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T376150, initialize wdqs internal main tier) xfer wikidata_main from wdqs2021.codfw.wmnet -> wdqs2020.codfw.wmnet, repooling source-only afterwards
- 02:24 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer (T376150, initialize wdqs internal main tier) xfer wikidata_main from wdqs2021.codfw.wmnet -> wdqs2020.codfw.wmnet, repooling source-only afterwards
- 01:47 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T376150, initialize wdqs internal scholarly tier) xfer scholarly_articles from wdqs2024.codfw.wmnet -> wdqs2027.codfw.wmnet, repooling source-only afterwards
- 01:29 sukhe@cumin1002: START - Cookbook sre.hosts.reimage for host dns7001.wikimedia.org with OS bookworm
- 01:08 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T376150, initialize wdqs internal main tier) xfer wikidata_main from wdqs2021.codfw.wmnet -> wdqs2019.codfw.wmnet, repooling source-only afterwards
- 01:06 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on lvs[7001-7003].magru.wmnet with reason: site is depooled, maintenance
- 01:06 sukhe@cumin1002: START - Cookbook sre.hosts.downtime for 10:00:00 on lvs[7001-7003].magru.wmnet with reason: site is depooled, maintenance
- 01:04 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer (T376150, initialize wdqs internal scholarly tier) xfer scholarly_articles from wdqs2024.codfw.wmnet -> wdqs2027.codfw.wmnet, repooling source-only afterwards
- 01:04 sukhe@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp7015.magru.wmnet with OS bullseye
- 01:03 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer (T376150, initialize wdqs internal main tier) xfer wikidata_main from wdqs2021.codfw.wmnet -> wdqs2019.codfw.wmnet, repooling source-only afterwards
- 01:01 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on wdqs[2026-2027].codfw.wmnet with reason: T376150
- 01:01 bking@cumin2002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on wdqs[2026-2027].codfw.wmnet with reason: T376150
- 00:55 sukhe@cumin1002: START - Cookbook sre.hosts.reimage for host cp7015.magru.wmnet with OS bullseye
- 00:28 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T376150, initialize wdqs internal scholarly tier) xfer scholarly_articles from wdqs2024.codfw.wmnet -> wdqs2026.codfw.wmnet, repooling source-only afterwards
- 00:21 eileen: civicrm upgraded from 190ea417 to eec961a3
- 00:16 tzatziki: removing 6 files for legal compliance
- 00:00 tzatziki: removing 1 file for legal compliance
2024-11-25
- 23:56 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2197.codfw.wmnet with reason: Maintenance
- 23:55 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2197.codfw.wmnet with reason: Maintenance
- 23:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2191 (T380449)', diff saved to https://phabricator.wikimedia.org/P71162 and previous config saved to /var/cache/conftool/dbconfig/20241125-235547-ladsgroup.json
- 23:54 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T376150, initialize wdqs internal main tier) xfer wikidata_main from wdqs2021.codfw.wmnet -> wdqs2018.codfw.wmnet, repooling source-only afterwards
- 23:53 tzatziki: removing 1 file for legal compliance
- 23:49 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer (T376150, initialize wdqs internal main tier) xfer wikidata_main from wdqs2021.codfw.wmnet -> wdqs2018.codfw.wmnet, repooling source-only afterwards
- 23:44 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer (T376150, initialize wdqs internal scholarly tier) xfer scholarly_articles from wdqs2024.codfw.wmnet -> wdqs2026.codfw.wmnet, repooling source-only afterwards
- 23:42 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) (T376150, initialize wdqs internal scholarly tier) xfer scholarly_articles from wdqs2024.codfw.wmnet -> wdqs2026.codfw.wmnet, repooling source-only afterwards
- 23:42 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer (T376150, initialize wdqs internal scholarly tier) xfer scholarly_articles from wdqs2024.codfw.wmnet -> wdqs2026.codfw.wmnet, repooling source-only afterwards
- 23:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2191', diff saved to https://phabricator.wikimedia.org/P71161 and previous config saved to /var/cache/conftool/dbconfig/20241125-234040-ladsgroup.json
- 23:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2191', diff saved to https://phabricator.wikimedia.org/P71160 and previous config saved to /var/cache/conftool/dbconfig/20241125-232533-ladsgroup.json
- 23:23 tzatziki: removing 2 files for legal compliance
- 23:16 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) (T376150, initialize wdqs internal scholarly tier) xfer scholarly_articles from wdqs2024.codfw.wmnet -> wdqs2026.codfw.wmnet, repooling source-only afterwards
- 23:16 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer (T376150, initialize wdqs internal scholarly tier) xfer scholarly_articles from wdqs2024.codfw.wmnet -> wdqs2026.codfw.wmnet, repooling source-only afterwards
- 23:16 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) (T376150, initialize wdqs internal scholarly tier) xfer scholarly_articles from wdqs2024.codfw.wmnet -> wdqs2026.codfw.wmnet, repooling source-only afterwards
- 23:16 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer (T376150, initialize wdqs internal scholarly tier) xfer scholarly_articles from wdqs2024.codfw.wmnet -> wdqs2026.codfw.wmnet, repooling source-only afterwards
- 23:14 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) (T376150, initialize wdqs internal scholarly tier) xfer scholarly_articles from wdqs2024.codfw.wmnet -> wdqs2026.codfw.wmnet, repooling source-only afterwards
- 23:14 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer (T376150, initialize wdqs internal scholarly tier) xfer scholarly_articles from wdqs2024.codfw.wmnet -> wdqs2026.codfw.wmnet, repooling source-only afterwards
- 23:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2191 (T380449)', diff saved to https://phabricator.wikimedia.org/P71159 and previous config saved to /var/cache/conftool/dbconfig/20241125-231026-ladsgroup.json
- 23:10 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) (T376150, initialize wdqs internal scholarly tier) xfer scholarly_articles from wdqs2024.codfw.wmnet -> wdqs2026.codfw.wmnet, repooling source-only afterwards
- 23:10 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T376150, initialize wdqs internal scholarly tier) xfer scholarly_articles from wdqs2024.codfw.wmnet -> wdqs2026.codfw.wmnet, repooling source-only afterwards
- 23:09 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) (T376150, initialize wdqs internal scholarly tier) xfer scholarly_articles from wdqs2024.codfw.wmnet -> wdqs2026.codfw.wmnet, repooling source-only afterwards
- 23:09 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer (T376150, initialize wdqs internal scholarly tier) xfer scholarly_articles from wdqs2024.codfw.wmnet -> wdqs2026.codfw.wmnet, repooling source-only afterwards
- 23:09 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp7015.magru.wmnet with OS bullseye
- 23:02 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp7015.magru.wmnet with OS bullseye
- 23:01 bking@cumin1002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
- 23:01 bking@cumin1002: START - Cookbook sre.wdqs.data-transfer
- 23:01 brett@cumin2002: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) cp7015.magru.wmnet lvs7003.magru.wmnet cp7015.mgmt.magru.wmnet lvs7003.mgmt.magru.wmnet on all recursors
- 23:00 brett@cumin2002: START - Cookbook sre.dns.wipe-cache cp7015.magru.wmnet lvs7003.magru.wmnet cp7015.mgmt.magru.wmnet lvs7003.mgmt.magru.wmnet on all recursors
- 23:00 brett@cumin2002: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) cp7015.magru.wmnet lvs7003.magru.wmnet on all recursors
- 23:00 brett@cumin2002: START - Cookbook sre.dns.wipe-cache cp7015.magru.wmnet lvs7003.magru.wmnet on all recursors
- 22:56 brett: Import varnish-modules 0.20.0-2~deb11u1 into varnish-staging apt component
- 22:56 bking@cumin1002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) (T376150, initialize wdqs internal scholarly tier) xfer wikidata from wdqs2024.codfw.wmnet -> wdqs2026.codfw.wmnet, repooling source-only afterwards
- 22:56 bking@cumin1002: START - Cookbook sre.wdqs.data-transfer (T376150, initialize wdqs internal scholarly tier) xfer wikidata from wdqs2024.codfw.wmnet -> wdqs2026.codfw.wmnet, repooling source-only afterwards
- 22:53 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) (T376150, initialize wdqs internal scholarly tier) xfer wikidata from wdqs2024.codfw.wmnet -> wdqs2026.codfw.wmnet, repooling source-only afterwards
- 22:53 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer (T376150, initialize wdqs internal scholarly tier) xfer wikidata from wdqs2024.codfw.wmnet -> wdqs2026.codfw.wmnet, repooling source-only afterwards
- 22:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2191 (T380449)', diff saved to https://phabricator.wikimedia.org/P71158 and previous config saved to /var/cache/conftool/dbconfig/20241125-224949-ladsgroup.json
- 22:49 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2191.codfw.wmnet with reason: Maintenance
- 22:49 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2191.codfw.wmnet with reason: Maintenance
- 22:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2131 (T380449)', diff saved to https://phabricator.wikimedia.org/P71157 and previous config saved to /var/cache/conftool/dbconfig/20241125-224927-ladsgroup.json
- 22:48 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) (T376150, initialize wdqs internal scholarly tier) xfer scholarly_articles from wdqs2024.codfw.wmnet -> wdqs2026.codfw.wmnet, repooling source-only afterwards
- 22:48 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer (T376150, initialize wdqs internal scholarly tier) xfer scholarly_articles from wdqs2024.codfw.wmnet -> wdqs2026.codfw.wmnet, repooling source-only afterwards
- 22:46 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp7015.magru.wmnet with OS bullseye
- 22:43 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) (T376150, initialize wdqs internal scholarly tier) xfer scholarly_articles from wdqs2024.codfw.wmnet -> wdqs2026.codfw.wmnet, repooling source-only afterwards
- 22:43 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer (T376150, initialize wdqs internal scholarly tier) xfer scholarly_articles from wdqs2024.codfw.wmnet -> wdqs2026.codfw.wmnet, repooling source-only afterwards
- 22:38 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) (T376150, initialize wdqs internal scholarly tier) xfer scholarly_articles from wdqs2024.codfw.wmnet -> wdqs2026.codfw.wmnet, repooling source-only afterwards
- 22:38 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer (T376150, initialize wdqs internal scholarly tier) xfer scholarly_articles from wdqs2024.codfw.wmnet -> wdqs2026.codfw.wmnet, repooling source-only afterwards
- 22:37 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) (T376150, initialize wdqs internal scholarly tier) xfer scholarly_articles from wdqs2024.codfw.wmnet -> wdqs2026.codfw.wmnet, repooling source-only afterwards
- 22:37 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer (T376150, initialize wdqs internal scholarly tier) xfer scholarly_articles from wdqs2024.codfw.wmnet -> wdqs2026.codfw.wmnet, repooling source-only afterwards
- 22:37 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) (T376150, initialize wdqs internal scholarly tier) xfer scholarly_articles from wdqs2024.codfw.wmnet -> wdqs2026.codfw.wmnet, repooling source-only afterwards
- 22:37 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer (T376150, initialize wdqs internal scholarly tier) xfer scholarly_articles from wdqs2024.codfw.wmnet -> wdqs2026.codfw.wmnet, repooling source-only afterwards
- away: UTC late deploys done
- 22:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2131', diff saved to https://phabricator.wikimedia.org/P71156 and previous config saved to /var/cache/conftool/dbconfig/20241125-223420-ladsgroup.json
- 22:34 tgr@deploy2002: Finished scap sync-world: Backport for SUL3: Sort overrides (T373737), More authentication domain overrides (T373737), Update private/readme.php to match production (duration: 12m 49s)
- 22:32 eileen: civicrm upgraded from b7bd670f to 190ea417
- 22:31 brett@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host lvs7003.magru.wmnet with OS bullseye
- 22:27 tgr@deploy2002: tgr: Continuing with sync
- 22:25 tgr@deploy2002: tgr: Backport for SUL3: Sort overrides (T373737), More authentication domain overrides (T373737), Update private/readme.php to match production synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 22:21 tgr@deploy2002: Started scap sync-world: Backport for SUL3: Sort overrides (T373737), More authentication domain overrides (T373737), Update private/readme.php to match production
- 22:19 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2131', diff saved to https://phabricator.wikimedia.org/P71155 and previous config saved to /var/cache/conftool/dbconfig/20241125-221913-ladsgroup.json
- 22:19 tgr@deploy2002: Finished scap sync-world: Backport for Reader Survey: Increase coverage (T378660) (duration: 14m 08s)
- 22:13 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs7003.magru.wmnet with reason: host reimage
- 22:12 tgr@deploy2002: tgr, dani: Continuing with sync
- 22:09 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs7003.magru.wmnet with reason: host reimage
- 22:09 tgr@deploy2002: tgr, dani: Backport for Reader Survey: Increase coverage (T378660) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 22:09 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp7015.magru.wmnet with OS bullseye
- 22:08 brett@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp7015.magru.wmnet with OS bullseye
- 22:04 tgr@deploy2002: Started scap sync-world: Backport for Reader Survey: Increase coverage (T378660)
- 22:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2131 (T380449)', diff saved to https://phabricator.wikimedia.org/P71154 and previous config saved to /var/cache/conftool/dbconfig/20241125-220406-ladsgroup.json
- 22:02 tgr@deploy2002: Finished scap sync-world: Backport for LoginCompleteHookHandler: onTempUserCreatedRedirect() should use getPrimaryInstance() (T380042) (duration: 12m 41s)
- 21:56 tgr@deploy2002: tgr, matmarex: Continuing with sync
- 21:54 tgr@deploy2002: tgr, matmarex: Backport for LoginCompleteHookHandler: onTempUserCreatedRedirect() should use getPrimaryInstance() (T380042) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 21:50 tgr@deploy2002: Started scap sync-world: Backport for LoginCompleteHookHandler: onTempUserCreatedRedirect() should use getPrimaryInstance() (T380042)
- 21:49 tgr@deploy2002: Finished scap sync-world: Backport for Reader Survey: Increase coverage on enwiki (T378660) (duration: 16m 06s)
- 21:46 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp7015.magru.wmnet with OS bullseye
- 21:45 brett@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp7015.magru.wmnet with OS bullseye
- 21:42 tgr@deploy2002: tgr, dani: Continuing with sync
- 21:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2131 (T380449)', diff saved to https://phabricator.wikimedia.org/P71153 and previous config saved to /var/cache/conftool/dbconfig/20241125-213904-ladsgroup.json
- 21:38 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2131.codfw.wmnet with reason: Maintenance
- 21:38 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2131.codfw.wmnet with reason: Maintenance
- 21:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2115 (T380449)', diff saved to https://phabricator.wikimedia.org/P71152 and previous config saved to /var/cache/conftool/dbconfig/20241125-213841-ladsgroup.json
- 21:37 tgr@deploy2002: tgr, dani: Backport for Reader Survey: Increase coverage on enwiki (T378660) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 21:33 tgr@deploy2002: Started scap sync-world: Backport for Reader Survey: Increase coverage on enwiki (T378660)
- 21:31 brett@cumin2002: START - Cookbook sre.hosts.reimage for host lvs7003.magru.wmnet with OS bullseye
- 21:30 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs7003.magru.wmnet with OS bullseye
- 21:29 tgr@deploy2002: Finished scap sync-world: Backport for Reader Survey: Fix yes/no messages (T378660) (duration: 16m 02s)
- 21:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2115', diff saved to https://phabricator.wikimedia.org/P71151 and previous config saved to /var/cache/conftool/dbconfig/20241125-212334-ladsgroup.json
- 21:22 tgr@deploy2002: dani, tgr: Continuing with sync
- 21:18 sukhe@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "testing - sukhe@cumin1002"
- 21:18 sukhe@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "testing - sukhe@cumin1002"
- 21:17 tgr@deploy2002: dani, tgr: Backport for Reader Survey: Fix yes/no messages (T378660) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 21:13 tgr@deploy2002: Started scap sync-world: Backport for Reader Survey: Fix yes/no messages (T378660)
- 21:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2115', diff saved to https://phabricator.wikimedia.org/P71150 and previous config saved to /var/cache/conftool/dbconfig/20241125-210827-ladsgroup.json
- 21:04 brett@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Host reimage - brett@cumin2002 - brett@cumin2002"
- 21:04 brett@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Host reimage - brett@cumin2002 - brett@cumin2002"
- 21:03 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts restbase[2021-2023].codfw.wmnet
- 21:03 eevans@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 21:03 eevans@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: restbase[2021-2023].codfw.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1002"
- 21:03 eevans@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: restbase[2021-2023].codfw.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1002"
- 20:59 eevans@cumin1002: START - Cookbook sre.dns.netbox
- 20:57 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp7008.magru.wmnet with OS bullseye
- 20:57 brett@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - brett@cumin2002"
- 20:56 robh@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - robh@cumin2002"
- 20:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2115 (T380449)', diff saved to https://phabricator.wikimedia.org/P71149 and previous config saved to /var/cache/conftool/dbconfig/20241125-205320-ladsgroup.json
- 20:51 eevans@cumin1002: START - Cookbook sre.hosts.decommission for hosts restbase[2021-2023].codfw.wmnet
- 20:51 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 20:51 robh@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: magru reshuffle - robh@cumin2002"
- 20:50 robh@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: magru reshuffle - robh@cumin2002"
- 20:45 robh@cumin2002: START - Cookbook sre.dns.netbox
- 20:45 robh@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dns7001
- 20:45 robh@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host dns7001
- 20:44 brett@cumin2002: START - Cookbook sre.hosts.reimage for host lvs7003.magru.wmnet with OS bullseye
- 20:43 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs7003.magru.wmnet with OS bullseye
- 20:40 brett@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - brett@cumin2002"
- 20:34 robh@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti7004.magru.wmnet with reason: host reimage
- 20:31 robh@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti7004.magru.wmnet with reason: host reimage
- 20:26 brett@cumin2002: START - Cookbook sre.hosts.reimage for host lvs7003.magru.wmnet with OS bullseye
- 20:24 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp7015.magru.wmnet with OS bullseye
- 20:12 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp7008.magru.wmnet with reason: host reimage
- 20:09 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp7008.magru.wmnet with reason: host reimage
- 20:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2115 (T380449)', diff saved to https://phabricator.wikimedia.org/P71147 and previous config saved to /var/cache/conftool/dbconfig/20241125-200031-ladsgroup.json
- 20:00 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2115.codfw.wmnet with reason: Maintenance
- 20:00 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2115.codfw.wmnet with reason: Maintenance
- 20:00 robh@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti7004.magru.wmnet with OS bookworm
- 19:58 robh@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti7003.magru.wmnet with OS bookworm
- 19:58 robh@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - robh@cumin2002"
- 19:56 robh@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - robh@cumin2002"
- 19:50 eevans@cumin1002: conftool action : set/weight=10; selector: cluster=restbase,dc=codfw,name=restbase2038.codfw.wmnet
- 19:50 eevans@cumin1002: conftool action : set/weight=10; selector: cluster=restbase,dc=codfw,name=restbase2037.codfw.wmnet
- 19:50 eevans@cumin1002: conftool action : set/weight=10; selector: cluster=restbase,dc=codfw,name=restbase2036.codfw.wmnet
- 19:43 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp7008.magru.wmnet with OS bullseye
- 19:43 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp7008.magru.wmnet with OS bullseye
- 19:37 ejegg: fundraising civicrm upgraded from 3311520a to b7bd670f
- 19:36 urbanecm@deploy2002: Finished scap sync-world: Backport for [Growth] enwiki: Deploy Add Link to 2% of new users (T377631) (duration: 11m 59s)
- 19:35 robh@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti7003.magru.wmnet with reason: host reimage
- 19:31 robh@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti7003.magru.wmnet with reason: host reimage
- 19:29 urbanecm@deploy2002: urbanecm: Continuing with sync
- 19:28 urbanecm@deploy2002: urbanecm: Backport for [Growth] enwiki: Deploy Add Link to 2% of new users (T377631) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 19:24 urbanecm@deploy2002: Started scap sync-world: Backport for [Growth] enwiki: Deploy Add Link to 2% of new users (T377631)
- 19:18 swfrench@deploy2002: Finished scap sync-world: Deployment to pick up new php 8.1 base images (duration: 09m 37s)
- 19:14 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 19:14 robh@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: magru reshuffle - robh@cumin2002"
- 19:14 robh@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: magru reshuffle - robh@cumin2002"
- 19:11 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance
- 19:11 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance
- 19:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1237 (T380449)', diff saved to https://phabricator.wikimedia.org/P71144 and previous config saved to /var/cache/conftool/dbconfig/20241125-191124-ladsgroup.json
- 19:10 robh@cumin2002: START - Cookbook sre.dns.netbox
- 19:10 robh@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host lvs7003
- 19:10 robh@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host lvs7003
- 19:08 swfrench@deploy2002: Started scap sync-world: Deployment to pick up new php 8.1 base images
- 19:06 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp7008.magru.wmnet with OS bullseye
- 19:06 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp7008.magru.wmnet with OS bullseye
- 19:02 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp7006.magru.wmnet with OS bullseye
- 19:02 brett@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - brett@cumin2002"
- 18:59 robh@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti7003.magru.wmnet with OS bookworm
- 18:59 brett@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - brett@cumin2002"
- 18:59 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 18:59 robh@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: magru reshuffle - robh@cumin2002"
- 18:59 robh@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: magru reshuffle - robh@cumin2002"
- 18:56 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1237', diff saved to https://phabricator.wikimedia.org/P71143 and previous config saved to /var/cache/conftool/dbconfig/20241125-185617-ladsgroup.json
- 18:53 robh@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host lvs7003
- 18:53 robh@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host lvs7003
- 18:53 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp7008.magru.wmnet with OS bullseye
- 18:52 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp7008.magru.wmnet with OS bullseye
- 18:49 jayme@cumin2002: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on D{wikikube-worker[2128-2170].codfw.wmnet} and (A:wikikube-staging-worker-codfw or A:wikikube-staging-master-codfw or A:wikikube-staging-worker-eqiad or A:wikikube-staging-master-eqiad or A:wikikube-worker-codfw or A:wikikube-master-codfw or A:wikikube-worker-eqiad or A:wikikube-master-eqiad or A:ml-serve-worker-eqiad or A:ml-se
- 18:48 krinkle@deploy2002: Finished deploy [statsv/statsv@6678d4b]: I7a8d831817: remove unused statsvr.py (duration: 00m 09s)
- 18:48 krinkle@deploy2002: Started deploy [statsv/statsv@6678d4b]: I7a8d831817: remove unused statsvr.py
- 18:45 robh@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp7015
- 18:45 robh@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cp7015
- 18:45 robh@cumin2002: START - Cookbook sre.dns.netbox
- 18:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1237', diff saved to https://phabricator.wikimedia.org/P71142 and previous config saved to /var/cache/conftool/dbconfig/20241125-184110-ladsgroup.json
- 18:34 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp7008.magru.wmnet with OS bullseye
- 18:33 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp7006.magru.wmnet with reason: host reimage
- 18:31 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 18:29 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp7006.magru.wmnet with reason: host reimage
- 18:28 robh@cumin2002: START - Cookbook sre.dns.netbox
- 18:28 robh@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts cp7015.magru.wmnet
- 18:28 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 18:27 robh@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cp7015.magru.wmnet decommissioned, removing all IPs except the asset tag one - robh@cumin2002"
- 18:27 robh@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cp7015.magru.wmnet decommissioned, removing all IPs except the asset tag one - robh@cumin2002"
- 18:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1237 (T380449)', diff saved to https://phabricator.wikimedia.org/P71141 and previous config saved to /var/cache/conftool/dbconfig/20241125-182603-ladsgroup.json
- 18:24 robh@cumin2002: START - Cookbook sre.dns.netbox
- 18:18 robh@cumin2002: START - Cookbook sre.hosts.decommission for hosts cp7015.magru.wmnet
- 18:17 robh@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts lvs7003.magru.wmnet
- 18:17 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 18:17 robh@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: lvs7003.magru.wmnet decommissioned, removing all IPs except the asset tag one - robh@cumin2002"
- 18:16 robh@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: lvs7003.magru.wmnet decommissioned, removing all IPs except the asset tag one - robh@cumin2002"
- 18:13 robh@cumin2002: START - Cookbook sre.dns.netbox
- 18:08 swfrench-wmf: rebuilt php8.1 production images to pick up 8.1.31
- 18:08 urbanecm@deploy2002: Finished scap sync-world: Backport for Migrate to virtual domains (T354939), createExtensionTables: Use virtual domains for GrowthExperiments (T354939) (duration: 13m 18s)
- 18:04 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp7006.magru.wmnet with OS bullseye
- 18:03 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp7006.magru.wmnet with OS bullseye
- 18:03 robh@cumin2002: START - Cookbook sre.hosts.decommission for hosts lvs7003.magru.wmnet
- 18:02 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 18:02 robh@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: magru reshuffle - robh@cumin2002"
- 18:02 robh@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: magru reshuffle - robh@cumin2002"
- 18:01 urbanecm@deploy2002: urbanecm: Continuing with sync
- 17:59 urbanecm@deploy2002: urbanecm: Backport for Migrate to virtual domains (T354939), createExtensionTables: Use virtual domains for GrowthExperiments (T354939) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 17:58 robh@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti7004
- 17:58 robh@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti7004
- 17:57 robh@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp7008
- 17:57 robh@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cp7008
- 17:56 ryankemper@deploy2002: Finished deploy [wdqs/wdqs@9927a5a] (wcqs): Deploy 0.3.150 to WCQS (duration: 02m 53s)
- 17:55 robh@cumin2002: START - Cookbook sre.dns.netbox
- 17:54 urbanecm@deploy2002: Started scap sync-world: Backport for Migrate to virtual domains (T354939), createExtensionTables: Use virtual domains for GrowthExperiments (T354939)
- 17:53 ryankemper@deploy2002: Started deploy [wdqs/wdqs@9927a5a] (wcqs): Deploy 0.3.150 to WCQS
- 17:49 ryankemper: T378260 `snapshot1016.eqiad.wmnet` => manually deleted `cirrussearch-dump-s11.[timer,service]`
- 17:49 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp7001.magru.wmnet with OS bullseye
- 17:49 fabfur@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fabfur@cumin1002"
- 17:46 fabfur@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fabfur@cumin1002"
- 17:44 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp7006.magru.wmnet with OS bullseye
- 17:41 robh@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cp7008.magru.wmnet
- 17:41 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 17:39 robh@cumin2002: START - Cookbook sre.dns.netbox
- 17:39 robh@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti7004.magru.wmnet
- 17:39 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 17:39 robh@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti7004.magru.wmnet decommissioned, removing all IPs except the asset tag one - robh@cumin2002"
- 17:39 robh@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti7004.magru.wmnet decommissioned, removing all IPs except the asset tag one - robh@cumin2002"
- 17:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1237 (T380449)', diff saved to https://phabricator.wikimedia.org/P71140 and previous config saved to /var/cache/conftool/dbconfig/20241125-173511-ladsgroup.json
- 17:35 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1237.eqiad.wmnet with reason: Maintenance
- 17:34 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db1237.eqiad.wmnet with reason: Maintenance
- 17:34 robh@cumin2002: START - Cookbook sre.dns.netbox
- 17:29 robh@cumin2002: START - Cookbook sre.hosts.decommission for hosts cp7008.magru.wmnet
- 17:29 robh@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti7004.magru.wmnet
- 17:23 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 17:23 robh@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: magru reshuffle - robh@cumin2002"
- 17:22 robh@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: magru reshuffle - robh@cumin2002"
- 17:19 robh@cumin2002: START - Cookbook sre.dns.netbox
- 17:17 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp7001.magru.wmnet with reason: host reimage
- 17:14 fabfur@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp7001.magru.wmnet with reason: host reimage
- 17:10 robh@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts cp7006.magru.wmnet
- 17:10 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 17:10 robh@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cp7006.magru.wmnet decommissioned, removing all IPs except the asset tag one - robh@cumin2002"
- 17:10 robh@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cp7006.magru.wmnet decommissioned, removing all IPs except the asset tag one - robh@cumin2002"
- 17:06 robh@cumin2002: START - Cookbook sre.dns.netbox
- 16:59 robh@cumin2002: START - Cookbook sre.hosts.decommission for hosts cp7006.magru.wmnet
- 16:59 robh@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts ganeti7003.magru.wmnet
- 16:59 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 16:58 robh@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti7003.magru.wmnet decommissioned, removing all IPs except the asset tag one - robh@cumin2002"
- 16:58 robh@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti7003.magru.wmnet decommissioned, removing all IPs except the asset tag one - robh@cumin2002"
- 16:55 robh@cumin2002: START - Cookbook sre.dns.netbox
- 16:49 robh@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti7003.magru.wmnet
- 16:47 fabfur@cumin1002: START - Cookbook sre.hosts.reimage for host cp7001.magru.wmnet with OS bullseye
- 16:45 hashar@deploy2002: Pruned MediaWiki: 1.44.0-wmf.2 (duration: 03m 05s)
- 16:44 dcaro@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephmon1004.eqiad.wmnet with OS bullseye
- 16:42 hashar@deploy2002: Installation of scap version "4.129.0" completed for 211 hosts
- 16:42 swfrench-wmf: uploaded php8.1 8.1.31-1+wmf11u1 to apt.w.o (16:25 UTC)
- 16:38 hashar@deploy2002: Installing scap version "4.129.0" for 211 hosts
- 16:27 hashar@deploy2002: Installation of scap version "4.128.0" completed for 211 hosts
- 16:27 Dreamy_Jazz: Restarted MediaModeration scanning script - https://wikitech.wikimedia.org/wiki/MediaModeration
- 16:23 hashar@deploy2002: Installing scap version "4.128.0" for 211 hosts
- 16:19 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1225.eqiad.wmnet with reason: Maintenance
- 16:19 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db1225.eqiad.wmnet with reason: Maintenance
- 16:19 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1224 (T380449)', diff saved to https://phabricator.wikimedia.org/P71138 and previous config saved to /var/cache/conftool/dbconfig/20241125-161915-ladsgroup.json
- 16:05 robh@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dns7001.mgmt.magru.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 16:05 jayme@cumin2002: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on D{wikikube-worker[1305-1312].eqiad.wmnet} and (A:wikikube-staging-worker-codfw or A:wikikube-staging-master-codfw or A:wikikube-staging-worker-eqiad or A:wikikube-staging-master-eqiad or A:wikikube-worker-codfw or A:wikikube-master-codfw or A:wikikube-worker-eqiad or A:wikikube-master-eqiad or A:ml-serve-worker-eqiad or A:ml-se
- 16:04 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 16:04 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-ats (exit_code=0) Rolling upgrade/restart of Apache Traffic Server on A:cp-text_drmrs and A:cp for 9.2.6-1wm2
- 16:04 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
- 16:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1224', diff saved to https://phabricator.wikimedia.org/P71134 and previous config saved to /var/cache/conftool/dbconfig/20241125-160408-ladsgroup.json
- 16:02 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-ats (exit_code=0) Rolling upgrade/restart of Apache Traffic Server on A:cp-upload_drmrs and A:cp for 9.2.6-1wm2
- 15:58 robh@cumin2002: START - Cookbook sre.hosts.provision for host dns7001.mgmt.magru.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 15:52 Lucas_WMDE: UTC afternoon backport+config window done (apologies for the temporary flood of “Use of QuickSurveys survey” deprecation warnings – should be fixed again)
- 15:52 robh@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dns7001.mgmt.magru.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 15:49 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for Reader Survey: Fix question (T378660) (duration: 13m 02s)
- 15:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1224', diff saved to https://phabricator.wikimedia.org/P71133 and previous config saved to /var/cache/conftool/dbconfig/20241125-154901-ladsgroup.json
- 15:48 robh@cumin2002: START - Cookbook sre.hosts.provision for host dns7001.mgmt.magru.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 15:47 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 15:47 robh@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: magru swaps - robh@cumin2002"
- 15:46 robh@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: magru swaps - robh@cumin2002"
- 15:46 claime: homer cr*eqiad* commit 'T380027'
- 15:42 robh@cumin2002: START - Cookbook sre.dns.netbox
- 15:41 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, dani: Continuing with sync
- 15:41 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts kubernetes[1009-1014].eqiad.wmnet
- 15:41 cgoubert@cumin1002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
- 15:40 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-wmde: apply
- 15:40 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-wmde: apply
- 15:40 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, dani: Backport for Reader Survey: Fix question (T378660) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 15:39 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
- 15:38 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1309.eqiad.wmnet with OS bookworm
- 15:37 jynus@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on backup2011.codfw.wmnet with reason: Reboot
- 15:37 jynus@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on backup2011.codfw.wmnet with reason: Reboot
- 15:37 jynus@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on backup2010.codfw.wmnet with reason: Reboot
- 15:37 jynus@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on backup2010.codfw.wmnet with reason: Reboot
- 15:36 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for Reader Survey: Fix question (T378660)
- 15:36 robh@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp7001.mgmt.magru.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 15:33 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1224 (T380449)', diff saved to https://phabricator.wikimedia.org/P71132 and previous config saved to /var/cache/conftool/dbconfig/20241125-153354-ladsgroup.json
- 15:31 robh@cumin2002: START - Cookbook sre.hosts.provision for host cp7001.mgmt.magru.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 15:23 lucaswerkmeister-wmde@deploy2002: dani, lucaswerkmeister-wmde: Continuing with sync
- 15:21 lucaswerkmeister-wmde@deploy2002: dani, lucaswerkmeister-wmde: Backport for Reader Survey: Deploy on enwiki (T378660) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 15:19 cgoubert@cumin1002: START - Cookbook sre.hosts.decommission for hosts kubernetes[1009-1014].eqiad.wmnet
- 15:18 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1309.eqiad.wmnet with reason: host reimage
- 15:18 robh@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 15:17 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for Reader Survey: Deploy on enwiki (T378660)
- 15:15 robh@cumin1002: START - Cookbook sre.dns.netbox
- 15:15 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1309.eqiad.wmnet with reason: host reimage
- 15:15 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for New stream config for Android Rabbit Holes feature. (T380107) (duration: 15m 45s)
- 15:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1224 (T380449)', diff saved to https://phabricator.wikimedia.org/P71131 and previous config saved to /var/cache/conftool/dbconfig/20241125-151103-ladsgroup.json
- 15:10 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1224.eqiad.wmnet with reason: Maintenance
- 15:10 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db1224.eqiad.wmnet with reason: Maintenance
- 15:08 lucaswerkmeister-wmde@deploy2002: dbrant, lucaswerkmeister-wmde: Continuing with sync
- 15:03 lucaswerkmeister-wmde@deploy2002: dbrant, lucaswerkmeister-wmde: Backport for New stream config for Android Rabbit Holes feature. (T380107) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 15:02 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-ats Rolling upgrade/restart of Apache Traffic Server on A:cp-text_drmrs and A:cp for 9.2.6-1wm2
- 15:02 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-ats Rolling upgrade/restart of Apache Traffic Server on A:cp-upload_drmrs and A:cp for 9.2.6-1wm2
- 14:59 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host kubernetes[1009-1014].eqiad.wmnet
- 14:59 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for New stream config for Android Rabbit Holes feature. (T380107)
- 14:57 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for Pass context to 'revreview-pending-basic' on history page (T380519), Use Contexts for Message objects in review dialog (tooltip) (T380519) (duration: 15m 35s)
- 14:56 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 14:56 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove vlan1107 IPv6 entries - cmooney@cumin1002"
- 14:56 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host kubernetes[1009-1014].eqiad.wmnet
- 14:54 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove vlan1107 IPv6 entries - cmooney@cumin1002"
- 14:54 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1309.eqiad.wmnet with OS bookworm
- 14:52 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker1309.eqiad.wmnet
- 14:52 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker1309.eqiad.wmnet
- 14:52 cmooney@cumin1002: START - Cookbook sre.dns.netbox
- 14:50 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, matmarex: Continuing with sync
- 14:49 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-ats (exit_code=0) Rolling upgrade/restart of Apache Traffic Server on A:cp-text_codfw and A:cp for 9.2.6-1wm2
- 14:48 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, matmarex: Backport for Pass context to 'revreview-pending-basic' on history page (T380519), Use Contexts for Message objects in review dialog (tooltip) (T380519) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 14:47 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker[1310-1312].eqiad.wmnet
- 14:47 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker[1310-1312].eqiad.wmnet
- 14:47 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-ats (exit_code=0) Rolling upgrade/restart of Apache Traffic Server on A:cp-upload_codfw and A:cp for 9.2.6-1wm2
- 14:44 cmooney@cumin1002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
- 14:44 cmooney@cumin1002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add reverse IPv6 includes to dns repo for vlan1107 - cmooney@cumin1002"
- 14:42 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add reverse IPv6 includes to dns repo for vlan1107 - cmooney@cumin1002"
- 14:41 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for Pass context to 'revreview-pending-basic' on history page (T380519), Use Contexts for Message objects in review dialog (tooltip) (T380519)
- 14:39 cmooney@cumin1002: START - Cookbook sre.dns.netbox
- 14:26 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Add tooltips - oblivian@cumin1002"
- 14:26 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Add tooltips - oblivian@cumin1002
- 14:26 moritzm: prune unneeded kernels from grafana2001
- 14:26 oblivian@cumin1002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Add tooltips - oblivian@cumin1002
- 14:26 oblivian@cumin1002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Add tooltips - oblivian@cumin1002"
- 14:20 claime: Manually deleting wikikube-worker13[13-20].eqiad.wmnet for ip exhaustion T375845
- 14:19 claime: disable puppet and kubelet on wikikube-worker13[13-28].eqiad.wmnet for ip exhaustion T375845
- 14:12 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply
- 14:05 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host es2045.codfw.wmnet
- 14:03 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host es2046.codfw.wmnet
- 14:02 aborrero@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 14:01 aborrero@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudgw updates - aborrero@cumin1002"
- 14:01 aborrero@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudgw updates - aborrero@cumin1002"
- 13:59 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host es2046.codfw.wmnet
- 13:59 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host es2045.codfw.wmnet
- 13:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host es2044.codfw.wmnet
- 13:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host es2043.codfw.wmnet
- 13:51 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-ats Rolling upgrade/restart of Apache Traffic Server on A:cp-upload_codfw and A:cp for 9.2.6-1wm2
- 13:51 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-ats Rolling upgrade/restart of Apache Traffic Server on A:cp-text_codfw and A:cp for 9.2.6-1wm2
- 13:50 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host es2044.codfw.wmnet
- 13:50 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host es2043.codfw.wmnet
- 13:48 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host es2042.codfw.wmnet
- 13:48 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host es2041.codfw.wmnet
- 13:47 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1216.eqiad.wmnet with reason: Maintenance
- 13:47 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db1216.eqiad.wmnet with reason: Maintenance
- 13:47 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1318.eqiad.wmnet with OS bookworm
- 13:46 jayme: deployed sessionstore to non-dedicated nodes - T379599
- 13:44 jayme@deploy2002: helmfile [codfw] DONE helmfile.d/services/sessionstore: apply
- 13:44 jayme@deploy2002: helmfile [codfw] START helmfile.d/services/sessionstore: apply
- 13:43 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1320.eqiad.wmnet with OS bookworm
- 13:43 jayme: cordoned kubernetes[2005-2006,2015-2016].codfw.wmnet,kubernetes[1005-1006,1015-1016].eqiad.wmnet - T379599
- 13:42 jayme@deploy2002: helmfile [eqiad] DONE helmfile.d/services/sessionstore: apply
- 13:42 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host es2042.codfw.wmnet
- 13:42 aborrero@cumin1002: START - Cookbook sre.dns.netbox
- 13:41 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host es2041.codfw.wmnet
- 13:41 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1314.eqiad.wmnet with OS bookworm
- 13:40 btullis@cumin1002: END (PASS) - Cookbook sre.ceph.roll-restart-reboot-server (exit_code=0) rolling reboot on A:cephosd and (A:cephosd)
- 13:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host db1246.eqiad.wmnet
- 13:38 jayme@deploy2002: helmfile [eqiad] START helmfile.d/services/sessionstore: apply
- 13:38 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/sessionstore: apply
- 13:38 jayme@deploy2002: helmfile [staging] START helmfile.d/services/sessionstore: apply
- 13:37 andrewtavis-wmde@deploy2002: Finished deploy [airflow-dags/wmde@006515b]: Testing the new k8s deployment (duration: 02m 34s)
- 13:37 andrewtavis-wmde@deploy2002: Started deploy [airflow-dags/wmde@006515b]: Testing the new k8s deployment
- 13:35 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1319.eqiad.wmnet with OS bookworm
- 13:32 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1315.eqiad.wmnet with OS bookworm
- 13:32 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host db1246.eqiad.wmnet
- 13:32 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply
- 13:30 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1317.eqiad.wmnet with OS bookworm
- 13:28 jayme@cumin2002: START - Cookbook sre.k8s.reboot-nodes rolling reboot on D{wikikube-worker[1305-1312].eqiad.wmnet} and (A:wikikube-staging-worker-codfw or A:wikikube-staging-master-codfw or A:wikikube-staging-worker-eqiad or A:wikikube-staging-master-eqiad or A:wikikube-worker-codfw or A:wikikube-master-codfw or A:wikikube-worker-eqiad or A:wikikube-master-eqiad or A:ml-serve-worker-eqiad or A:ml-serve-master-eqiad or
- 13:28 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1318.eqiad.wmnet with reason: host reimage
- 13:28 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1316.eqiad.wmnet with OS bookworm
- 13:27 jayme@cumin2002: START - Cookbook sre.k8s.reboot-nodes rolling reboot on D{wikikube-worker[2128-2170].codfw.wmnet} and (A:wikikube-staging-worker-codfw or A:wikikube-staging-master-codfw or A:wikikube-staging-worker-eqiad or A:wikikube-staging-master-eqiad or A:wikikube-worker-codfw or A:wikikube-master-codfw or A:wikikube-worker-eqiad or A:wikikube-master-eqiad or A:ml-serve-worker-eqiad or A:ml-serve-master-eqiad or
- 13:24 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1320.eqiad.wmnet with reason: host reimage
- 13:24 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1313.eqiad.wmnet with OS bookworm
- 13:23 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-wmde: apply
- 13:23 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-wmde: apply
- 13:21 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1314.eqiad.wmnet with reason: host reimage
- 13:17 ladsgroup@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db1179 gradually with 4 steps - Maint over
- 13:17 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1319.eqiad.wmnet with reason: host reimage
- 13:16 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db2239.codfw.wmnet with reason: T373579, host is WIP
- 13:16 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on db2239.codfw.wmnet with reason: T373579, host is WIP
- 13:13 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1315.eqiad.wmnet with reason: host reimage
- 13:11 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1317.eqiad.wmnet with reason: host reimage
- 13:09 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-wmde: apply
- 13:08 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1316.eqiad.wmnet with reason: host reimage
- 13:08 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-wmde: apply
- 13:06 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1320.eqiad.wmnet with reason: host reimage
- 13:05 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on cp7015.magru.wmnet with reason: T376737
- 13:05 fabfur@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on cp7015.magru.wmnet with reason: T376737
- 13:05 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on cp7008.magru.wmnet with reason: T376737
- 13:05 fabfur@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on cp7008.magru.wmnet with reason: T376737
- 13:05 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1319.eqiad.wmnet with reason: host reimage
- 13:05 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on cp7006.magru.wmnet with reason: T376737
- 13:04 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1313.eqiad.wmnet with reason: host reimage
- 13:04 fabfur@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on cp7006.magru.wmnet with reason: T376737
- 13:04 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on cp7001.magru.wmnet with reason: T376737
- 13:04 fabfur@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on cp7001.magru.wmnet with reason: T376737
- 13:04 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on lvs7003.magru.wmnet with reason: T376737
- 13:04 fabfur@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on lvs7003.magru.wmnet with reason: T376737
- 13:04 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1318.eqiad.wmnet with reason: host reimage
- 13:04 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host aux-k8s-etcd2005.codfw.wmnet
- 13:03 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1317.eqiad.wmnet with reason: host reimage
- 13:03 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on ganeti7004.magru.wmnet with reason: T376737
- 13:03 fabfur@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on ganeti7004.magru.wmnet with reason: T376737
- 13:03 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on ganeti7003.magru.wmnet with reason: T376737
- 13:03 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1316.eqiad.wmnet with reason: host reimage
- 13:03 fabfur@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on ganeti7003.magru.wmnet with reason: T376737
- 13:03 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-wmde: apply
- 13:02 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dns7001.wikimedia.org with reason: T376737
- 13:02 jiji@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-mcrouter: apply
- 13:02 fabfur@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dns7001.wikimedia.org with reason: T376737
- 13:02 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-wmde: apply
- 13:02 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1315.eqiad.wmnet with reason: host reimage
- 13:02 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on cp7001.magru.wmnet with reason: T376737
- 13:02 fabfur@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on cp7001.magru.wmnet with reason: T376737
- 13:02 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-search: apply
- 13:02 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1314.eqiad.wmnet with reason: host reimage
- 13:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host aux-k8s-etcd2004.codfw.wmnet
- 13:01 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-search: apply
- 13:01 jiji@deploy2002: helmfile [codfw] START helmfile.d/services/mw-mcrouter: apply
- 13:01 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-research: apply
- 13:00 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1313.eqiad.wmnet with reason: host reimage
- 13:00 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-research: apply
- 13:00 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host aux-k8s-etcd2005.codfw.wmnet
- 12:59 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-platform-eng: apply
- 12:59 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host aux-k8s-etcd2003.codfw.wmnet
- 12:59 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-platform-eng: apply
- 12:59 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-product: apply
- 12:58 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-product: apply
- 12:58 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host aux-k8s-ctrl2003.codfw.wmnet
- 12:58 jayme@cumin2002: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on D{kubestage100[5-6].eqiad.wmnet} and (A:wikikube-staging-worker-codfw or A:wikikube-staging-master-codfw or A:wikikube-staging-worker-eqiad or A:wikikube-staging-master-eqiad or A:wikikube-worker-codfw or A:wikikube-master-codfw or A:wikikube-worker-eqiad or A:wikikube-master-eqiad or A:ml-serve-worker-eqiad or A:ml-serve-maste
- 12:57 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host aux-k8s-etcd2004.codfw.wmnet
- 12:57 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
- 12:56 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
- 12:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host aux-k8s-etcd2003.codfw.wmnet
- 12:54 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host aux-k8s-ctrl2003.codfw.wmnet
- 12:49 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host aux-k8s-ctrl2002.codfw.wmnet
- 12:48 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 12:48 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host aux-k8s-worker2005.codfw.wmnet
- 12:48 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 12:47 jynus@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on backup1011.eqiad.wmnet with reason: Reboot
- 12:47 jynus@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on backup1011.eqiad.wmnet with reason: Reboot
- 12:45 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host aux-k8s-ctrl2002.codfw.wmnet
- 12:44 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host aux-k8s-worker2005.codfw.wmnet
- 12:44 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1320.eqiad.wmnet with OS bookworm
- 12:43 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1319.eqiad.wmnet with OS bookworm
- 12:43 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1318.eqiad.wmnet with OS bookworm
- 12:43 jayme@cumin2002: START - Cookbook sre.k8s.reboot-nodes rolling reboot on D{kubestage100[5-6].eqiad.wmnet} and (A:wikikube-staging-worker-codfw or A:wikikube-staging-master-codfw or A:wikikube-staging-worker-eqiad or A:wikikube-staging-master-eqiad or A:wikikube-worker-codfw or A:wikikube-master-codfw or A:wikikube-worker-eqiad or A:wikikube-master-eqiad or A:ml-serve-worker-eqiad or A:ml-serve-master-eqiad or A:ml-ser
- 12:42 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1317.eqiad.wmnet with OS bookworm
- 12:42 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1316.eqiad.wmnet with OS bookworm
- 12:42 jynus@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on backup1010.eqiad.wmnet with reason: Reboot
- 12:41 jynus@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on backup1010.eqiad.wmnet with reason: Reboot
- 12:41 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1315.eqiad.wmnet with OS bookworm
- 12:41 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1314.eqiad.wmnet with OS bookworm
- 12:40 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1313.eqiad.wmnet with OS bookworm
- 12:33 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-ats (exit_code=0) Rolling upgrade/restart of Apache Traffic Server on A:cp-eqsin and not (P{cp5018.*} or P{cp5026.*}) and A:cp for 9.2.6-1wm2
- 12:32 ladsgroup@cumin1002: START - Cookbook sre.mysql.pool db1179 gradually with 4 steps - Maint over
- 12:28 btullis@cumin1002: START - Cookbook sre.ceph.roll-restart-reboot-server rolling reboot on A:cephosd and (A:cephosd)
- 12:27 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host aux-k8s-worker2004.codfw.wmnet
- 12:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host aux-k8s-worker2003.codfw.wmnet
- 12:23 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host aux-k8s-worker2004.codfw.wmnet
- 12:22 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host aux-k8s-worker2003.codfw.wmnet
- 12:13 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host aux-k8s-worker2002.codfw.wmnet
- 12:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host build2002.codfw.wmnet
- 12:09 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host aux-k8s-worker2002.codfw.wmnet
- 12:06 hashar@deploy2002: Pruned MediaWiki: 1.39.0-wmf.1 (duration: 00m 40s)
- 12:06 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host build2002.codfw.wmnet
- 12:03 hashar@deploy2002: Pruned MediaWiki: 1.39.0-wmf.1 (duration: 00m 37s)
- 11:56 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker1256.eqiad.wmnet
- 11:56 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker1256.eqiad.wmnet
- 11:51 hashar@deploy2002: Installation of scap version "4.128.0" completed for 211 hosts
- 11:49 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) check for host wikikube-worker1290.eqiad.wmnet
- 11:49 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node check for host wikikube-worker1290.eqiad.wmnet
- 11:47 hashar@deploy2002: Installing scap version "4.128.0" for 211 hosts
- 11:47 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1179 (T380449)', diff saved to https://phabricator.wikimedia.org/P71125 and previous config saved to /var/cache/conftool/dbconfig/20241125-114651-ladsgroup.json
- 11:46 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1179.eqiad.wmnet with reason: Maintenance
- 11:46 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db1179.eqiad.wmnet with reason: Maintenance
- 11:41 claime: homer 'cr*eqiad*' commit 'T379454'
- 11:39 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1256.eqiad.wmnet with OS bookworm
- 11:39 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - cgoubert@cumin1002"
- 11:39 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - cgoubert@cumin1002"
- 11:34 hashar@deploy2002: Installing scap version "4.128.0" for 211 hosts
- 11:24 moritzm: installing Linux 6.1.119 on Bookworm nodes
- 11:20 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1256.eqiad.wmnet with reason: host reimage
- 11:18 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1256.eqiad.wmnet with reason: host reimage
- 11:03 fabfur@cumin1002: conftool action : set/pooled=no; selector: cluster=dnsbox,dc=magru
- 11:02 fabfur: depooling dnsboxes @ magru for hardware swap (T376737)
- 11:02 fabfur@cumin1002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: depool site magru [reason: depool magru for hw swap, T376737]
- 11:01 fabfur@cumin1002: START - Cookbook sre.dns.admin DNS admin: depool site magru [reason: depool magru for hw swap, T376737]
- 11:01 fabfur: depooling magru for hardware swap (T376737)
- 10:40 hashar@deploy2002: Finished deploy [integration/docroot@d585f2b]: build: Updating cross-spawn to 7.0.6 (duration: 00m 10s)
- 10:40 hashar@deploy2002: Started deploy [integration/docroot@d585f2b]: build: Updating cross-spawn to 7.0.6
- 10:38 _joe_: deleted pyall component from reprepro
- 10:35 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-ats Rolling upgrade/restart of Apache Traffic Server on A:cp-eqsin and not (P{cp5018.*} or P{cp5026.*}) and A:cp for 9.2.6-1wm2
- 10:25 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-ats (exit_code=0) Rolling upgrade/restart of Apache Traffic Server on A:cp-ulsfo and not (P{cp4043.*} or P{cp4051.*}) and A:cp for 9.2.6-1wm2
- 10:17 mvernon@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host thanos-be1005.eqiad.wmnet
- 10:11 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
- 10:11 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
- 10:10 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
- 10:10 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
- 10:07 jynus: extending backup1009 free filesystem
- 10:06 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host thanos-be1005.eqiad.wmnet
- 09:58 mvernon@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host thanos-be2005.codfw.wmnet
- 09:46 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host thanos-be2005.codfw.wmnet
- 09:45 mvernon@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host thanos-be2005.codfw.wmnet
- 09:44 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 09:43 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
- 09:39 moritzm: remove ganeti7003 from active Ganeti nodes in magru01 T376737
- 09:34 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host thanos-be2005.codfw.wmnet
- 09:32 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
- 09:32 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
- 09:25 ladsgroup@deploy2002: Finished scap sync-world: Backport for Bump ratio of new parsercache key spec to 6 (T373037) (duration: 11m 05s)
- 09:18 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of doh7001.wikimedia.org to plain
- 09:18 ladsgroup@deploy2002: ladsgroup: Continuing with sync
- 09:18 ladsgroup@deploy2002: ladsgroup: Backport for Bump ratio of new parsercache key spec to 6 (T373037) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 09:14 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of doh7001.wikimedia.org to plain
- 09:13 ladsgroup@deploy2002: Started scap sync-world: Backport for Bump ratio of new parsercache key spec to 6 (T373037)
- 09:13 dcausse: restarting blazegraph on wdqs1012 (BlazegraphFreeAllocatorsDecreasingRapidly)
- 09:04 kostajh: UTC morning deploys done
- 09:01 kharlan@deploy2002: Finished scap sync-world: Backport for IPReputation: Enable everywhere (T360067) (duration: 15m 48s)
- 08:53 kharlan@deploy2002: kharlan: Continuing with sync
- 08:50 kharlan@deploy2002: kharlan: Backport for IPReputation: Enable everywhere (T360067) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 08:48 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of durum7001.magru.wmnet to plain
- 08:47 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of durum7001.magru.wmnet to plain
- 08:46 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2179.codfw.wmnet with reason: Maintenance
- 08:46 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2179.codfw.wmnet with reason: Maintenance
- 08:46 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1160.eqiad.wmnet with reason: Maintenance
- 08:46 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1160.eqiad.wmnet with reason: Maintenance
- 08:46 kharlan@deploy2002: Started scap sync-world: Backport for IPReputation: Enable everywhere (T360067)
- 08:45 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2240 (T367781)', diff saved to https://phabricator.wikimedia.org/P71123 and previous config saved to /var/cache/conftool/dbconfig/20241125-084531-arnaudb.json
- 08:43 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of install7001.wikimedia.org to plain
- 08:39 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of install7001.wikimedia.org to plain
- 08:39 tgr@deploy2002: Finished scap sync-world: Backport for Disable more extensions when using the shared login domain (T373737) (duration: 30m 35s)
- 08:37 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-ats Rolling upgrade/restart of Apache Traffic Server on A:cp-ulsfo and not (P{cp4043.*} or P{cp4051.*}) and A:cp for 9.2.6-1wm2
- 08:30 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2240', diff saved to https://phabricator.wikimedia.org/P71122 and previous config saved to /var/cache/conftool/dbconfig/20241125-083024-arnaudb.json
- 08:30 tgr@deploy2002: tgr: Continuing with sync
- 08:25 tgr@deploy2002: tgr: Backport for Disable more extensions when using the shared login domain (T373737) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 08:17 jelto@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikidata-query-gui: apply
- 08:17 jelto@deploy2002: helmfile [codfw] START helmfile.d/services/wikidata-query-gui: apply
- 08:17 jelto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikidata-query-gui: apply
- 08:16 jelto@deploy2002: helmfile [eqiad] START helmfile.d/services/wikidata-query-gui: apply
- 08:16 jelto@deploy2002: helmfile [staging] DONE helmfile.d/services/wikidata-query-gui: apply
- 08:15 jelto@deploy2002: helmfile [staging] START helmfile.d/services/wikidata-query-gui: apply
- 08:15 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2240', diff saved to https://phabricator.wikimedia.org/P71121 and previous config saved to /var/cache/conftool/dbconfig/20241125-081517-arnaudb.json
- 08:11 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ncredir7001.magru.wmnet to plain
- 08:10 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ncredir7001.magru.wmnet to plain
- 08:08 tgr@deploy2002: Started scap sync-world: Backport for Disable more extensions when using the shared login domain (T373737)
- 08:02 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of netflow7001.magru.wmnet to plain
- 08:00 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of netflow7001.magru.wmnet to plain
- 08:00 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2240 (T367781)', diff saved to https://phabricator.wikimedia.org/P71120 and previous config saved to /var/cache/conftool/dbconfig/20241125-080010-arnaudb.json
- 07:57 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2240 (T367781)', diff saved to https://phabricator.wikimedia.org/P71119 and previous config saved to /var/cache/conftool/dbconfig/20241125-075758-arnaudb.json
- 07:57 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2240.codfw.wmnet with reason: Maintenance
- 07:57 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2240.codfw.wmnet with reason: Maintenance
- 07:56 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on pc1013.eqiad.wmnet with reason: T373037, host is not pooled
- 07:56 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on pc1013.eqiad.wmnet with reason: T373037, host is not pooled
- 07:55 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on pc1017.eqiad.wmnet with reason: T378068, host is not pooled
- 07:55 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on pc1017.eqiad.wmnet with reason: T378068, host is not pooled
- 07:54 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti7003.magru.wmnet
- 07:53 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti7003.magru.wmnet
- 07:47 moritzm: remove ganeti7004 from active Ganeti nodes in magru02 T376737
- 07:15 _joe_: upgrading vopsbot to 0.3.9
2024-11-23
- 12:08 btullis@cumin1002: END (FAIL) - Cookbook sre.hadoop.roll-restart-masters (exit_code=99) restart masters for Hadoop test cluster: Restart of jvm daemons.
- 12:05 btullis@cumin1002: START - Cookbook sre.hadoop.roll-restart-masters restart masters for Hadoop test cluster: Restart of jvm daemons.
- 02:15 urandom: decommissioning Cassandra/restbase2023-{a,b,c} — T380236
2024-11-22
- 21:51 bking@cumin2002: conftool action : set/pooled=false; selector: dnsdisc=wdqs-internal-scholarly,name=eqiad
- 21:37 bking@cumin2002: conftool action : set/pooled=yes; selector: name=wdqs2026.codfw.wmnet
- 21:37 bking@cumin2002: conftool action : set/pooled=yes; selector: name=wdqs2018.codfw.wmnet
- 21:33 bking@cumin2002: conftool action : set/weight=1; selector: name=wdqs2026.codfw.wmnet
- 21:33 bking@cumin2002: conftool action : set/weight=1; selector: name=wdqs2018.codfw.wmnet
- 21:25 bking@cumin2002: conftool action : set/pooled=yes:weight=1; selector: cluster=wdqs-scholarly,service=wdqs-internal-scholarly
- 21:25 bking@cumin2002: conftool action : set/pooled=yes:weight=1; selector: cluster=wdqs-main,service=wdqs-internal-main
- 20:59 herron@cumin1002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host aux-k8s-worker2005.codfw.wmnet
- 20:59 herron@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aux-k8s-worker2005.codfw.wmnet with OS bookworm
- 20:41 herron@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aux-k8s-worker2005.codfw.wmnet with reason: host reimage
- 20:37 herron@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on aux-k8s-worker2005.codfw.wmnet with reason: host reimage
- 20:20 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker2005.codfw.wmnet with OS bookworm
- 20:17 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker2005.codfw.wmnet - herron@cumin1002"
- 20:17 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker2005.codfw.wmnet - herron@cumin1002"
- 20:17 herron@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) aux-k8s-worker2005.codfw.wmnet on all recursors
- 20:17 herron@cumin1002: START - Cookbook sre.dns.wipe-cache aux-k8s-worker2005.codfw.wmnet on all recursors
- 20:17 herron@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 20:17 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker2005.codfw.wmnet - herron@cumin1002"
- 20:17 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker2005.codfw.wmnet - herron@cumin1002"
- 20:07 herron@cumin1002: START - Cookbook sre.dns.netbox
- 20:07 herron@cumin1002: START - Cookbook sre.ganeti.makevm for new host aux-k8s-worker2005.codfw.wmnet
- 19:47 herron@cumin1002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host aux-k8s-worker2004.codfw.wmnet
- 19:47 herron@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aux-k8s-worker2004.codfw.wmnet with OS bookworm
- 19:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es2045.codfw.wmnet with OS bookworm
- 19:36 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 19:36 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 19:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es2046.codfw.wmnet with OS bookworm
- 19:35 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 19:32 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 19:32 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es2043.codfw.wmnet with OS bookworm
- 19:32 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 19:31 herron@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aux-k8s-worker2004.codfw.wmnet with reason: host reimage
- 19:29 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 19:27 herron@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on aux-k8s-worker2004.codfw.wmnet with reason: host reimage
- 19:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es2044.codfw.wmnet with OS bookworm
- 19:27 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 19:26 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 19:19 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on es2045.codfw.wmnet with reason: host reimage
- 19:16 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on es2046.codfw.wmnet with reason: host reimage
- 19:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on es2043.codfw.wmnet with reason: host reimage
- 19:13 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker2004.codfw.wmnet with OS bookworm
- 19:10 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker2004.codfw.wmnet - herron@cumin1002"
- 19:10 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker2004.codfw.wmnet - herron@cumin1002"
- 19:10 herron@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) aux-k8s-worker2004.codfw.wmnet on all recursors
- 19:10 herron@cumin1002: START - Cookbook sre.dns.wipe-cache aux-k8s-worker2004.codfw.wmnet on all recursors
- 19:10 herron@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 19:10 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker2004.codfw.wmnet - herron@cumin1002"
- 19:10 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker2004.codfw.wmnet - herron@cumin1002"
- 19:09 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on es2044.codfw.wmnet with reason: host reimage
- 19:05 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on es2045.codfw.wmnet with reason: host reimage
- 19:05 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on es2046.codfw.wmnet with reason: host reimage
- 19:05 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on es2043.codfw.wmnet with reason: host reimage
- 19:05 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on es2044.codfw.wmnet with reason: host reimage
- 18:58 herron@cumin1002: START - Cookbook sre.dns.netbox
- 18:58 herron@cumin1002: START - Cookbook sre.ganeti.makevm for new host aux-k8s-worker2004.codfw.wmnet
- 18:53 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es2042.codfw.wmnet with OS bookworm
- 18:53 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 18:52 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 18:50 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host es2043.codfw.wmnet with OS bookworm
- 18:50 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host es2044.codfw.wmnet with OS bookworm
- 18:50 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host es2045.codfw.wmnet with OS bookworm
- 18:50 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host es2046.codfw.wmnet with OS bookworm
- 18:45 herron@cumin1002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host aux-k8s-worker2003.codfw.wmnet
- 18:45 herron@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aux-k8s-worker2003.codfw.wmnet with OS bookworm
- 18:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on es2042.codfw.wmnet with reason: host reimage
- 18:32 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on es2042.codfw.wmnet with reason: host reimage
- 18:31 herron@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aux-k8s-worker2003.codfw.wmnet with reason: host reimage
- 18:27 herron@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on aux-k8s-worker2003.codfw.wmnet with reason: host reimage
- 18:17 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host es2042.codfw.wmnet with OS bookworm
- 18:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host es2042.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 18:11 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker2003.codfw.wmnet with OS bookworm
- 18:10 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker2003.codfw.wmnet - herron@cumin1002"
- 18:10 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker2003.codfw.wmnet - herron@cumin1002"
- 18:10 herron@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) aux-k8s-worker2003.codfw.wmnet on all recursors
- 18:10 herron@cumin1002: START - Cookbook sre.dns.wipe-cache aux-k8s-worker2003.codfw.wmnet on all recursors
- 18:10 herron@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 18:10 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker2003.codfw.wmnet - herron@cumin1002"
- 18:10 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker2003.codfw.wmnet - herron@cumin1002"
- 18:09 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host es2042.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 18:03 herron@cumin1002: START - Cookbook sre.dns.netbox
- 18:02 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 18:02 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding es2042 to codfw - jhancock@cumin2002"
- 18:02 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding es2042 to codfw - jhancock@cumin2002"
- 18:02 herron@cumin1002: START - Cookbook sre.ganeti.makevm for new host aux-k8s-worker2003.codfw.wmnet
- 17:58 jhancock@cumin2002: START - Cookbook sre.dns.netbox
- 17:41 herron@cumin1002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host aux-k8s-worker2002.codfw.wmnet
- 17:41 herron@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aux-k8s-worker2002.codfw.wmnet with OS bookworm
- 17:32 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host es2045.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 17:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host es2046.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 17:28 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host es2042
- 17:28 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host es2042
- 17:25 herron@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aux-k8s-worker2002.codfw.wmnet with reason: host reimage
- 17:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host es2044.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 17:23 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on cloudsw1-d5-eqiad.mgmt,cloudsw1-e4-eqiad.mgmt with reason: replace optics on faulty WMCS link from D5 to E4
- 17:22 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on cloudsw1-d5-eqiad.mgmt,cloudsw1-e4-eqiad.mgmt with reason: replace optics on faulty WMCS link from D5 to E4
- 17:22 herron@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on aux-k8s-worker2002.codfw.wmnet with reason: host reimage
- 17:20 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host es2046.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 17:20 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host es2045.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 17:11 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host es2044.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 17:09 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host es2043.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 17:08 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker2002.codfw.wmnet with OS bookworm
- 17:06 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker2002.codfw.wmnet - herron@cumin1002"
- 17:06 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker2002.codfw.wmnet - herron@cumin1002"
- 17:05 herron@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) aux-k8s-worker2002.codfw.wmnet on all recursors
- 17:05 herron@cumin1002: START - Cookbook sre.dns.wipe-cache aux-k8s-worker2002.codfw.wmnet on all recursors
- 17:05 herron@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 17:05 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker2002.codfw.wmnet - herron@cumin1002"
- 17:05 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker2002.codfw.wmnet - herron@cumin1002"
- 17:00 herron@cumin1002: START - Cookbook sre.dns.netbox
- 17:00 herron@cumin1002: START - Cookbook sre.ganeti.makevm for new host aux-k8s-worker2002.codfw.wmnet
- 16:57 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host es2043.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 16:54 herron@cumin1002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of aux-k8s-etcd2003.codfw.wmnet to plain
- 16:53 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host es2042.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 16:53 herron@cumin1002: START - Cookbook sre.ganeti.changedisk for changing disk type of aux-k8s-etcd2003.codfw.wmnet to plain
- 16:48 herron@cumin1002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of aux-k8s-etcd2004.codfw.wmnet to plain
- 16:47 herron@cumin1002: START - Cookbook sre.ganeti.changedisk for changing disk type of aux-k8s-etcd2004.codfw.wmnet to plain
- 16:43 herron@cumin1002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of aux-k8s-etcd2005.codfw.wmnet to plain
- 16:43 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es2041.codfw.wmnet with OS bookworm
- 16:43 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002"
- 16:43 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002"
- 16:42 herron@cumin1002: START - Cookbook sre.ganeti.changedisk for changing disk type of aux-k8s-etcd2005.codfw.wmnet to plain
- 16:40 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host es2042.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 16:27 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on es2041.codfw.wmnet with reason: host reimage
- 16:24 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on es2041.codfw.wmnet with reason: host reimage
- 16:12 claime: homer 'cr*codfw*' commit 'T380473'
- 16:11 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts parse[2002-2020].codfw.wmnet
- 16:11 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 16:10 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: parse[2002-2020].codfw.wmnet decommissioned, removing all IPs except the asset tag one - cgoubert@cumin1002"
- 16:10 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: parse[2002-2020].codfw.wmnet decommissioned, removing all IPs except the asset tag one - cgoubert@cumin1002"
- 16:09 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host es2041.codfw.wmnet with OS bookworm
- 16:08 bking@deploy2002: Finished deploy [wdqs/wdqs@9927a5a]: 0.3.150 (duration: 03m 00s)
- 16:07 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
- 16:05 bking@deploy2002: Started deploy [wdqs/wdqs@9927a5a]: 0.3.150
- 16:00 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host es2041.codfw.wmnet with OS bookworm
- 15:31 cgoubert@cumin1002: START - Cookbook sre.hosts.decommission for hosts parse[2002-2020].codfw.wmnet
- 15:31 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host es2041.codfw.wmnet with OS bookworm
- 15:29 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts parse2001.codfw.wmnet
- 15:29 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 15:29 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: parse2001.codfw.wmnet decommissioned, removing all IPs except the asset tag one - cgoubert@cumin1002"
- 15:29 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: parse2001.codfw.wmnet decommissioned, removing all IPs except the asset tag one - cgoubert@cumin1002"
- 15:29 elukey@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host es2041.codfw.wmnet with OS bookworm
- 15:25 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
- 15:22 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host es2041.codfw.wmnet with OS bookworm
- 15:20 cgoubert@cumin1002: START - Cookbook sre.hosts.decommission for hosts parse2001.codfw.wmnet
- 15:17 ihurbain@deploy2002: helmfile [eqiad] DONE helmfile.d/services/push-notifications: apply
- 15:17 ihurbain@deploy2002: helmfile [eqiad] START helmfile.d/services/push-notifications: apply
- 15:16 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/push-notifications: apply
- 15:15 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/push-notifications: apply
- 15:14 claime: kubectl delete node parse20{01..20}.codfw.wmnet - T380473
- 15:12 claime: parse[2001-2020].codfw.wmnet 'systemctl stop kubelet.service' - T380473
- 15:11 claime: parse[2001-2020].codfw.wmnet 'disable-puppet "decom"' - T380473
- 15:09 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host parse[2001-2020].codfw.wmnet
- 15:02 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on wdqs[2018-2020].codfw.wmnet with reason: T379023
- 15:02 bking@cumin2002: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on wdqs[2018-2020].codfw.wmnet with reason: T379023
- 15:01 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on wdqs[2026-2027].codfw.wmnet with reason: T379023
- 15:01 bking@cumin2002: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on wdqs[2026-2027].codfw.wmnet with reason: T379023
- 14:54 urandom: decommissioning Cassandra/restbase2022-{a,b,c} —
- 14:53 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on restbase2022.codfw.wmnet with reason: Decommissioning — T380236
- 14:53 eevans@cumin1002: START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on restbase2022.codfw.wmnet with reason: Decommissioning — T380236
- 14:49 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host parse[2001-2020].codfw.wmnet
- 14:37 ihurbain@deploy2002: helmfile [codfw] DONE helmfile.d/services/push-notifications: apply
- 14:27 ihurbain@deploy2002: helmfile [codfw] START helmfile.d/services/push-notifications: apply
- 14:23 ihurbain@deploy2002: helmfile [codfw] DONE helmfile.d/services/push-notifications: apply
- 14:22 vgutierrez: restoring haproxykafka on A:cp-ulsfo and A:cp-eqsin - T380570
- 14:13 ihurbain@deploy2002: helmfile [codfw] START helmfile.d/services/push-notifications: apply
- 14:12 ihurbain@deploy2002: helmfile [staging] DONE helmfile.d/services/push-notifications: apply
- 14:12 ihurbain@deploy2002: helmfile [staging] START helmfile.d/services/push-notifications: apply
- 11:26 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker[2156-2170].codfw.wmnet
- 11:26 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker[2156-2170].codfw.wmnet
- 11:25 claime: homer 'lsw1-d7-codfw*' commit 'T376966'
- 11:24 claime: homer 'lsw1-d6-codfw*' commit 'T376966'
- 11:24 claime: homer 'lsw1-d5-codfw*' commit 'T376966'
- 11:23 claime: homer 'lsw1-d4-codfw*' commit 'T376966'
- 11:22 claime: homer 'lsw1-d1-codfw*' commit 'T376966'
- 11:21 claime: homer 'lsw1-c7-codfw*' commit 'T376966'
- 11:20 claime: homer 'lsw1-c4-codfw*' commit 'T376966'
- 11:19 claime: homer 'lsw1-c2-codfw*' commit 'T376966'
- 11:19 claime: homer 'lsw1-b7-codfw*' commit 'T376966'
- 11:18 claime: homer 'lsw1-b4-codfw*' commit 'T376966'
- 11:07 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker2140.codfw.wmnet
- 11:07 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker2140.codfw.wmnet
- 11:04 claime: homer 'lsw1-b7-codfw*' commit 'T377028'
- 11:02 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2159.codfw.wmnet with OS bookworm
- 10:43 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2159.codfw.wmnet with reason: host reimage
- 10:40 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2159.codfw.wmnet with reason: host reimage
- 10:37 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti1014.eqiad.wmnet
- 10:37 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 10:37 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti1014.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
- 10:37 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti1014.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
- 10:31 jmm@cumin2002: START - Cookbook sre.dns.netbox
- 10:26 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti1014.eqiad.wmnet
- 10:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti1011.eqiad.wmnet
- 10:23 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 10:23 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti1011.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
- 10:22 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti1011.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
- 10:22 vgutierrez: manually stopping haproxykafka on A:cp-ulsfo and A:cp-eqsin - T380570
- 10:21 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2159.codfw.wmnet with OS bookworm
- 10:16 jmm@cumin2002: START - Cookbook sre.dns.netbox
- 10:10 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti1011.eqiad.wmnet
- 08:08 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Add sorting options to tree view - oblivian@cumin1002"
- 08:08 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Add sorting options to tree view - oblivian@cumin1002
- 08:07 oblivian@cumin1002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Add sorting options to tree view - oblivian@cumin1002
- 08:07 oblivian@cumin1002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Add sorting options to tree view - oblivian@cumin1002"
- 01:00 herron@cumin1002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host aux-k8s-etcd2005.codfw.wmnet
- 01:00 herron@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aux-k8s-etcd2005.codfw.wmnet with OS bookworm
- 00:46 herron@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aux-k8s-etcd2005.codfw.wmnet with reason: host reimage
- 00:42 herron@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on aux-k8s-etcd2005.codfw.wmnet with reason: host reimage
- 00:27 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-etcd2005.codfw.wmnet with OS bookworm
- 00:20 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-etcd2005.codfw.wmnet - herron@cumin1002"
- 00:20 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-etcd2005.codfw.wmnet - herron@cumin1002"
- 00:20 herron@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) aux-k8s-etcd2005.codfw.wmnet on all recursors
- 00:20 herron@cumin1002: START - Cookbook sre.dns.wipe-cache aux-k8s-etcd2005.codfw.wmnet on all recursors
- 00:20 herron@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 00:20 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-etcd2005.codfw.wmnet - herron@cumin1002"
- 00:16 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-etcd2005.codfw.wmnet - herron@cumin1002"
- 00:11 herron@cumin1002: START - Cookbook sre.dns.netbox
- 00:11 herron@cumin1002: START - Cookbook sre.ganeti.makevm for new host aux-k8s-etcd2005.codfw.wmnet
- 00:11 herron@cumin1002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host aux-k8s-etcd2004.codfw.wmnet
- 00:11 herron@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aux-k8s-etcd2004.codfw.wmnet with OS bookworm
2024-11-21
- 23:56 herron@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aux-k8s-etcd2004.codfw.wmnet with reason: host reimage
- 23:52 herron@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on aux-k8s-etcd2004.codfw.wmnet with reason: host reimage
- 23:36 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-etcd2004.codfw.wmnet with OS bookworm
- 23:29 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-etcd2004.codfw.wmnet - herron@cumin1002"
- 23:29 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-etcd2004.codfw.wmnet - herron@cumin1002"
- 23:29 herron@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) aux-k8s-etcd2004.codfw.wmnet on all recursors
- 23:28 herron@cumin1002: START - Cookbook sre.dns.wipe-cache aux-k8s-etcd2004.codfw.wmnet on all recursors
- 23:28 herron@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 23:28 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-etcd2004.codfw.wmnet - herron@cumin1002"
- 23:24 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-etcd2004.codfw.wmnet - herron@cumin1002"
- 23:11 herron@cumin1002: START - Cookbook sre.dns.netbox
- 23:11 herron@cumin1002: START - Cookbook sre.ganeti.makevm for new host aux-k8s-etcd2004.codfw.wmnet
- 23:09 herron@cumin1002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host aux-k8s-etcd2003.codfw.wmnet
- 23:09 herron@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aux-k8s-etcd2003.codfw.wmnet with OS bookworm
- 23:08 brennen: end of utc late backport & config window
- 23:07 brennen@deploy2002: Finished scap sync-world: Backport for Add statsv to charts impressions (T379833) (duration: 12m 08s)
- 23:06 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host es2041.codfw.wmnet with OS bookworm
- 23:01 brennen@deploy2002: bvibber, brennen: Continuing with sync
- 23:00 brennen@deploy2002: bvibber, brennen: Backport for Add statsv to charts impressions (T379833) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 22:55 brennen@deploy2002: Started scap sync-world: Backport for Add statsv to charts impressions (T379833)
- 22:55 herron@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aux-k8s-etcd2003.codfw.wmnet with reason: host reimage
- 22:54 brennen@deploy2002: Finished scap sync-world: resuming sync for Add tracking categories for {{#chart:}} usage (T369684) after messing up a keypress (duration: 12m 35s)
- 22:52 herron@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on aux-k8s-etcd2003.codfw.wmnet with reason: host reimage
- 22:42 brennen@deploy2002: Started scap sync-world: resuming sync for Add tracking categories for {{#chart:}} usage (T369684) after messing up a keypress
- 22:40 brennen@deploy2002: Sync cancelled.
- 22:40 brennen@deploy2002: bvibber, brennen: Backport for Add tracking categories for {{#chart:}} usage (T369684) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 22:38 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-etcd2003.codfw.wmnet with OS bookworm
- 22:36 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-etcd2003.codfw.wmnet - herron@cumin1002"
- 22:36 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-etcd2003.codfw.wmnet - herron@cumin1002"
- 22:35 herron@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) aux-k8s-etcd2003.codfw.wmnet on all recursors
- 22:35 herron@cumin1002: START - Cookbook sre.dns.wipe-cache aux-k8s-etcd2003.codfw.wmnet on all recursors
- 22:35 herron@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 22:35 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-etcd2003.codfw.wmnet - herron@cumin1002"
- 22:35 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-etcd2003.codfw.wmnet - herron@cumin1002"
- 22:32 herron@cumin1002: START - Cookbook sre.dns.netbox
- 22:32 herron@cumin1002: START - Cookbook sre.ganeti.makevm for new host aux-k8s-etcd2003.codfw.wmnet
- 22:25 brennen@deploy2002: Started scap sync-world: Backport for Add tracking categories for {{#chart:}} usage (T369684)
- 22:25 brennen@deploy2002: Finished scap sync-world: Backport for Disable various extensions when using the shared login domain (T373737) (duration: 18m 16s)
- 22:22 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host es2041.codfw.wmnet with OS bookworm
- 22:18 brennen@deploy2002: tgr, brennen: Continuing with sync
- 22:10 brennen@deploy2002: tgr, brennen: Backport for Disable various extensions when using the shared login domain (T373737) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 22:06 brennen@deploy2002: Started scap sync-world: Backport for Disable various extensions when using the shared login domain (T373737)
- 22:05 brennen@deploy2002: Finished scap sync-world: Backport for Revert "Reduce number of bucketsizes for MediaViewer (group0)" (T372165) (duration: 10m 34s)
- 21:58 brennen@deploy2002: brennen: Continuing with sync
- 21:58 brennen@deploy2002: brennen: Backport for Revert "Reduce number of bucketsizes for MediaViewer (group0)" (T372165) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 21:54 brennen@deploy2002: Started scap sync-world: Backport for Revert "Reduce number of bucketsizes for MediaViewer (group0)" (T372165)
- 21:51 brennen@deploy2002: Sync cancelled.
- 21:42 brennen@deploy2002: brennen, tgr, simon04: Backport for Reduce number of bucketsizes for MediaViewer (group0) (T372165), Set 'remember' central session object field when recreating (T379254 T372702), Use cookie to access central session when local session expired synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 21:39 brennen@deploy2002: Started scap sync-world: Backport for Reduce number of bucketsizes for MediaViewer (group0) (T372165), Set 'remember' central session object field when recreating (T379254 T372702), Use cookie to access central session when local session expired
- 21:36 brennen@deploy2002: Finished scap sync-world: Backport for Enable Skin-Codex logging (T375287) (duration: 15m 53s)
- 21:29 brennen@deploy2002: brennen, jdlrobson: Continuing with sync
- 21:26 brennen@deploy2002: brennen, jdlrobson: Backport for Enable Skin-Codex logging (T375287) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 21:20 brennen@deploy2002: Started scap sync-world: Backport for Enable Skin-Codex logging (T375287)
- 21:19 brennen@deploy2002: Finished scap sync-world: Backport for Enable AutoModerator on afwiki (T376597) (duration: 13m 50s)
- 21:12 brennen@deploy2002: kgraessle, brennen: Continuing with sync
- 21:10 brennen@deploy2002: kgraessle, brennen: Backport for Enable AutoModerator on afwiki (T376597) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 21:05 brennen@deploy2002: Started scap sync-world: Backport for Enable AutoModerator on afwiki (T376597)
- 20:46 tgr
- 20:24 sukhe@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp2038.codfw.wmnet [reason: DIMM replaced, T308459]
- 20:20 sukhe: force agent on cp2038
- 19:31 gmodena@deploy2002: Finished deploy [analytics/refinery@199401a] (hadoop-test): Ad-hoc deployment TEST [analytics/refinery@199401a6] (duration: 03m 45s)
- 19:27 gmodena@deploy2002: Started deploy [analytics/refinery@199401a] (hadoop-test): Ad-hoc deployment TEST [analytics/refinery@199401a6]
- 19:07 gmodena@deploy2002: Finished deploy [analytics/refinery@199401a] (thin): Ad-hoc deployment THIN [analytics/refinery@199401a6] (duration: 05m 37s)
- 19:01 gmodena@deploy2002: Started deploy [analytics/refinery@199401a] (thin): Ad-hoc deployment THIN [analytics/refinery@199401a6]
- 18:57 gmodena@deploy2002: Finished deploy [analytics/refinery@199401a]: Ad-hoc deployment [analytics/refinery@199401a6] (duration: 14m 08s)
- 18:57 cdanis@deploy2002: Finished scap sync-world: Backport for Follow-up fix for Charts enable on commons/test2 (T379689) (duration: 11m 29s)
- 18:49 cdanis@deploy2002: cdanis, bvibber: Continuing with sync
- 18:49 cdanis@deploy2002: cdanis, bvibber: Backport for Follow-up fix for Charts enable on commons/test2 (T379689) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 18:45 cdanis@deploy2002: Started scap sync-world: Backport for Follow-up fix for Charts enable on commons/test2 (T379689)
- 18:43 gmodena@deploy2002: Started deploy [analytics/refinery@199401a]: Ad-hoc deployment [analytics/refinery@199401a6]
- 18:21 cdanis@deploy2002: Finished scap sync-world: Backport for Enabling Charts on commons+test2 (T379689) (duration: 14m 05s)
- 18:16 jayme@cumin2002: conftool action : set/pooled=yes; selector: name=kubestage200[34].codfw.wmnet
- 18:15 jayme@cumin2002: conftool action : set/weight=10; selector: name=kubestage200[34].codfw.wmnet
- 18:13 cdanis@deploy2002: cdanis, bvibber: Continuing with sync
- 18:12 cdanis@deploy2002: cdanis, bvibber: Backport for Enabling Charts on commons+test2 (T379689) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 18:10 sukhe: running puppet on A:cp to resolve failed puppet run
- 18:10 sukhe: sudo cumin -b11 'A:cp' 'run-puppet-agent
- 18:09 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on cp2038.codfw.wmnet with reason: DIMM replacement in progress
- 18:09 sukhe@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on cp2038.codfw.wmnet with reason: DIMM replacement in progress
- 18:07 cdanis@deploy2002: Started scap sync-world: Backport for Enabling Charts on commons+test2 (T379689)
- 17:58 sukhe@puppetserver1001: conftool action : set/pooled=no; selector: name=cp2038.codfw.wmnet [reason: DIMM failure T308459]
- 17:45 jayme@cumin2002: END (FAIL) - Cookbook sre.k8s.pool-depool-node (exit_code=99) check for host kubestage2003.codfw.wmnet
- 17:45 jayme@cumin2002: START - Cookbook sre.k8s.pool-depool-node check for host kubestage2003.codfw.wmnet
- 17:40 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts clouddb2002-dev.codfw.wmnet
- 17:40 andrew@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 17:40 andrew@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: clouddb2002-dev.codfw.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1002"
- 17:39 andrew@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: clouddb2002-dev.codfw.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1002"
- 17:39 fabfur: adding acls to kafka-jumbo cluster (T380373)
- 17:36 andrew@cumin1002: START - Cookbook sre.dns.netbox
- 17:31 andrew@cumin1002: START - Cookbook sre.hosts.decommission for hosts clouddb2002-dev.codfw.wmnet
- 17:02 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2157.codfw.wmnet with OS bookworm
- 16:54 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs2013.codfw.wmnet
- 16:54 sukhe@cumin1002: START - Cookbook sre.hosts.remove-downtime for lvs2013.codfw.wmnet
- 16:54 sukhe: enable puppet on lvs2013 and start pybal
- 16:48 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs2013.codfw.wmnet with reason: rebooting
- 16:47 sukhe@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs2013.codfw.wmnet with reason: rebooting
- 16:47 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2140.codfw.wmnet with OS bookworm
- 16:47 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - cgoubert@cumin1002"
- 16:46 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs2013.codfw.wmnet
- 16:46 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - cgoubert@cumin1002"
- 16:43 sukhe@cumin1002: START - Cookbook sre.hosts.reboot-single for host lvs2013.codfw.wmnet
- 16:43 sukhe: rebooting drained lvs2013
- 16:43 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2157.codfw.wmnet with reason: host reimage
- 16:39 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2157.codfw.wmnet with reason: host reimage
- 16:26 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2140.codfw.wmnet with reason: host reimage
- 16:23 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2140.codfw.wmnet with reason: host reimage
- 16:21 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2157.codfw.wmnet with OS bookworm
- 16:20 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2157.codfw.wmnet with OS bookworm
- 16:13 sukhe@puppetserver1001: conftool action : set/pooled=no; selector: name=cluster=dnsbox,dc=magru [reason: testing]
- 16:08 dancy@deploy2002: Finished scap sync-world: testing (duration: 03m 01s)
- 16:05 dancy@deploy2002: Started scap sync-world: testing
- 16:04 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2140.codfw.wmnet with OS bookworm
- 16:03 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2140.codfw.wmnet with OS bookworm
- 16:00 dancy@deploy2002: Installing scap version "4.127.0" for 209 hosts
- 15:39 kartik@deploy2002: Finished scap sync-world: Backport for Fix layout broken by display:flex on HorizontalLayout (T380471), Revert "ExperimentUserDefaultsManager: use read latest when retrieving central id" (duration: 15m 51s)
- 15:34 gmodena@deploy2002: Finished deploy [analytics/refinery@358ccf5] (hadoop-test): Ad-hoc deployment TEST [analytics/refinery@358ccf55] (duration: 03m 30s)
- 15:33 kartik@deploy2002: abi, sgimeno, kartik: Continuing with sync
- 15:31 gmodena@deploy2002: Started deploy [analytics/refinery@358ccf5] (hadoop-test): Ad-hoc deployment TEST [analytics/refinery@358ccf55]
- 15:29 gmodena@deploy2002: Finished deploy [analytics/refinery@358ccf5] (thin): Ad-hoc deployment THIN [analytics/refinery@358ccf55] (duration: 05m 16s)
- 15:29 ihurbain@deploy2002: helmfile [eqiad] DONE helmfile.d/services/push-notifications: apply
- 15:29 kartik@deploy2002: abi, sgimeno, kartik: Backport for Fix layout broken by display:flex on HorizontalLayout (T380471), Revert "ExperimentUserDefaultsManager: use read latest when retrieving central id" synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 15:28 ihurbain@deploy2002: helmfile [eqiad] START helmfile.d/services/push-notifications: apply
- 15:28 ihurbain@deploy2002: helmfile [codfw] DONE helmfile.d/services/push-notifications: apply
- 15:27 ihurbain@deploy2002: helmfile [codfw] START helmfile.d/services/push-notifications: apply
- 15:26 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@6183645]: increase driver memory for mjolnir feature selection (duration: 00m 31s)
- 15:26 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs2013.codfw.wmnet with reason: rebooting
- 15:25 sukhe@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs2013.codfw.wmnet with reason: rebooting
- 15:25 ebernhardson@deploy2002: Started deploy [airflow-dags/search@6183645]: increase driver memory for mjolnir feature selection
- 15:24 sukhe: stop pybal on lvs2013 to confirm changes in CR 1091243
- 15:24 gmodena@deploy2002: Started deploy [analytics/refinery@358ccf5] (thin): Ad-hoc deployment THIN [analytics/refinery@358ccf55]
- 15:24 kartik@deploy2002: Started scap sync-world: Backport for Fix layout broken by display:flex on HorizontalLayout (T380471), Revert "ExperimentUserDefaultsManager: use read latest when retrieving central id"
- 15:23 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2140.codfw.wmnet with OS bookworm
- 15:23 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2140.codfw.wmnet with OS bookworm
- 15:16 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2140.codfw.wmnet with OS bookworm
- 15:15 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2140.codfw.wmnet with OS bookworm
- 15:11 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on restbase2021.codfw.wmnet with reason: Decommissioning — T380236
- 15:10 eevans@cumin1002: START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on restbase2021.codfw.wmnet with reason: Decommissioning — T380236
- 15:06 gmodena@deploy2002: Finished deploy [analytics/refinery@358ccf5]: Ad-hoc deployment [analytics/refinery@358ccf55] (duration: 11m 44s)
- 14:56 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2169.codfw.wmnet with OS bookworm
- 14:55 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2140.codfw.wmnet with OS bookworm
- 14:54 gmodena@deploy2002: Started deploy [analytics/refinery@358ccf5]: Ad-hoc deployment [analytics/refinery@358ccf55]
- 14:53 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2168.codfw.wmnet with OS bookworm
- 14:51 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2170.codfw.wmnet with OS bookworm
- 14:50 sergi0: UTC afternoon deploys done
- 14:49 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2167.codfw.wmnet with OS bookworm
- 14:48 sgimeno@deploy2002: Sync cancelled.
- 14:47 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2140.codfw.wmnet with OS bookworm
- 14:47 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2166.codfw.wmnet with OS bookworm
- 14:43 jynus@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on kafka-main1001.eqiad.wmnet with reason: Per claime's recommendation
- 14:43 jynus@cumin1002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on kafka-main1001.eqiad.wmnet with reason: Per claime's recommendation
- 14:43 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2157.codfw.wmnet with OS bookworm
- 14:41 sgimeno@deploy2002: sgimeno: Backport for ExperimentUserDefaultsManager: use read latest when retrieving central id (T379682) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 14:39 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2140.codfw.wmnet with OS bookworm
- 14:36 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2169.codfw.wmnet with reason: host reimage
- 14:35 sgimeno@deploy2002: Started scap sync-world: Backport for ExperimentUserDefaultsManager: use read latest when retrieving central id (T379682)
- 14:33 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2168.codfw.wmnet with reason: host reimage
- 14:31 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2170.codfw.wmnet with reason: host reimage
- 14:28 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2167.codfw.wmnet with reason: host reimage
- 14:25 ihurbain@deploy2002: helmfile [staging] DONE helmfile.d/services/push-notifications: apply
- 14:25 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2166.codfw.wmnet with reason: host reimage
- 14:25 ihurbain@deploy2002: helmfile [staging] START helmfile.d/services/push-notifications: apply
- 14:24 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2170.codfw.wmnet with reason: host reimage
- 14:24 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2169.codfw.wmnet with reason: host reimage
- 14:23 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2168.codfw.wmnet with reason: host reimage
- 14:23 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2167.codfw.wmnet with reason: host reimage
- 14:22 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2166.codfw.wmnet with reason: host reimage
- 14:21 sgimeno@deploy2002: Finished scap sync-world: Backport for enwiki: Add abusefilter-access-protected-vars to EFH/EFM (T380332) (duration: 13m 50s)
- 14:14 sgimeno@deploy2002: eggroll97, sgimeno: Continuing with sync
- 14:11 sgimeno@deploy2002: eggroll97, sgimeno: Backport for enwiki: Add abusefilter-access-protected-vars to EFH/EFM (T380332) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 14:11 jayme@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestage1006.eqiad.wmnet with OS bookworm
- 14:07 sgimeno@deploy2002: Started scap sync-world: Backport for enwiki: Add abusefilter-access-protected-vars to EFH/EFM (T380332)
- 14:06 jayme@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestage1005.eqiad.wmnet with OS bookworm
- 14:05 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2170.codfw.wmnet with OS bookworm
- 14:05 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2169.codfw.wmnet with OS bookworm
- 14:04 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2168.codfw.wmnet with OS bookworm
- 14:04 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2167.codfw.wmnet with OS bookworm
- 14:03 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2166.codfw.wmnet with OS bookworm
- 13:54 jayme@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage1006.eqiad.wmnet with reason: host reimage
- 13:51 jayme@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage1006.eqiad.wmnet with reason: host reimage
- 13:47 jayme@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage1005.eqiad.wmnet with reason: host reimage
- 13:44 jayme@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage1005.eqiad.wmnet with reason: host reimage
- 13:34 jayme@cumin2002: START - Cookbook sre.hosts.reimage for host kubestage1006.eqiad.wmnet with OS bookworm
- 13:33 jayme@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from kubernetes1008 to kubestage1006
- 13:32 jayme@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kubestage1006
- 13:31 jayme@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host kubestage1006
- 13:31 jayme@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 13:31 jayme@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming kubernetes1008 to kubestage1006 - jayme@cumin2002"
- 13:30 jayme@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming kubernetes1008 to kubestage1006 - jayme@cumin2002"
- 13:27 jayme@cumin2002: START - Cookbook sre.hosts.reimage for host kubestage1005.eqiad.wmnet with OS bookworm
- 13:25 jayme@cumin2002: START - Cookbook sre.dns.netbox
- 13:25 jayme@cumin2002: START - Cookbook sre.hosts.rename from kubernetes1008 to kubestage1006
- 13:24 jayme@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from kubernetes1007 to kubestage1005
- 13:24 jayme@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kubestage1005
- 13:22 jayme@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host kubestage1005
- 13:22 jayme@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 13:22 jayme@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming kubernetes1007 to kubestage1005 - jayme@cumin2002"
- 13:21 jayme@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming kubernetes1007 to kubestage1005 - jayme@cumin2002"
- 13:18 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2160.codfw.wmnet with OS bookworm
- 13:18 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-ats (exit_code=0) Rolling upgrade/restart of Apache Traffic Server on P{cp5026*} and A:cp for 9.2.6-1wm2
- 13:17 jayme@cumin2002: START - Cookbook sre.dns.netbox
- 13:17 jayme@cumin2002: START - Cookbook sre.hosts.rename from kubernetes1007 to kubestage1005
- 13:14 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2164.codfw.wmnet with OS bookworm
- 13:14 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-ats Rolling upgrade/restart of Apache Traffic Server on P{cp5026*} and A:cp for 9.2.6-1wm2
- 13:14 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-ats (exit_code=0) Rolling upgrade/restart of Apache Traffic Server on P{cp5018*} and A:cp for 9.2.6-1wm2
- 13:11 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2162.codfw.wmnet with OS bookworm
- 13:10 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-ats Rolling upgrade/restart of Apache Traffic Server on P{cp5018*} and A:cp for 9.2.6-1wm2
- 13:10 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2165.codfw.wmnet with OS bookworm
- 13:05 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2163.codfw.wmnet with OS bookworm
- 13:02 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2158.codfw.wmnet with OS bookworm
- 12:58 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2161.codfw.wmnet with OS bookworm
- 12:58 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2160.codfw.wmnet with reason: host reimage
- 12:55 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2156.codfw.wmnet with OS bookworm
- 12:55 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2164.codfw.wmnet with reason: host reimage
- 12:52 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2162.codfw.wmnet with reason: host reimage
- 12:49 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2165.codfw.wmnet with reason: host reimage
- 12:46 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2163.codfw.wmnet with reason: host reimage
- 12:42 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2158.codfw.wmnet with reason: host reimage
- 12:39 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2161.codfw.wmnet with reason: host reimage
- 12:38 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2165.codfw.wmnet with reason: host reimage
- 12:38 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2164.codfw.wmnet with reason: host reimage
- 12:38 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2163.codfw.wmnet with reason: host reimage
- 12:37 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2162.codfw.wmnet with reason: host reimage
- 12:36 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2156.codfw.wmnet with reason: host reimage
- 12:36 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2160.codfw.wmnet with reason: host reimage
- 12:35 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2161.codfw.wmnet with reason: host reimage
- 12:32 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2158.codfw.wmnet with reason: host reimage
- 12:32 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2156.codfw.wmnet with reason: host reimage
- 12:19 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2165.codfw.wmnet with OS bookworm
- 12:18 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2164.codfw.wmnet with OS bookworm
- 12:18 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2163.codfw.wmnet with OS bookworm
- 12:17 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2162.codfw.wmnet with OS bookworm
- 12:17 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2160.codfw.wmnet with OS bookworm
- 12:16 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2161.codfw.wmnet with OS bookworm
- 12:16 jmm@deploy2002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
- 12:13 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2158.codfw.wmnet with OS bookworm
- 12:13 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2156.codfw.wmnet with OS bookworm
- 12:09 jmm@deploy2002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
- 12:09 jmm@deploy2002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
- 12:02 jmm@deploy2002: helmfile [codfw] START helmfile.d/services/thumbor: apply
- 11:56 jmm@deploy2002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
- 11:56 jmm@deploy2002: helmfile [staging] START helmfile.d/services/thumbor: apply
- 11:00 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host thanos-be1005.eqiad.wmnet with OS bullseye
- 11:00 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002"
- 10:59 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002"
- 10:41 jayme@cumin2002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host kubernetes[1007-1008].eqiad.wmnet
- 10:41 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-be1005.eqiad.wmnet with reason: host reimage
- 10:40 jayme@cumin2002: START - Cookbook sre.k8s.pool-depool-node depool for host kubernetes[1007-1008].eqiad.wmnet
- 10:39 urbanecm@deploy2002: helmfile [codfw] DONE helmfile.d/services/linkrecommendation: apply
- 10:38 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T367781)', diff saved to https://phabricator.wikimedia.org/P71113 and previous config saved to /var/cache/conftool/dbconfig/20241121-103834-arnaudb.json
- 10:38 urbanecm@deploy2002: helmfile [codfw] START helmfile.d/services/linkrecommendation: apply
- 10:38 urbanecm@deploy2002: helmfile [eqiad] DONE helmfile.d/services/linkrecommendation: apply
- 10:37 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-be1005.eqiad.wmnet with reason: host reimage
- 10:36 urbanecm@deploy2002: helmfile [eqiad] START helmfile.d/services/linkrecommendation: apply
- 10:34 urbanecm@deploy2002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply
- 10:33 urbanecm@deploy2002: helmfile [staging] START helmfile.d/services/linkrecommendation: apply
- 10:25 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host thanos-be1005.eqiad.wmnet with OS bullseye
- 10:23 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P71112 and previous config saved to /var/cache/conftool/dbconfig/20241121-102328-arnaudb.json
- 10:19 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.debug (exit_code=0) for Netbox circuit ID 102
- 10:19 ayounsi@cumin1002: START - Cookbook sre.network.debug for Netbox circuit ID 102
- 10:08 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P71111 and previous config saved to /var/cache/conftool/dbconfig/20241121-100821-arnaudb.json
- 10:01 dcausse@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventgate-main: sync
- 10:01 dcausse@deploy2002: helmfile [codfw] START helmfile.d/services/eventgate-main: sync
- 09:59 dcausse: restarting eventgate-main@codfw
- 09:53 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T367781)', diff saved to https://phabricator.wikimedia.org/P71110 and previous config saved to /var/cache/conftool/dbconfig/20241121-095313-arnaudb.json
- 09:51 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2155 (T367781)', diff saved to https://phabricator.wikimedia.org/P71109 and previous config saved to /var/cache/conftool/dbconfig/20241121-095102-arnaudb.json
- 09:50 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2187.codfw.wmnet with reason: Maintenance
- 09:50 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2187.codfw.wmnet with reason: Maintenance
- 09:50 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2155.codfw.wmnet with reason: Maintenance
- 09:50 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2155.codfw.wmnet with reason: Maintenance
- 09:35 moritzm: installing nghttp2 security updates
- 09:18 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1246.eqiad.wmnet with OS bookworm
- 09:17 aklapper@deploy2002: rebuilt and synchronized wikiversions files: group2 to 1.44.0-wmf.4 refs T375663
- 09:07 moritzm: installing exim4 security updates
- 09:03 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1246.eqiad.wmnet with reason: host reimage
- 09:00 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1246.eqiad.wmnet with reason: host reimage
- 08:45 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host db1246.eqiad.wmnet with OS bookworm
- 08:21 kartik@deploy2002: Finished scap sync-world: Backport for Enable the Contribute menu in 4th group of Wikis (T375303) (duration: 14m 05s)
- 08:14 kartik@deploy2002: kartik: Continuing with sync
- 08:10 kartik@deploy2002: kartik: Backport for Enable the Contribute menu in 4th group of Wikis (T375303) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 08:06 kartik@deploy2002: Started scap sync-world: Backport for Enable the Contribute menu in 4th group of Wikis (T375303)
- 07:48 moritzm: removing ganeti1017 from active Ganeti nodes T378921
- 05:51 aikochou@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'article-models' for release 'main' .
- 02:30 brett: Import libvmod-re2_2.0.0-2~bpo11u1 into varnish-staging apt component
- 00:45 urandom: decommissioning Cassandra/restbase2021-{a,b,c} — T380236
- 00:42 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on restbase2023.codfw.wmnet with reason: Decommissioning — T380236
- 00:42 eevans@cumin1002: START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on restbase2023.codfw.wmnet with reason: Decommissioning — T380236
- 00:42 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on restbase2022.codfw.wmnet with reason: Decommissioning — T380236
- 00:42 eevans@cumin1002: START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on restbase2022.codfw.wmnet with reason: Decommissioning — T380236
- 00:42 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on restbase2021.codfw.wmnet with reason: Decommissioning — T380236
- 00:42 eevans@cumin1002: START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on restbase2021.codfw.wmnet with reason: Decommissioning — T380236
- 00:40 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for restbase2038.codfw.wmnet
- 00:40 eevans@cumin1002: START - Cookbook sre.hosts.remove-downtime for restbase2038.codfw.wmnet
- 00:40 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for restbase2037.codfw.wmnet
- 00:40 eevans@cumin1002: START - Cookbook sre.hosts.remove-downtime for restbase2037.codfw.wmnet
- 00:40 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for restbase2036.codfw.wmnet
- 00:40 eevans@cumin1002: START - Cookbook sre.hosts.remove-downtime for restbase2036.codfw.wmnet
- 00:15 urbanecm: [urbanecm@deploy2002 ~]$ mwscript-k8s -- extensions/GrowthExperiments/maintenance/revalidateLinkRecommendations.php --wiki=azwiki --all --verbose # T380329
2024-11-20
- 23:22 cjming: end of UTC late backport window
- 23:20 eileen: civicrm upgraded from 7c940d6f to 3311520a
- 23:17 cjming@deploy2002: Finished scap sync-world: Backport for Temporarily disable dark mode for anonymous users (T379765) (duration: 13m 06s)
- 23:10 cjming@deploy2002: jdlrobson, cjming: Continuing with sync
- 23:08 cjming@deploy2002: jdlrobson, cjming: Backport for Temporarily disable dark mode for anonymous users (T379765) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 23:04 cjming@deploy2002: Started scap sync-world: Backport for Temporarily disable dark mode for anonymous users (T379765)
- 23:03 cjming@deploy2002: Finished scap sync-world: Backport for knwiki: update portal namespace (T380366) (duration: 12m 17s)
- 22:56 cjming@deploy2002: cjming, anzx: Continuing with sync
- 22:55 cjming@deploy2002: cjming, anzx: Backport for knwiki: update portal namespace (T380366) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 22:52 brett: Import libvmod-querysort 0.4-3 into varnish-staging apt component
- 22:51 cjming@deploy2002: Started scap sync-world: Backport for knwiki: update portal namespace (T380366)
- 22:49 cjming@deploy2002: Finished scap sync-world: Backport for Revert "Add contact form for U4C" (duration: 14m 22s)
- 22:49 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host thanos-be2005.codfw.wmnet with OS bullseye
- 22:41 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/blunderbuss: apply
- 22:41 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply
- 22:40 cjming@deploy2002: trainbranchbot, cjming: Continuing with sync
- 22:40 cjming@deploy2002: trainbranchbot, cjming: Backport for Revert "Add contact form for U4C" synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 22:39 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/blunderbuss: apply
- 22:39 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply
- 22:34 cjming@deploy2002: Started scap sync-world: Backport for Revert "Add contact form for U4C"
- 22:31 cjming@deploy2002: Sync cancelled.
- 22:28 cjming@deploy2002: nmw03, cjming: Backport for Add contact form for U4C (T379317) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 22:27 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-be2005.codfw.wmnet with reason: host reimage
- 22:24 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-be2005.codfw.wmnet with reason: host reimage
- 22:23 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/blunderbuss: apply
- 22:22 cjming@deploy2002: Started scap sync-world: Backport for Add contact form for U4C (T379317)
- 22:21 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply
- 22:20 cjming@deploy2002: Finished scap sync-world: Backport for Bump wikimedia/parsoid to 0.21.0-a7 (T373776 T380333), Bump wikimedia/parsoid to 0.21.0-a7 (T380333) (duration: 17m 11s)
- 22:18 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply
- 22:16 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply
- 22:13 cjming@deploy2002: arlolra, cjming: Continuing with sync
- 22:12 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host thanos-be2005.codfw.wmnet with OS bullseye
- 22:11 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host thanos-be2005.codfw.wmnet with OS bullseye
- 22:11 jhathaway@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhathaway@cumin2002"
- 22:09 jhathaway@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhathaway@cumin2002"
- 22:08 cjming@deploy2002: arlolra, cjming: Backport for Bump wikimedia/parsoid to 0.21.0-a7 (T373776 T380333), Bump wikimedia/parsoid to 0.21.0-a7 (T380333) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 22:06 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply
- 22:03 cjming@deploy2002: Started scap sync-world: Backport for Bump wikimedia/parsoid to 0.21.0-a7 (T373776 T380333), Bump wikimedia/parsoid to 0.21.0-a7 (T380333)
- 22:02 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/blunderbuss: apply
- 21:52 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply
- 21:50 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/blunderbuss: apply
- 21:47 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-be2005.codfw.wmnet with reason: host reimage
- 21:43 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-be2005.codfw.wmnet with reason: host reimage
- 21:40 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply
- 21:32 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host thanos-be2005.codfw.wmnet with OS bullseye
- 21:31 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host thanos-be2005.codfw.wmnet with OS bullseye
- 21:28 cjming@deploy2002: Finished scap sync-world: Backport for [ptwiki] Enable the CampaignEvents extension (T380090) (duration: 15m 04s)
- 21:23 eileen: * civicrm upgraded from e29243f0 to 7c940d6f
- 21:20 cjming@deploy2002: cjming, albertoleoncio: Continuing with sync
- 21:19 cjming@deploy2002: cjming, albertoleoncio: Backport for [ptwiki] Enable the CampaignEvents extension (T380090) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 21:13 cjming@deploy2002: Started scap sync-world: Backport for [ptwiki] Enable the CampaignEvents extension (T380090)
- 21:08 dancy@deploy2002: Installing scap version "4.124.0" for 209 hosts
- 21:06 dancy@deploy2002: Installing scap version "4.124.0" for 209 hosts
- 21:05 herron@cumin1002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host aux-k8s-ctrl2003.codfw.wmnet
- 21:05 herron@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aux-k8s-ctrl2003.codfw.wmnet with OS bookworm
- 21:03 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host thanos-be2005.codfw.wmnet with OS bullseye
- 21:00 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host thanos-be2005.codfw.wmnet with OS bullseye
- 20:51 herron@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aux-k8s-ctrl2003.codfw.wmnet with reason: host reimage
- 20:48 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply
- 20:48 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/blunderbuss: apply
- 20:48 herron@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on aux-k8s-ctrl2003.codfw.wmnet with reason: host reimage
- 20:48 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply
- 20:47 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host es2041.codfw.wmnet with OS bookworm
- 20:44 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply
- 20:40 dancy@deploy2002: Installation of scap version "4.126.0" completed for 1 hosts
- 20:39 dancy@deploy2002: Installing scap version "4.126.0" for 1 hosts
- 20:32 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-ctrl2003.codfw.wmnet with OS bookworm
- 20:30 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host thanos-be2005.codfw.wmnet with OS bullseye
- 20:30 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host thanos-be2005.codfw.wmnet with OS bullseye
- 20:28 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-ctrl2003.codfw.wmnet - herron@cumin1002"
- 20:28 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-ctrl2003.codfw.wmnet - herron@cumin1002"
- 20:28 herron@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) aux-k8s-ctrl2003.codfw.wmnet on all recursors
- 20:28 herron@cumin1002: START - Cookbook sre.dns.wipe-cache aux-k8s-ctrl2003.codfw.wmnet on all recursors
- 20:28 herron@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 20:28 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-ctrl2003.codfw.wmnet - herron@cumin1002"
- 20:26 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-ctrl2003.codfw.wmnet - herron@cumin1002"
- 20:13 herron@cumin1002: START - Cookbook sre.dns.netbox
- 20:13 herron@cumin1002: START - Cookbook sre.ganeti.makevm for new host aux-k8s-ctrl2003.codfw.wmnet
- 20:10 dancy@deploy2002: Installing scap version "4.126.0" for 1 hosts
- 20:08 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host thanos-be2005.codfw.wmnet with OS bullseye
- 20:05 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host thanos-be2005.codfw.wmnet with OS bullseye
- 20:03 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host es2041.codfw.wmnet with OS bookworm
- 19:52 hashar@deploy2002: Finished deploy [integration/docroot@1627206]: build: update mediawiki-codesniffer to 45.0.0 & prevent LibUp from removing a phpcs rule (duration: 00m 10s)
- 19:52 hashar@deploy2002: Started deploy [integration/docroot@1627206]: build: update mediawiki-codesniffer to 45.0.0 & prevent LibUp from removing a phpcs rule
- 19:51 dancy@deploy2002: Installing scap version "4.126.0" for 1 hosts
- 19:47 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host thanos-be2005.codfw.wmnet with OS bullseye
- 19:42 dancy@deploy2002: Installing scap version "4.126.0" for 209 hosts
- 19:35 herron@cumin1002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host aux-k8s-ctrl2002.codfw.wmnet
- 19:35 herron@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aux-k8s-ctrl2002.codfw.wmnet with OS bookworm
- 19:20 herron@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aux-k8s-ctrl2002.codfw.wmnet with reason: host reimage
- 19:17 herron@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on aux-k8s-ctrl2002.codfw.wmnet with reason: host reimage
- 19:12 urandom: bootstrapping cassandra, restbase2038-{a,b,c} — T380236
- 19:08 inflatador: bking@krb1001 add kerberos keytab for blunderbuss https://phabricator.wikimedia.org/P71106 T371994
- 19:04 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-ctrl2002.codfw.wmnet with OS bookworm
- 19:03 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-ctrl2002.codfw.wmnet - herron@cumin1002"
- 19:03 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-ctrl2002.codfw.wmnet - herron@cumin1002"
- 19:03 herron@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) aux-k8s-ctrl2002.codfw.wmnet on all recursors
- 19:03 herron@cumin1002: START - Cookbook sre.dns.wipe-cache aux-k8s-ctrl2002.codfw.wmnet on all recursors
- 19:03 herron@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 19:03 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-ctrl2002.codfw.wmnet - herron@cumin1002"
- 19:03 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-ctrl2002.codfw.wmnet - herron@cumin1002"
- 18:58 herron@cumin1002: START - Cookbook sre.dns.netbox
- 18:58 herron@cumin1002: START - Cookbook sre.ganeti.makevm for new host aux-k8s-ctrl2002.codfw.wmnet
- 17:32 joal@deploy2002: Finished deploy [analytics/refinery@295d5a4] (hadoop-test): Regular analytics weekly train BIS TEST [analytics/refinery@295d5a44] (duration: 03m 36s)
- 17:28 joal@deploy2002: Started deploy [analytics/refinery@295d5a4] (hadoop-test): Regular analytics weekly train BIS TEST [analytics/refinery@295d5a44]
- 17:28 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
- 17:27 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
- 17:22 joal@deploy2002: Finished deploy [analytics/refinery@295d5a4] (thin): Regular analytics weekly train BIS THIN [analytics/refinery@295d5a44] (duration: 05m 02s)
- 17:22 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
- 17:21 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
- 17:20 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
- 17:19 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
- 17:18 joal@deploy2002: Started deploy [analytics/refinery@295d5a4] (thin): Regular analytics weekly train BIS THIN [analytics/refinery@295d5a44]
- 17:17 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
- 17:16 joal@deploy2002: Finished deploy [analytics/refinery@295d5a4]: Regular analytics weekly train BIS [analytics/refinery@295d5a44] (duration: 03m 41s)
- 17:12 joal@deploy2002: Started deploy [analytics/refinery@295d5a4]: Regular analytics weekly train BIS [analytics/refinery@295d5a44]
- 17:05 sukhe: restart tomcat on idp2004
- 17:04 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
- 17:03 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
- 17:02 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
- 17:01 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
- 17:00 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
- 17:00 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
- 16:43 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply
- 16:43 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop: apply
- 16:43 jiji@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop: apply
- 16:43 jiji@deploy2002: helmfile [staging] START helmfile.d/services/changeprop: apply
- 16:43 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventstreams: apply
- 16:42 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/services/tegola-vector-tiles: apply
- 16:40 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/tegola-vector-tiles: apply
- 16:39 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
- 16:38 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
- 16:37 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/eventstreams: apply
- 16:36 jiji@deploy2002: helmfile [staging] DONE helmfile.d/services/eventstreams: apply
- 16:35 klausman@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
- 16:35 jiji@deploy2002: helmfile [staging] START helmfile.d/services/eventstreams: apply
- 16:34 klausman@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
- 16:28 jiji@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-main: apply
- 16:26 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
- 16:25 aikochou@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revision-models' for release 'main' .
- 16:24 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
- 16:23 jiji@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
- 16:22 jiji@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
- 16:22 jiji@deploy2002: helmfile [staging] DONE helmfile.d/services/benthos-cache-invalidator: apply
- 16:21 jiji@deploy2002: helmfile [staging] START helmfile.d/services/benthos-cache-invalidator: apply
- 16:15 aikochou@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revision-models' for release 'main' .
- 16:10 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1017.eqiad.wmnet
- 15:51 apine@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
- 15:50 apine@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
- 15:50 apine@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
- 15:49 apine@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
- 15:48 dancy@deploy2002: Finished scap sync-world: no-op deployment for testing. (duration: 03m 21s)
- 15:44 dancy@deploy2002: Started scap sync-world: no-op deployment for testing.
- 15:44 apine@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
- 15:44 apine@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
- 15:37 apine@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
- 15:37 apine@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
- 15:33 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1206.eqiad.wmnet with reason: host overworked by dumps - T368098
- 15:33 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1206.eqiad.wmnet with reason: host overworked by dumps - T368098
- 15:31 jynus: starting resharding of commons backup files into new host backup2010 T376892
- 15:27 apine@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
- 15:23 apine@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
- 15:23 apine@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
- 15:22 apine@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
- 15:22 apine@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
- 15:19 apine@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
- 15:19 apine@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
- 15:15 apine@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
- 15:14 apine@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
- 15:13 apine@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
- 15:13 apine@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
- 15:10 apine@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
- 15:09 apine@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
- 15:09 urandom: bootstrapping cassandra, restbase2037-{a,b,c} — T380236
- 15:04 btullis@cumin1002: END (PASS) - Cookbook sre.ceph.roll-restart-reboot-server (exit_code=0) rolling reboot on P{cephosd100[2-4].eqiad.wmnet} and (A:cephosd)
- 14:57 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host thanos-be1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 14:53 JennH: power cycling unresponsive mgmt switch in codfw: msw-c3-codfw
- 14:50 btullis@cumin1002: END (FAIL) - Cookbook sre.hadoop.roll-restart-workers (exit_code=99) restart workers for Hadoop analytics cluster: Roll restart of jvm daemons for openjdk upgrade.
- 14:43 elukey@cumin1002: START - Cookbook sre.hosts.provision for host thanos-be1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 14:29 cdanis: T380226 💙cdanis@mwmaint2002.codfw.wmnet ~ 🕤☕ mwscript sql.php --wiki=commonswiki --cluster=extension1 /srv/mediawiki/php-1.44.0-wmf.4/extensions/JsonConfig/sql/mysql/tables-generated.sql
- 14:25 sukhe@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp7007.magru.wmnet [reason: host reimaged]
- 14:24 btullis@cumin1002: START - Cookbook sre.ceph.roll-restart-reboot-server rolling reboot on P{cephosd100[2-4].eqiad.wmnet} and (A:cephosd)
- 14:23 jynus: starting resharding of commons backup files into new host backup1010 T376892
- 14:23 sukhe: running homer on asw*magru*
- 14:06 jiji@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 14:05 jiji@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
- 14:05 jiji@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 14:05 jiji@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
- 14:05 jiji@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
- 14:04 jiji@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
- 14:04 jiji@deploy2002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
- 14:04 jiji@deploy2002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
- 14:04 jiji@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
- 14:03 jiji@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
- 14:03 jiji@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
- 14:03 jiji@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
- 14:03 jiji@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
- 14:03 jiji@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
- 14:03 jiji@deploy2002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
- 14:02 jiji@deploy2002: helmfile [codfw] START helmfile.d/admin 'apply'.
- 14:02 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
- 14:02 jiji@deploy2002: helmfile [eqiad] START helmfile.d/admin 'apply'.
- 13:56 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker[2136-2139,2141-2155].codfw.wmnet
- 13:55 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker[2136-2139,2141-2155].codfw.wmnet
- 13:53 claime: homer 'lsw1-d4-codfw*' commit 'T377028'
- 13:52 claime: homer 'lsw1-b4-codfw*' commit 'T377028'
- 13:52 claime: homer 'lsw1-d2-codfw*' commit 'T377028'
- 13:51 claime: homer 'lsw1-c2-codfw*' commit 'T377028'
- 13:50 claime: homer 'lsw1-d7-codfw*' commit 'T377028'
- 13:50 claime: homer 'lsw1-c4-codfw*' commit 'T377028'
- 13:49 claime: homer 'lsw1-d5-codfw*' commit 'T377028'
- 13:48 claime: homer 'lsw1-b7-codfw*' commit 'T377028'
- 13:47 claime: homer 'lsw1-c7-codfw*' commit 'T377028'
- 13:46 claime: homer 'lsw1-d6-codfw*' commit 'T377028'
- 13:45 claime: homer 'lsw1-b2-codfw*' commit 'T377028'
- 13:44 claime: homer 'lsw1-d1-codfw*' commit 'T377028'
- 13:41 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2151.codfw.wmnet with OS bookworm
- 13:38 effie: putting kafka-main1006.eqiad.wmnet in production
- 13:38 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2152.codfw.wmnet with OS bookworm
- 13:36 jiji@cumin1002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-main-eqiad
- 13:33 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2154.codfw.wmnet with OS bookworm
- 13:31 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2155.codfw.wmnet with OS bookworm
- 13:29 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
- 13:28 btullis@cumin1002: START - Cookbook sre.hadoop.roll-restart-workers restart workers for Hadoop analytics cluster: Roll restart of jvm daemons for openjdk upgrade.
- 13:28 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
- 13:26 jiji@cumin1002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-main-eqiad
- 13:26 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2153.codfw.wmnet with OS bookworm
- 13:23 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2150.codfw.wmnet with OS bookworm
- 13:21 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2151.codfw.wmnet with reason: host reimage
- 13:17 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp7007.magru.wmnet with OS bullseye
- 13:17 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2152.codfw.wmnet with reason: host reimage
- 13:14 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2154.codfw.wmnet with reason: host reimage
- 13:11 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2155.codfw.wmnet with reason: host reimage
- 13:07 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2153.codfw.wmnet with reason: host reimage
- 13:03 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2150.codfw.wmnet with reason: host reimage
- 13:02 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2155.codfw.wmnet with reason: host reimage
- 13:02 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2154.codfw.wmnet with reason: host reimage
- 13:01 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1017.eqiad.wmnet
- 13:01 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2153.codfw.wmnet with reason: host reimage
- 13:01 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2152.codfw.wmnet with reason: host reimage
- 13:00 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2151.codfw.wmnet with reason: host reimage
- 13:00 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2150.codfw.wmnet with reason: host reimage
- 12:55 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1017.eqiad.wmnet
- 12:51 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
- 12:50 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp7007.magru.wmnet with reason: host reimage
- 12:50 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
- 12:49 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1017.eqiad.wmnet
- 12:46 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp7007.magru.wmnet with reason: host reimage
- 12:44 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2155.codfw.wmnet with OS bookworm
- 12:43 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2154.codfw.wmnet with OS bookworm
- 12:42 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2153.codfw.wmnet with OS bookworm
- 12:42 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2152.codfw.wmnet with OS bookworm
- 12:41 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2143.codfw.wmnet with OS bookworm
- 12:41 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2151.codfw.wmnet with OS bookworm
- 12:41 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2150.codfw.wmnet with OS bookworm
- 12:39 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2146.codfw.wmnet with OS bookworm
- 12:38 sukhe: re-enable puppet on cumin2002
- 12:34 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
- 12:34 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2145.codfw.wmnet with OS bookworm
- 12:33 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
- 12:31 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2147.codfw.wmnet with OS bookworm
- 12:26 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2148.codfw.wmnet with OS bookworm
- 12:23 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2149.codfw.wmnet with OS bookworm
- 12:23 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
- 12:22 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
- 12:22 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2143.codfw.wmnet with reason: host reimage
- 12:21 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2144.codfw.wmnet with OS bookworm
- 12:20 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp7007.magru.wmnet with OS bullseye
- 12:19 sukhe@cumin2002: END (FAIL) - Cookbook sre.hosts.dhcp (exit_code=99) for host cp7007.magru.wmnet
- 12:18 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2146.codfw.wmnet with reason: host reimage
- 12:16 sukhe@cumin2002: START - Cookbook sre.hosts.dhcp for host cp7007.magru.wmnet
- 12:16 sukhe@cumin1002: END (FAIL) - Cookbook sre.hosts.dhcp (exit_code=99) for host cp7007.magru.wmnet
- 12:15 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2145.codfw.wmnet with reason: host reimage
- 12:14 sukhe@cumin1002: START - Cookbook sre.hosts.dhcp for host cp7007.magru.wmnet
- 12:11 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2147.codfw.wmnet with reason: host reimage
- 12:08 sukhe: disable puppet on cumin2002 to test cumin alias for A:installserver
- 12:07 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2148.codfw.wmnet with reason: host reimage
- 12:04 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2149.codfw.wmnet with reason: host reimage
- 12:01 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2144.codfw.wmnet with reason: host reimage
- 11:59 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2149.codfw.wmnet with reason: host reimage
- 11:59 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2148.codfw.wmnet with reason: host reimage
- 11:58 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2147.codfw.wmnet with reason: host reimage
- 11:57 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2146.codfw.wmnet with reason: host reimage
- 11:57 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2145.codfw.wmnet with reason: host reimage
- 11:56 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2143.codfw.wmnet with reason: host reimage
- 11:56 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2144.codfw.wmnet with reason: host reimage
- 11:40 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2149.codfw.wmnet with OS bookworm
- 11:39 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2148.codfw.wmnet with OS bookworm
- 11:39 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2147.codfw.wmnet with OS bookworm
- 11:38 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2146.codfw.wmnet with OS bookworm
- 11:38 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2145.codfw.wmnet with OS bookworm
- 11:37 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2144.codfw.wmnet with OS bookworm
- 11:36 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2143.codfw.wmnet with OS bookworm
- 11:30 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_magru
- 11:24 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_magru
- 11:22 akosiaris: decommission cxserver endpoints /api/rest_v1/transform/html/from, /api/rest_v1/transform/word/from from RESTBase T375616
- 10:43 btullis@cumin1002: END (PASS) - Cookbook sre.ceph.roll-restart-reboot-server (exit_code=0) rolling reboot on P{cephosd1001.eqiad.wmnet} and (A:cephosd)
- 10:38 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_magru
- 10:38 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_magru
- 10:37 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_esams
- 10:34 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_esams
- 10:33 btullis@cumin1002: START - Cookbook sre.ceph.roll-restart-reboot-server rolling reboot on P{cephosd1001.eqiad.wmnet} and (A:cephosd)
- 10:33 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on kafka-main[1001,1006].eqiad.wmnet with reason: Hardware refresh
- 10:33 jayme: re-enabled puppet on all k8s controll planes for rollout of T380142
- 10:33 jiji@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on kafka-main[1001,1006].eqiad.wmnet with reason: Hardware refresh
- 10:22 effie: removing leadership from kafka-main1001 - T363214
- 10:19 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
- 10:18 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
- 09:52 aklapper@deploy2002: rebuilt and synchronized wikiversions files: group1 to 1.44.0-wmf.4 refs T375663
- 09:44 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
- 09:44 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
- 09:41 kevinbazira@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
- 09:38 akosiaris: decommission cxserver endpoints /api/rest_v1/list/(pair|tool|languagepairs) from RESTBase T375616
- 09:35 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
- 09:34 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
- 09:33 aklapper@deploy2002: Finished scap sync-world: Backport for EditionLookup: Update EntityLookup calls (T380304) (duration: 13m 33s)
- 09:33 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_esams
- 09:33 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_esams
- 09:28 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
- 09:27 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
- 09:27 aklapper@deploy2002: aklapper, thiemowmde: Continuing with sync
- 09:26 aklapper@deploy2002: aklapper, thiemowmde: Backport for EditionLookup: Update EntityLookup calls (T380304) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 09:21 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of prometheus7001.magru.wmnet to plain
- 09:20 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of prometheus7001.magru.wmnet to plain
- 09:20 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
- 09:20 aklapper@deploy2002: Started scap sync-world: Backport for EditionLookup: Update EntityLookup calls (T380304)
- 09:19 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
- 09:18 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of doh7002.wikimedia.org to plain
- 09:15 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of doh7002.wikimedia.org to plain
- 09:13 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ncredir7002.magru.wmnet to plain
- 09:13 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ncredir7002.magru.wmnet to plain
- 08:56 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of durum7002.magru.wmnet to plain
- 08:51 jayme: disabling puppet on all k8s controll planes for rollout of T380142
- 08:48 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of durum7002.magru.wmnet to plain
- 08:46 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of bast7001.wikimedia.org to plain
- 08:44 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of bast7001.wikimedia.org to plain
- 08:35 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti7004.magru.wmnet
- 08:35 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti7004.magru.wmnet
- 08:35 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti7004.magru.wmnet
- 08:34 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti7004.magru.wmnet
- 08:18 hashar: Restarted CI Jenkins to upgrade Leastload plugin and remove the SSH server plugin
2024-11-19
- 22:50 ryankemper@deploy2002: Started deploy [wdqs/wdqs@9927a5a] (wcqs): Deploy 0.3.150 to WCQS
- 22:00 urbanecm@deploy2002: Finished scap sync-world: Backport for Enable experimental Parsoid fragment support on labs and test wikis (T374661), Revert "editcheck: Remove try/catch around transaction squashing" (T333710 T380234), Revert "editcheck: Remove try/catch around transaction squashing" (T333710 T380234) (duration: 20m 39s)
- 21:53 urbanecm@deploy2002: cscott, kemayo, urbanecm: Continuing with sync
- 21:45 urbanecm@deploy2002: cscott, kemayo, urbanecm: Backport for Enable experimental Parsoid fragment support on labs and test wikis (T374661), Revert "editcheck: Remove try/catch around transaction squashing" (T333710 T380234), Revert "editcheck: Remove try/catch around transaction squashing" (T333710 T380234) synced to the testservers (https://wikitech.wikimedia.or
- 21:39 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host es2041.codfw.wmnet with OS bookworm
- 21:39 urbanecm@deploy2002: Started scap sync-world: Backport for Enable experimental Parsoid fragment support on labs and test wikis (T374661), Revert "editcheck: Remove try/catch around transaction squashing" (T333710 T380234), Revert "editcheck: Remove try/catch around transaction squashing" (T333710 T380234)
- 21:38 urbanecm@deploy2002: Finished scap sync-world: Backport for Promote Vector 2022 as default on 3 wikis (T379765), Separate cache key space for test & production JsonConfig data (T380320) (duration: 14m 38s)
- 21:31 urbanecm@deploy2002: bvibber, jdlrobson, urbanecm: Continuing with sync
- 21:29 urbanecm@deploy2002: bvibber, jdlrobson, urbanecm: Backport for Promote Vector 2022 as default on 3 wikis (T379765), Separate cache key space for test & production JsonConfig data (T380320) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 21:23 urbanecm@deploy2002: Started scap sync-world: Backport for Promote Vector 2022 as default on 3 wikis (T379765), Separate cache key space for test & production JsonConfig data (T380320)
- 21:16 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on restbase2038.codfw.wmnet with reason: Bootstrapping — T380236
- 21:15 eevans@cumin1002: START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on restbase2038.codfw.wmnet with reason: Bootstrapping — T380236
- 21:15 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on restbase2037.codfw.wmnet with reason: Bootstrapping — T380236
- 21:15 eevans@cumin1002: START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on restbase2037.codfw.wmnet with reason: Bootstrapping — T380236
- 21:15 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on restbase2036.codfw.wmnet with reason: Bootstrapping — T380236
- 21:15 eevans@cumin1002: START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on restbase2036.codfw.wmnet with reason: Bootstrapping — T380236
- 20:56 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host es2041.codfw.wmnet with OS bookworm
- 20:50 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host thanos-be2005.codfw.wmnet with OS bullseye
- 20:40 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host thanos-be2005.codfw.wmnet with OS bullseye
- 20:40 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host thanos-be2005.codfw.wmnet with OS bullseye
- 20:32 sukhe@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp7007.magru.wmnet with OS bullseye
- 20:29 sukhe@cumin1002: START - Cookbook sre.hosts.reimage for host cp7007.magru.wmnet with OS bullseye
- 20:24 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host es2041.codfw.wmnet with OS bookworm
- 20:24 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host thanos-be2005.codfw.wmnet with OS bullseye
- 20:10 jhathaway@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on ms-be2082.codfw.wmnet with reason: T371400
- 20:10 jhathaway@cumin1002: START - Cookbook sre.hosts.downtime for 3:00:00 on ms-be2082.codfw.wmnet with reason: T371400
- 20:05 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host es2041.codfw.wmnet with OS bookworm
- 20:03 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1183.eqiad.wmnet with OS bullseye
- 20:03 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 19:47 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.dhcp (exit_code=99) for host cp7007.magru.wmnet
- 19:41 sukhe@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp7007.magru.wmnet with OS bullseye
- 19:40 pt1979@cumin2002: START - Cookbook sre.hosts.dhcp for host cp7007.magru.wmnet
- 19:34 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 19:17 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@a4d0954]: mjolnir: T379045 Increase maxResultSize (duration: 00m 26s)
- 19:16 ebernhardson@deploy2002: Started deploy [airflow-dags/search@a4d0954]: mjolnir: T379045 Increase maxResultSize
- 19:15 sukhe@cumin1002: START - Cookbook sre.hosts.reimage for host cp7007.magru.wmnet with OS bullseye
- 19:14 sukhe@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp7007.magru.wmnet with OS bullseye
- 19:12 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1183.eqiad.wmnet with reason: host reimage
- 19:08 sukhe@cumin1002: START - Cookbook sre.hosts.reimage for host cp7007.magru.wmnet with OS bullseye
- 19:08 sukhe@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp7007.magru.wmnet with OS bullseye
- 19:08 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1183.eqiad.wmnet with reason: host reimage
- 19:05 jhathaway@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on ms-be2082.codfw.wmnet with reason: T371400
- 19:05 jhathaway@cumin1002: START - Cookbook sre.hosts.downtime for 3:00:00 on ms-be2082.codfw.wmnet with reason: T371400
- 18:53 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1183.eqiad.wmnet with OS bullseye
- 18:53 brett: Import ncmonitor 1.3.0-1 into main apt repo
- 18:52 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1183.eqiad.wmnet with OS bullseye
- 18:48 sukhe@cumin1002: START - Cookbook sre.hosts.reimage for host cp7007.magru.wmnet with OS bullseye
- 18:47 sukhe@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp7007.magru.wmnet with OS bullseye
- 18:39 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/blunderbuss: apply
- 18:36 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply
- 18:34 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/blunderbuss: apply
- 18:34 sukhe@cumin1002: START - Cookbook sre.hosts.reimage for host cp7007.magru.wmnet with OS bullseye
- 18:34 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply
- 18:34 sukhe@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp7007.magru.wmnet with OS bullseye
- 18:32 jhathaway@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on ms-be2082.codfw.wmnet with reason: T371400
- 18:32 jhathaway@cumin1002: START - Cookbook sre.hosts.downtime for 3:00:00 on ms-be2082.codfw.wmnet with reason: T371400
- 18:07 sukhe@cumin1002: START - Cookbook sre.hosts.reimage for host cp7007.magru.wmnet with OS bullseye
- 17:57 brennen@deploy2002: Finished scap sync-world: Backport for Prevent ce_event_wikis query when feature flag is off (T380288) (duration: 15m 10s)
- 17:56 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1326.eqiad.wmnet with OS bookworm
- 17:56 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 17:55 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 17:54 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1327.eqiad.wmnet with OS bookworm
- 17:53 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 17:53 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 17:52 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1183.eqiad.wmnet with OS bullseye
- 17:50 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1325.eqiad.wmnet with OS bookworm
- 17:50 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 17:50 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 17:50 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1183.eqiad.wmnet with OS bullseye
- 17:50 brennen@deploy2002: daimona, brennen: Continuing with sync
- 17:48 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1323.eqiad.wmnet with OS bookworm
- 17:48 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 17:47 cmooney@cumin1002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host wikikube-worker1290
- 17:47 cmooney@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1290
- 17:47 brennen@deploy2002: daimona, brennen: Backport for Prevent ce_event_wikis query when feature flag is off (T380288) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 17:47 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 17:45 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1322.eqiad.wmnet with OS bookworm
- 17:45 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 17:43 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 17:42 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on wikikube-worker1290.eqiad.wmnet with reason: being moved to new port
- 17:42 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on wikikube-worker1290.eqiad.wmnet with reason: being moved to new port
- 17:42 jhathaway@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on ms-be2082.codfw.wmnet with reason: T371400
- 17:41 brennen@deploy2002: Started scap sync-world: Backport for Prevent ce_event_wikis query when feature flag is off (T380288)
- 17:41 jhathaway@cumin1002: START - Cookbook sre.hosts.downtime for 3:00:00 on ms-be2082.codfw.wmnet with reason: T371400
- 17:41 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1324.eqiad.wmnet with OS bookworm
- 17:41 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 17:40 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 17:38 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1326.eqiad.wmnet with reason: host reimage
- 17:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2110.codfw.wmnet with OS bullseye
- 17:37 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 17:37 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 17:36 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1327.eqiad.wmnet with reason: host reimage
- 17:34 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1183.eqiad.wmnet with OS bullseye
- 17:32 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1325.eqiad.wmnet with reason: host reimage
- 17:29 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1323.eqiad.wmnet with reason: host reimage
- 17:28 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1326.eqiad.wmnet with reason: host reimage
- 17:28 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1327.eqiad.wmnet with reason: host reimage
- 17:28 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1325.eqiad.wmnet with reason: host reimage
- 17:26 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1322.eqiad.wmnet with reason: host reimage
- 17:23 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1324.eqiad.wmnet with reason: host reimage
- 17:19 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2110.codfw.wmnet with reason: host reimage
- 17:18 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1323.eqiad.wmnet with reason: host reimage
- 17:18 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1314.eqiad.wmnet with OS bookworm
- 17:18 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 17:18 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1324.eqiad.wmnet with reason: host reimage
- 17:18 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1322.eqiad.wmnet with reason: host reimage
- 17:18 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 17:16 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2110.codfw.wmnet with reason: host reimage
- 17:15 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2140.codfw.wmnet with OS bookworm
- 17:15 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1318.eqiad.wmnet with OS bookworm
- 17:15 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 17:14 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 17:11 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1319.eqiad.wmnet with OS bookworm
- 17:11 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 17:11 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 17:11 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1326.eqiad.wmnet with OS bookworm
- 17:10 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1327.eqiad.wmnet with OS bookworm
- 17:10 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1325.eqiad.wmnet with OS bookworm
- 17:09 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1320.eqiad.wmnet with OS bookworm
- 17:09 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 17:08 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 17:04 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1321.eqiad.wmnet with OS bookworm
- 17:04 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 17:04 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 17:02 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1316.eqiad.wmnet with OS bookworm
- 17:02 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 17:01 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 17:00 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1323.eqiad.wmnet with OS bookworm
- 17:00 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1324.eqiad.wmnet with OS bookworm
- 17:00 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1322.eqiad.wmnet with OS bookworm
- 17:00 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2110.codfw.wmnet with OS bullseye
- 17:00 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['elastic2110']
- 17:00 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1314.eqiad.wmnet with reason: host reimage
- 17:00 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic2110']
- 16:58 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1317.eqiad.wmnet with OS bookworm
- 16:58 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 16:58 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 16:56 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1318.eqiad.wmnet with reason: host reimage
- 16:56 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1315.eqiad.wmnet with OS bookworm
- 16:56 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 16:55 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 16:53 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1319.eqiad.wmnet with reason: host reimage
- 16:52 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1313.eqiad.wmnet with OS bookworm
- 16:52 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 16:52 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 16:50 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1320.eqiad.wmnet with reason: host reimage
- 16:46 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1321.eqiad.wmnet with reason: host reimage
- 16:43 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1316.eqiad.wmnet with reason: host reimage
- 16:41 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1317.eqiad.wmnet with reason: host reimage
- 16:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic2110.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 16:37 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1315.eqiad.wmnet with reason: host reimage
- 16:36 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1320.eqiad.wmnet with reason: host reimage
- 16:36 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp7007.magru.wmnet
- 16:35 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1321.eqiad.wmnet with reason: host reimage
- 16:34 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1318.eqiad.wmnet with reason: host reimage
- 16:34 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1319.eqiad.wmnet with reason: host reimage
- 16:34 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1313.eqiad.wmnet with reason: host reimage
- 16:33 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1316.eqiad.wmnet with reason: host reimage
- 16:33 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1317.eqiad.wmnet with reason: host reimage
- 16:33 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1315.eqiad.wmnet with reason: host reimage
- 16:31 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1314.eqiad.wmnet with reason: host reimage
- 16:30 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1313.eqiad.wmnet with reason: host reimage
- 16:29 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host elastic2110.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 16:28 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host elastic2110.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 16:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host elastic2110.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 16:24 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2142.codfw.wmnet with OS bookworm
- 16:19 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2139.codfw.wmnet with OS bookworm
- 16:17 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1319.eqiad.wmnet with OS bookworm
- 16:17 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1320.eqiad.wmnet with OS bookworm
- 16:17 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1321.eqiad.wmnet with OS bookworm
- 16:17 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1318.eqiad.wmnet with OS bookworm
- 16:16 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2141.codfw.wmnet with OS bookworm
- 16:15 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1317.eqiad.wmnet with OS bookworm
- 16:15 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1316.eqiad.wmnet with OS bookworm
- 16:15 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1315.eqiad.wmnet with OS bookworm
- 16:13 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1314.eqiad.wmnet with OS bookworm
- 16:13 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1313.eqiad.wmnet with OS bookworm
- 16:13 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2138.codfw.wmnet with OS bookworm
- 16:09 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2137.codfw.wmnet with OS bookworm
- 16:07 dreamyjazz@deploy2002: Finished scap sync-world: Backport for ExperimentUserDefaultsManager: Decrease log severity to debug (T380271) (duration: 13m 16s)
- 16:04 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2142.codfw.wmnet with reason: host reimage
- 16:03 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2136.codfw.wmnet with OS bookworm
- 16:00 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2139.codfw.wmnet with reason: host reimage
- 15:59 dreamyjazz@deploy2002: dreamyjazz: Continuing with sync
- 15:59 dreamyjazz@deploy2002: dreamyjazz: Backport for ExperimentUserDefaultsManager: Decrease log severity to debug (T380271) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 15:57 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2141.codfw.wmnet with reason: host reimage
- 15:55 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2140.codfw.wmnet with OS bookworm
- 15:54 cgoubert@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wikikube-worker2140.codfw.wmnet with OS bookworm
- 15:53 dreamyjazz@deploy2002: Started scap sync-world: Backport for ExperimentUserDefaultsManager: Decrease log severity to debug (T380271)
- 15:53 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2138.codfw.wmnet with reason: host reimage
- 15:50 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2137.codfw.wmnet with reason: host reimage
- 15:48 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2142.codfw.wmnet with reason: host reimage
- 15:47 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2141.codfw.wmnet with reason: host reimage
- 15:47 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2139.codfw.wmnet with reason: host reimage
- 15:46 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2138.codfw.wmnet with reason: host reimage
- 15:46 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2137.codfw.wmnet with reason: host reimage
- 15:45 moritzm: installing libheif security updates
- 15:44 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage
- 15:40 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage
- 15:29 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2142.codfw.wmnet with OS bookworm
- 15:29 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2141.codfw.wmnet with OS bookworm
- 15:29 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2139.codfw.wmnet with OS bookworm
- 15:28 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2138.codfw.wmnet with OS bookworm
- 15:28 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2137.codfw.wmnet with OS bookworm
- 15:25 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2139.codfw.wmnet with OS bookworm
- 15:25 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2138.codfw.wmnet with OS bookworm
- 15:22 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2136.codfw.wmnet with OS bookworm
- 15:21 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2142.codfw.wmnet with OS bookworm
- 15:21 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2141.codfw.wmnet with OS bookworm
- 15:21 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2137.codfw.wmnet with OS bookworm
- 15:21 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2136.codfw.wmnet with OS bookworm
- 15:15 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp7007.magru.wmnet with OS bullseye
- 15:14 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_eqiad
- 15:11 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_eqiad
- 15:07 arnaudb@cumin1002: END (PASS) - Cookbook sre.switchdc.databases.finalize (exit_code=0) for the switch from codfw to eqiad
- 15:06 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.finalize for the switch from codfw to eqiad
- 15:06 arnaudb@cumin1002: END (PASS) - Cookbook sre.switchdc.databases.prepare (exit_code=0) for the switch from codfw to eqiad
- 15:05 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.prepare for the switch from codfw to eqiad
- away: UTC afternoon deploys done
- 14:59 tgr@deploy2002: Finished scap sync-world: Backport for Use 'auth' rather than 'sso' as cookie prefix on the auth domain (T379811) (duration: 14m 16s)
- 14:52 tgr@deploy2002: tgr: Continuing with sync
- 14:50 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp7007.magru.wmnet with reason: host reimage
- 14:50 arnaudb@cumin1002: END (PASS) - Cookbook sre.switchdc.databases.finalize (exit_code=0) for the switch from eqiad to codfw
- 14:50 tgr@deploy2002: tgr: Backport for Use 'auth' rather than 'sso' as cookie prefix on the auth domain (T379811) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 14:49 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.finalize for the switch from eqiad to codfw
- 14:49 arnaudb@cumin1002: END (PASS) - Cookbook sre.switchdc.databases.prepare (exit_code=0) for the switch from eqiad to codfw
- 14:48 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.prepare for the switch from eqiad to codfw
- 14:46 fabfur@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp7007.magru.wmnet with reason: host reimage
- 14:45 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2140.codfw.wmnet with OS bookworm
- 14:44 tgr@deploy2002: Started scap sync-world: Backport for Use 'auth' rather than 'sso' as cookie prefix on the auth domain (T379811)
- 14:44 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2142.codfw.wmnet with OS bookworm
- 14:44 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2141.codfw.wmnet with OS bookworm
- 14:43 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2139.codfw.wmnet with OS bookworm
- 14:42 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2138.codfw.wmnet with OS bookworm
- 14:41 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2137.codfw.wmnet with OS bookworm
- 14:40 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2136.codfw.wmnet with OS bookworm
- 14:39 elukey: limit /v2/_catalog to internal IPs only for all Docker Registry nodes - T378618
- 14:38 kartik@deploy2002: Finished scap sync-world: Backport for Enable message group subscription feature for MediaWiki.org (T372386) (duration: 16m 21s)
- 14:35 arnaudb@cumin1002: END (PASS) - Cookbook sre.switchdc.databases.finalize (exit_code=0) for the switch from codfw to eqiad
- 14:34 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.finalize for the switch from codfw to eqiad
- 14:34 arnaudb@cumin1002: END (PASS) - Cookbook sre.switchdc.databases.prepare (exit_code=0) for the switch from codfw to eqiad
- 14:33 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.prepare for the switch from codfw to eqiad
- 14:31 kartik@deploy2002: kartik, abi: Continuing with sync
- 14:31 arnaudb@cumin1002: END (PASS) - Cookbook sre.switchdc.databases.finalize (exit_code=0) for the switch from eqiad to codfw
- 14:30 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.finalize for the switch from eqiad to codfw
- 14:29 arnaudb@cumin1002: END (PASS) - Cookbook sre.switchdc.databases.prepare (exit_code=0) for the switch from eqiad to codfw
- 14:28 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.prepare for the switch from eqiad to codfw
- 14:28 kartik@deploy2002: kartik, abi: Backport for Enable message group subscription feature for MediaWiki.org (T372386) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 14:26 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_eqiad
- 14:26 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_eqiad
- 14:25 arnaudb@cumin1002: END (PASS) - Cookbook sre.switchdc.databases.finalize (exit_code=0) for the switch from codfw to eqiad
- 14:24 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.finalize for the switch from codfw to eqiad
- 14:23 arnaudb@cumin1002: END (PASS) - Cookbook sre.switchdc.databases.prepare (exit_code=0) for the switch from codfw to eqiad
- 14:23 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.prepare for the switch from codfw to eqiad
- 14:22 kartik@deploy2002: Started scap sync-world: Backport for Enable message group subscription feature for MediaWiki.org (T372386)
- 14:22 arnaudb@cumin1002: END (PASS) - Cookbook sre.switchdc.databases.finalize (exit_code=0) for the switch from codfw to eqiad
- 14:21 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.finalize for the switch from codfw to eqiad
- 14:21 fabfur@cumin1002: START - Cookbook sre.hosts.reimage for host cp7007.magru.wmnet with OS bullseye
- 14:21 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_drmrs
- 14:18 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_drmrs
- 14:17 kartik@deploy2002: Finished scap sync-world: Backport for Enable the Contribute menu in 3rd group of Wikis (T375301) (duration: 15m 07s)
- 14:15 joal@deploy2002: Finished deploy [analytics/refinery@295d5a4]: Regular analytics weekly train [analytics/refinery@295d5a44] (duration: 08m 56s)
- 14:11 kartik@deploy2002: kartik: Continuing with sync
- 14:10 akosiaris@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker1290.eqiad.wmnet
- 14:10 kartik@deploy2002: kartik: Backport for Enable the Contribute menu in 3rd group of Wikis (T375301) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 14:10 akosiaris@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker1290.eqiad.wmnet
- 14:07 ihurbain@deploy2002: helmfile [codfw] DONE helmfile.d/services/proton: apply
- 14:06 joal@deploy2002: Started deploy [analytics/refinery@295d5a4]: Regular analytics weekly train [analytics/refinery@295d5a44]
- 14:06 ihurbain@deploy2002: helmfile [codfw] START helmfile.d/services/proton: apply
- 14:05 ihurbain@deploy2002: helmfile [eqiad] DONE helmfile.d/services/proton: apply
- 14:04 ihurbain@deploy2002: helmfile [eqiad] START helmfile.d/services/proton: apply
- 14:03 ihurbain@deploy2002: helmfile [staging] DONE helmfile.d/services/proton: apply
- 14:02 kartik@deploy2002: Started scap sync-world: Backport for Enable the Contribute menu in 3rd group of Wikis (T375301)
- 14:02 ihurbain@deploy2002: helmfile [staging] START helmfile.d/services/proton: apply
- 14:01 ihurbain@deploy2002: helmfile [staging] DONE helmfile.d/services/proton: apply
- 14:01 ihurbain@deploy2002: helmfile [staging] START helmfile.d/services/proton: apply
- 13:27 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_drmrs
- 13:27 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_drmrs
- 13:08 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 266098
- 13:08 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 266098
- 13:08 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 267521
- 13:07 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 267521
- 13:07 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 201838
- 13:06 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 201838
- 13:06 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 262979
- 13:06 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 262979
- 13:06 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 266631
- 13:06 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 266631
- 13:05 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 53180
- 13:05 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 53180
- 13:05 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 21574
- 13:05 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 21574
- 12:57 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 12:55 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
- 12:43 arnaudb@cumin1002: END (PASS) - Cookbook sre.switchdc.databases.finalize (exit_code=0) for the switch from eqiad to codfw
- 12:42 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.finalize for the switch from eqiad to codfw
- 12:41 arnaudb@cumin1002: END (PASS) - Cookbook sre.switchdc.databases.prepare (exit_code=0) for the switch from eqiad to codfw
- 12:40 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.prepare for the switch from eqiad to codfw
- 12:38 arnaudb@cumin1002: END (FAIL) - Cookbook sre.switchdc.databases.prepare (exit_code=99) for the switch from eqiad to codfw
- 12:36 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.prepare for the switch from eqiad to codfw
- 12:35 moritzm: removing ganeti1016 from active Ganeti nodes T378921
- 12:30 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_codfw
- 12:27 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_codfw
- 12:23 arnaudb@cumin1002: END (PASS) - Cookbook sre.switchdc.databases.finalize (exit_code=0) for the switch from codfw to eqiad
- 12:22 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.finalize for the switch from codfw to eqiad
- 12:20 arnaudb@cumin1002: END (PASS) - Cookbook sre.switchdc.databases.prepare (exit_code=0) for the switch from codfw to eqiad
- 12:18 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.prepare for the switch from codfw to eqiad
- 11:59 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1016.eqiad.wmnet
- 11:44 arnaudb@cumin1002: dbctl commit (dc=all): 'db2216 (re)pooling @ 100%: repool', diff saved to https://phabricator.wikimedia.org/P71095 and previous config saved to /var/cache/conftool/dbconfig/20241119-114422-arnaudb.json
- 11:40 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_codfw
- 11:40 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_codfw
- 11:29 arnaudb@cumin1002: dbctl commit (dc=all): 'db2216 (re)pooling @ 75%: repool', diff saved to https://phabricator.wikimedia.org/P71094 and previous config saved to /var/cache/conftool/dbconfig/20241119-112917-arnaudb.json
- 11:14 arnaudb@cumin1002: dbctl commit (dc=all): 'db2216 (re)pooling @ 50%: repool', diff saved to https://phabricator.wikimedia.org/P71093 and previous config saved to /var/cache/conftool/dbconfig/20241119-111411-arnaudb.json
- 11:05 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp2004.codfw.wmnet
- 11:03 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 207947
- 11:03 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'configure' for AS: 207947
- 10:59 arnaudb@cumin1002: dbctl commit (dc=all): 'db2216 (re)pooling @ 25%: repool', diff saved to https://phabricator.wikimedia.org/P71092 and previous config saved to /var/cache/conftool/dbconfig/20241119-105906-arnaudb.json
- 10:58 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc-gp2004.codfw.wmnet
- 10:44 arnaudb@cumin1002: dbctl commit (dc=all): 'db2216 (re)pooling @ 15%: repool', diff saved to https://phabricator.wikimedia.org/P71091 and previous config saved to /var/cache/conftool/dbconfig/20241119-104401-arnaudb.json
- 10:41 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_eqsin
- 10:37 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_eqsin
- 10:28 arnaudb@cumin1002: dbctl commit (dc=all): 'db2216 (re)pooling @ 10%: repool', diff saved to https://phabricator.wikimedia.org/P71090 and previous config saved to /var/cache/conftool/dbconfig/20241119-102855-arnaudb.json
- 10:27 jmm@cumin2002: END (PASS) - Cookbook sre.misc-clusters.roll-restart-reboot-docker-registry (exit_code=0) rolling restart_daemons on A:docker-registry
- 10:25 jmm@cumin2002: START - Cookbook sre.misc-clusters.roll-restart-reboot-docker-registry rolling restart_daemons on A:docker-registry
- 10:16 moritzm: restart spamd on vrts to pick up openssl updates
- 10:13 arnaudb@cumin1002: dbctl commit (dc=all): 'db2216 (re)pooling @ 5%: repool', diff saved to https://phabricator.wikimedia.org/P71089 and previous config saved to /var/cache/conftool/dbconfig/20241119-101350-arnaudb.json
- 10:02 moritzm: installing openssl security updates
- 10:00 arnaudb@cumin1002: END (PASS) - Cookbook sre.switchdc.databases.finalize (exit_code=0) for the switch from eqiad to codfw
- 10:00 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.finalize for the switch from eqiad to codfw
- 09:59 arnaudb@cumin1002: END (PASS) - Cookbook sre.switchdc.databases.finalize (exit_code=0) for the switch from eqiad to codfw
- 09:59 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.finalize for the switch from eqiad to codfw
- 09:58 arnaudb@cumin1002: END (PASS) - Cookbook sre.switchdc.databases.finalize (exit_code=0) for the switch from eqiad to codfw
- 09:58 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.finalize for the switch from eqiad to codfw
- 09:55 arnaudb@cumin1002: END (PASS) - Cookbook sre.switchdc.databases.prepare (exit_code=0) for the switch from eqiad to codfw
- 09:52 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.prepare for the switch from eqiad to codfw
- 09:51 dcausse@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 09:51 dcausse@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
- 09:49 arnaudb@cumin1002: END (PASS) - Cookbook sre.switchdc.databases.prepare (exit_code=0) for the switch from eqiad to codfw
- 09:49 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.prepare for the switch from eqiad to codfw
- 09:42 fabfur: upgrade haproxy on cp-text|upload_eqsin (T379891)
- 09:42 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_eqsin
- 09:41 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_eqsin
- 09:39 arnaudb@cumin1002: END (PASS) - Cookbook sre.switchdc.databases.finalize (exit_code=0) for the switch from codfw to eqiad
- 09:39 dcausse@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 09:39 dcausse@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
- 09:39 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.finalize for the switch from codfw to eqiad
- 09:39 dcausse@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 09:38 arnaudb@cumin1002: END (PASS) - Cookbook sre.switchdc.databases.prepare (exit_code=0) for the switch from codfw to eqiad
- 09:38 dcausse@deploy2002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
- 09:35 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.prepare for the switch from codfw to eqiad
- 09:33 dcausse@deploy2002: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 09:32 dcausse@deploy2002: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
- 09:19 aklapper@deploy2002: rebuilt and synchronized wikiversions files: group0 to 1.44.0-wmf.4 refs T375663
- 09:18 arnaudb@cumin1002: END (PASS) - Cookbook sre.switchdc.databases.prepare (exit_code=0) for the switch from codfw to eqiad
- 09:18 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.prepare for the switch from codfw to eqiad
- 08:59 urbanecm@deploy2002: Finished scap sync-world: Backport for Add + to nowiki in core-Permissions.php (T380252) (duration: 10m 17s)
- 08:54 urbanecm@deploy2002: urbanecm, jhsoby: Continuing with sync
- 08:54 urbanecm@deploy2002: urbanecm, jhsoby: Backport for Add + to nowiki in core-Permissions.php (T380252) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 08:49 urbanecm@deploy2002: Started scap sync-world: Backport for Add + to nowiki in core-Permissions.php (T380252)
- 08:48 urbanecm@deploy2002: Finished scap sync-world: Backport for fix tours by finishing partial variable rename (T380071), affcom contactpages: Fix Letter of intent and logo field labels (T375392), Add nowiki to commonsuploads dblist (T380252) (duration: 14m 29s)
- 08:43 urbanecm@deploy2002: ammarpad, migr, jhsoby, urbanecm: Continuing with sync
- 08:39 urbanecm@deploy2002: ammarpad, migr, jhsoby, urbanecm: Backport for fix tours by finishing partial variable rename (T380071), affcom contactpages: Fix Letter of intent and logo field labels (T375392), Add nowiki to commonsuploads dblist (T380252) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 08:34 urbanecm@deploy2002: Started scap sync-world: Backport for fix tours by finishing partial variable rename (T380071), affcom contactpages: Fix Letter of intent and logo field labels (T375392), Add nowiki to commonsuploads dblist (T380252)
- 08:29 urbanecm@deploy2002: Finished scap sync-world: Backport for Translate Event Logging: Enable using $wgTranslateEnableEventLogging (T364460), CirrusSearch: enable offloading weighted tags via EventBus (T378983 T377150), [GrowthExperiments] Add virtual domain config (T354939) (duration: 24m 42s)
- 08:22 urbanecm@deploy2002: urbanecm, wangombe, pfischer: Continuing with sync
- 08:12 urbanecm@deploy2002: urbanecm, wangombe, pfischer: Backport for Translate Event Logging: Enable using $wgTranslateEnableEventLogging (T364460), CirrusSearch: enable offloading weighted tags via EventBus (T378983 T377150), [GrowthExperiments] Add virtual domain config (T354939) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 08:04 urbanecm@deploy2002: Started scap sync-world: Backport for Translate Event Logging: Enable using $wgTranslateEnableEventLogging (T364460), CirrusSearch: enable offloading weighted tags via EventBus (T378983 T377150), [GrowthExperiments] Add virtual domain config (T354939)
- 07:45 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2202.codfw.wmnet with reason: sad
- 07:45 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2202.codfw.wmnet with reason: sad
- 07:41 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db1246.eqiad.wmnet with reason: T374215 - hw maintenance
- 07:40 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on db1246.eqiad.wmnet with reason: T374215 - hw maintenance
- 07:32 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1016.eqiad.wmnet
- 07:31 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1016.eqiad.wmnet
- 07:24 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1016.eqiad.wmnet
- 05:01 mwpresync@deploy2002: Pruned MediaWiki: 1.44.0-wmf.1 (duration: 01m 18s)
- 04:52 mwpresync@deploy2002: Finished scap sync-world: testwikis to 1.44.0-wmf.4 refs T375663 (duration: 49m 01s)
- 04:16 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1062.eqiad.wmnet with OS bookworm
- 04:03 mwpresync@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.4 refs T375663
- 04:00 ejegg: fundraising civicrm upgraded from 463a12c5 to e29243f0
- 03:51 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1062.eqiad.wmnet with reason: host reimage
- 03:48 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1062.eqiad.wmnet with reason: host reimage
- 03:33 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1062.eqiad.wmnet with OS bookworm
- 03:09 ejegg: payments-wiki upgraded from 459f259b to c4463536
- 02:31 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-jumbo1018.eqiad.wmnet with OS bullseye
- 02:30 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 02:30 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 02:23 ejegg: standalone (IPN listener) SmashPig upgraded from 601405dc to 131e92a5
- 02:12 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-jumbo1018.eqiad.wmnet with reason: host reimage
- 02:08 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-jumbo1018.eqiad.wmnet with reason: host reimage
- 01:54 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host kafka-jumbo1018.eqiad.wmnet with OS bullseye
- 01:54 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kafka-jumbo1018.eqiad.wmnet with OS bullseye
- 01:51 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-jumbo1016.eqiad.wmnet with OS bullseye
- 01:51 jclark@cumin1002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 01:50 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-jumbo1017.eqiad.wmnet with OS bullseye
- 01:50 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 01:40 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 01:24 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 01:24 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-jumbo1017.eqiad.wmnet with reason: host reimage
- 01:21 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-jumbo1017.eqiad.wmnet with reason: host reimage
- 01:12 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host maps-test2006.codfw.wmnet with OS bookworm
- 01:12 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
- 01:07 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host kafka-jumbo1018.eqiad.wmnet with OS bullseye
- 01:07 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host kafka-jumbo1017.eqiad.wmnet with OS bullseye
- 01:06 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kafka-jumbo1017.eqiad.wmnet with OS bullseye
- 01:03 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
- 01:02 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-jumbo1016.eqiad.wmnet with reason: host reimage
- 00:58 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-jumbo1016.eqiad.wmnet with reason: host reimage
- 00:54 tzatziki: removing 1 file for legal compliance
- 00:53 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host thanos-be2005.codfw.wmnet with OS bookworm
- 00:51 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host maps-test2005.codfw.wmnet with OS bookworm
- 00:51 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
- 00:44 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host kafka-jumbo1016.eqiad.wmnet with OS bullseye
- 00:42 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on maps-test2006.codfw.wmnet with reason: host reimage
- 00:41 tzatziki: removing 1 file for legal compliance
- 00:39 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kafka-jumbo1016.eqiad.wmnet with OS bullseye
- 00:39 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on maps-test2006.codfw.wmnet with reason: host reimage
- 00:34 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
- 00:18 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host kafka-jumbo1017.eqiad.wmnet with OS bullseye
- 00:18 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kafka-jumbo1017.eqiad.wmnet with OS bullseye
- 00:14 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host maps-test2006.codfw.wmnet with OS bookworm
- 00:14 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on maps-test2005.codfw.wmnet with reason: host reimage
- 00:14 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host maps-test2004.codfw.wmnet with OS bookworm
- 00:14 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
- 00:10 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
- 00:10 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on maps-test2005.codfw.wmnet with reason: host reimage
- 00:03 tzatziki: removing 1 file for legal compliance
- 00:00 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host maps-test2003.codfw.wmnet with OS bookworm
- 00:00 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
2024-11-18
- 23:51 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
- 23:50 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on maps-test2004.codfw.wmnet with reason: host reimage
- 23:48 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on maps-test2004.codfw.wmnet with reason: host reimage
- 23:46 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host maps-test2005.codfw.wmnet with OS bookworm
- 23:32 tzatziki: removing 1 file for legal compliance
- 23:31 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on maps-test2003.codfw.wmnet with reason: host reimage
- 23:28 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host maps-test2002.codfw.wmnet with OS bookworm
- 23:28 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
- 23:27 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
- 23:26 tzatziki: removing 1 file for legal compliance
- 23:26 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on maps-test2003.codfw.wmnet with reason: host reimage
- 23:25 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host maps-test2004.codfw.wmnet with OS bookworm
- 23:19 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-be2005.codfw.wmnet with reason: host reimage
- 23:15 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-be2005.codfw.wmnet with reason: host reimage
- 23:12 tzatziki: removing 2 files for legal compliance
- 23:09 eevans@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 23:09 eevans@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Additional IPs for Cassandra — restbase2036 - eevans@cumin1002"
- 23:09 eevans@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Additional IPs for Cassandra — restbase2036 - eevans@cumin1002"
- 23:08 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on maps-test2002.codfw.wmnet with reason: host reimage
- 23:06 eevans@cumin1002: START - Cookbook sre.dns.netbox
- 23:05 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on maps-test2002.codfw.wmnet with reason: host reimage
- 23:04 eevans@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 23:04 eevans@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Additional IPs for Cassandra — restbase2036 - eevans@cumin1002"
- 23:04 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host maps-test2003.codfw.wmnet with OS bookworm
- 23:04 eevans@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Additional IPs for Cassandra — restbase2036 - eevans@cumin1002"
- 23:03 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host thanos-be2005.codfw.wmnet with OS bookworm
- 23:01 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host thanos-be2005.codfw.wmnet with OS bookworm
- 23:00 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host kafka-jumbo1018.eqiad.wmnet with OS bullseye
- 23:00 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host kafka-jumbo1017.eqiad.wmnet with OS bullseye
- 23:00 eevans@cumin1002: START - Cookbook sre.dns.netbox
- 22:59 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host kafka-jumbo1016.eqiad.wmnet with OS bullseye
- 22:57 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host thanos-be2005.codfw.wmnet with OS bookworm
- 22:55 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host es2045.codfw.wmnet with OS bookworm
- 22:55 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host thanos-be2005.codfw.wmnet with OS bookworm
- 22:55 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host es2044.codfw.wmnet with OS bookworm
- 22:54 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host es2046.codfw.wmnet with OS bookworm
- 22:54 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host es2043.codfw.wmnet with OS bookworm
- 22:54 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host es2041.codfw.wmnet with OS bookworm
- 22:52 tzatziki: removing 10 files for legal compliance
- 22:50 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host maps-test2001.codfw.wmnet with OS bookworm
- 22:50 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
- 22:49 bking@deploy2002: Finished deploy [wdqs/wdqs@9927a5a]: 0.3.150 (duration: 11m 59s)
- 22:47 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host thanos-be2005.codfw.wmnet with OS bookworm
- 22:37 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host es2042.codfw.wmnet with OS bookworm
- 22:37 bking@deploy2002: Started deploy [wdqs/wdqs@9927a5a]: 0.3.150
- 22:22 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host thanos-be2005.codfw.wmnet with OS bookworm
- 22:18 urbanecm@deploy2002: Finished scap sync-world: Backport for [GrowthExperiments] testwiki: Only enable Add Link for new accounts (T380204) (duration: 09m 14s)
- 22:13 urbanecm@deploy2002: urbanecm: Continuing with sync
- 22:13 urbanecm@deploy2002: urbanecm: Backport for [GrowthExperiments] testwiki: Only enable Add Link for new accounts (T380204) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 22:09 urbanecm@deploy2002: Started scap sync-world: Backport for [GrowthExperiments] testwiki: Only enable Add Link for new accounts (T380204)
- 21:58 urbanecm@deploy2002: Finished scap sync-world: Backport for Use WAN cache for JsonConfig remote fetch cache (T374746), Create no-link-recommendation variant (T377787 T380204), [GrowthExperiments] testwiki: Enable no-link-recommendation experiment (T380204) (duration: 12m 10s)
- 21:54 urbanecm@deploy2002: urbanecm, bvibber: Continuing with sync
- 21:52 urbanecm@deploy2002: urbanecm, bvibber: Backport for Use WAN cache for JsonConfig remote fetch cache (T374746), Create no-link-recommendation variant (T377787 T380204), [GrowthExperiments] testwiki: Enable no-link-recommendation experiment (T380204) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 21:48 effie: upload prometheus-mcrouter-exporter_0.4.0+git20241118-1~wmf1 - T380212
- 21:46 urbanecm@deploy2002: Started scap sync-world: Backport for Use WAN cache for JsonConfig remote fetch cache (T374746), Create no-link-recommendation variant (T377787 T380204), [GrowthExperiments] testwiki: Enable no-link-recommendation experiment (T380204)
- 21:42 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
- 21:36 urbanecm@deploy2002: Finished scap sync-world: Backport for Rename everything referring to "SSO domain" to use "shared domain" (T379811), Rename shared domain sso.wikimedia.org to auth.wikimedia.org (T379811), Use DB name rather than server name in shared domain path prefix (T379811) (duration: 10m 54s)
- 21:31 urbanecm@deploy2002: matmarex, urbanecm: Continuing with sync
- 21:30 urbanecm@deploy2002: matmarex, urbanecm: Backport for Rename everything referring to "SSO domain" to use "shared domain" (T379811), Rename shared domain sso.wikimedia.org to auth.wikimedia.org (T379811), Use DB name rather than server name in shared domain path prefix (T379811) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 21:29 urbanecm: Add bvibber to wmf-deployment Gerrit group (existing deployer)
- 21:26 urbanecm@deploy2002: Started scap sync-world: Backport for Rename everything referring to "SSO domain" to use "shared domain" (T379811), Rename shared domain sso.wikimedia.org to auth.wikimedia.org (T379811), Use DB name rather than server name in shared domain path prefix (T379811)
- 21:21 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on maps-test2001.codfw.wmnet with reason: host reimage
- 21:18 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on maps-test2001.codfw.wmnet with reason: host reimage
- 21:17 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host es2046.codfw.wmnet with OS bookworm
- 21:17 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host es2045.codfw.wmnet with OS bookworm
- 21:17 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host es2044.codfw.wmnet with OS bookworm
- 21:17 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host es2043.codfw.wmnet with OS bookworm
- 21:17 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host es2042.codfw.wmnet with OS bookworm
- 21:17 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host es2041.codfw.wmnet with OS bookworm
- 21:16 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host maps-test2002.codfw.wmnet with OS bookworm
- 21:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['es2042']
- 21:15 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['es2042']
- 21:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['es2041']
- 21:15 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['es2041']
- 21:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host es2042.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 21:11 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host es2044.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 21:11 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host es2045.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 21:10 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host es2043.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 21:10 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host es2041.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 21:03 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host thanos-be2005.codfw.wmnet with OS bookworm
- 21:01 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host thanos-be2005.codfw.wmnet with OS bookworm
- 21:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host es2046.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 20:52 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host thanos-be2005.codfw.wmnet with OS bookworm
- 20:51 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host thanos-be2005.codfw.wmnet with OS bullseye
- 20:51 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host thanos-be2005.codfw.wmnet with OS bullseye
- 20:49 jhathaway: disabling auto-reboot on re-imaging for debugging
- 20:49 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host maps-test2001.codfw.wmnet with OS bookworm
- 20:39 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host es2046.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 20:39 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host es2045.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 20:39 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host es2044.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 20:39 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host es2043.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 20:39 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host es2042.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 20:39 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host es2041.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 20:39 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host thanos-be2005.codfw.wmnet with OS bullseye
- 20:37 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 20:37 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding es2041 to codfw - jhancock@cumin2002"
- 20:37 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding es2041 to codfw - jhancock@cumin2002"
- 20:33 jhancock@cumin2002: START - Cookbook sre.dns.netbox
- 20:29 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host thanos-be2005.codfw.wmnet with OS bullseye
- 20:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase2037.codfw.wmnet with OS bullseye
- 20:23 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 20:19 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 20:19 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2112.codfw.wmnet with OS bullseye
- 20:19 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 20:14 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 20:12 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2113.codfw.wmnet with OS bullseye
- 20:12 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 20:11 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 20:00 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on restbase2037.codfw.wmnet with reason: host reimage
- 19:57 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on restbase2037.codfw.wmnet with reason: host reimage
- 19:57 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2112.codfw.wmnet with reason: host reimage
- 19:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2163.codfw.wmnet with OS bookworm
- 19:56 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 19:55 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 19:55 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@594d3b5]: T377153 Release glent 0.3.5 (duration: 00m 27s)
- 19:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2113.codfw.wmnet with reason: host reimage
- 19:54 ebernhardson@deploy2002: Started deploy [airflow-dags/search@594d3b5]: T377153 Release glent 0.3.5
- 19:52 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2112.codfw.wmnet with reason: host reimage
- 19:51 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2113.codfw.wmnet with reason: host reimage
- 19:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2163.codfw.wmnet with reason: host reimage
- 19:36 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2112.codfw.wmnet with OS bullseye
- 19:35 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2113.codfw.wmnet with OS bullseye
- 19:35 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host restbase2037.codfw.wmnet with OS bullseye
- 19:34 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2163.codfw.wmnet with reason: host reimage
- 19:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['elastic2113']
- 19:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['restbase2037']
- 19:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic2113']
- 19:32 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['restbase2037']
- 19:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic2113.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 19:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host restbase2037.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 19:22 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host elastic2110.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 19:18 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host elastic2113.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 19:18 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host elastic2110.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 19:18 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host restbase2037.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 19:17 swfrench@deploy2002: Finished scap sync-world: Test deployment after adding mwdebug-next check command - T372604 (duration: 01m 31s)
- 19:15 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2163.codfw.wmnet with OS bookworm
- 19:15 swfrench@deploy2002: Started scap sync-world: Test deployment after adding mwdebug-next check command - T372604
- 19:08 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/blunderbuss: apply
- 18:58 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply
- 18:57 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 18:56 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
- 18:46 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
- 18:45 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
- 18:43 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
- 18:41 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
- 18:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1183.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 18:27 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/blunderbuss: apply
- 18:17 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply
- 18:15 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/blunderbuss: apply
- 18:15 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply
- 18:14 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/blunderbuss: apply
- 18:13 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply
- 18:12 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host thanos-be2005.codfw.wmnet with OS bullseye
- 18:09 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/blunderbuss: apply
- 18:08 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply
- 18:04 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/blunderbuss: apply
- 18:03 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply
- 18:03 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/blunderbuss: apply
- 18:01 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply
- 17:53 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host thanos-be2005.codfw.wmnet with OS bullseye
- 17:34 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/blunderbuss: apply
- 17:28 xcollazo@deploy2002: Finished deploy [airflow-dags/analytics@16a5867]: Deploy latest DAGs to analytics Airflow instance. T368755. (duration: 02m 10s)
- 17:25 xcollazo@deploy2002: Started deploy [airflow-dags/analytics@16a5867]: Deploy latest DAGs to analytics Airflow instance. T368755.
- 17:24 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply
- 16:55 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 16:55 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: set DNS for new maps-test nodes - pt1979@cumin2002"
- 16:55 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: set DNS for new maps-test nodes - pt1979@cumin2002"
- 16:50 volans: installing spicerack v8.16.2 on cumin1002
- 16:50 pt1979@cumin2002: START - Cookbook sre.dns.netbox
- 16:38 volans: installing spicerack v8.16.2 on cumin2002
- 16:34 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker[1305-1312].eqiad.wmnet
- 16:34 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker[1305-1312].eqiad.wmnet
- 16:34 volans: uploaded spicerack_8.16.2 to apt.wikimedia.org bullseye-wikimedia
- 16:30 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1311.eqiad.wmnet with OS bookworm
- 16:25 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1310.eqiad.wmnet with OS bookworm
- 16:22 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1312.eqiad.wmnet with OS bookworm
- 16:19 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1306.eqiad.wmnet with OS bookworm
- 16:16 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1308.eqiad.wmnet with OS bookworm
- 16:14 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1309.eqiad.wmnet with OS bookworm
- 16:13 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp1005.eqiad.wmnet
- 16:11 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1311.eqiad.wmnet with reason: host reimage
- 16:10 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1307.eqiad.wmnet with OS bookworm
- 16:08 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1305.eqiad.wmnet with OS bookworm
- 16:07 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1310.eqiad.wmnet with reason: host reimage
- 16:06 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc-gp1005.eqiad.wmnet
- 16:04 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1312.eqiad.wmnet with reason: host reimage
- 16:01 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1306.eqiad.wmnet with reason: host reimage
- 15:58 Lucas_WMDE: UTC afternoon backport+config window done
- 15:58 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for Unified dashboard: Add UI for page collection recommendations (T368718) (duration: 27m 17s)
- 15:58 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1308.eqiad.wmnet with reason: host reimage
- 15:56 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1312.eqiad.wmnet with reason: host reimage
- 15:55 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1311.eqiad.wmnet with reason: host reimage
- 15:54 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1309.eqiad.wmnet with reason: host reimage
- 15:51 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1307.eqiad.wmnet with reason: host reimage
- 15:51 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1310.eqiad.wmnet with reason: host reimage
- 15:50 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1309.eqiad.wmnet with reason: host reimage
- 15:49 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1308.eqiad.wmnet with reason: host reimage
- 15:49 lucaswerkmeister-wmde@deploy2002: sbisson, lucaswerkmeister-wmde: Continuing with sync
- 15:48 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1305.eqiad.wmnet with reason: host reimage
- 15:48 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1307.eqiad.wmnet with reason: host reimage
- 15:46 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1306.eqiad.wmnet with reason: host reimage
- 15:45 lucaswerkmeister-wmde@deploy2002: sbisson, lucaswerkmeister-wmde: Backport for Unified dashboard: Add UI for page collection recommendations (T368718) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 15:45 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1305.eqiad.wmnet with reason: host reimage
- 15:36 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1312.eqiad.wmnet with OS bookworm
- 15:36 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1311.eqiad.wmnet with OS bookworm
- 15:31 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1310.eqiad.wmnet with OS bookworm
- 15:31 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1309.eqiad.wmnet with OS bookworm
- 15:31 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for Unified dashboard: Add UI for page collection recommendations (T368718)
- 15:30 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1308.eqiad.wmnet with OS bookworm
- 15:29 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1307.eqiad.wmnet with OS bookworm
- 15:27 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1306.eqiad.wmnet with OS bookworm
- 15:26 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1305.eqiad.wmnet with OS bookworm
- 15:11 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for Revert "Allow other input and changes to trigger searchsuggestions to update" (T379983) (duration: 08m 14s)
- 15:07 lucaswerkmeister-wmde@deploy2002: samtar, lucaswerkmeister-wmde: Continuing with sync
- 15:06 lucaswerkmeister-wmde@deploy2002: samtar, lucaswerkmeister-wmde: Backport for Revert "Allow other input and changes to trigger searchsuggestions to update" (T379983) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 15:03 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for Revert "Allow other input and changes to trigger searchsuggestions to update" (T379983)
- 15:00 arnaudb@cumin1002: dbctl commit (dc=all): 'manual depool commit', diff saved to https://phabricator.wikimedia.org/P71077 and previous config saved to /var/cache/conftool/dbconfig/20241118-150020-arnaudb.json
- 14:59 arnaudb@cumin1002: dbctl commit (dc=all): 'manual repool commit', diff saved to https://phabricator.wikimedia.org/P71076 and previous config saved to /var/cache/conftool/dbconfig/20241118-145946-arnaudb.json
- 14:56 arnaudb@cumin1002: END (FAIL) - Cookbook sre.mysql.pool (exit_code=99) db2216 slowly with 10 steps - slow motion repool T380131
- 14:56 arnaudb@cumin1002: START - Cookbook sre.mysql.pool db2216 slowly with 10 steps - slow motion repool T380131
- 14:52 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db2150 slowly with 10 steps - slow repool db2150 T380117
- 14:32 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker[1305-1312].eqiad.wmnet
- 14:28 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker[1305-1312].eqiad.wmnet
- 14:16 claime: running homer 'cr*-eqiad' 'T379454'
- 14:11 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp1004.eqiad.wmnet
- 14:09 btullis@cumin1002: END (PASS) - Cookbook sre.presto.roll-restart-workers (exit_code=0) for Presto an-presto cluster: Roll restart of all Presto's jvm daemons.
- 14:04 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc-gp1004.eqiad.wmnet
- 13:50 jelto@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikidata-query-gui: apply
- 13:49 jelto@deploy2002: helmfile [codfw] START helmfile.d/services/wikidata-query-gui: apply
- 13:49 jelto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikidata-query-gui: apply
- 13:48 jelto@deploy2002: helmfile [eqiad] START helmfile.d/services/wikidata-query-gui: apply
- 13:47 jelto@deploy2002: helmfile [staging] DONE helmfile.d/services/wikidata-query-gui: apply
- 13:46 jelto@deploy2002: helmfile [staging] START helmfile.d/services/wikidata-query-gui: apply
- 13:37 jelto@deploy2002: helmfile [staging] DONE helmfile.d/services/wikidata-query-gui: apply
- 13:37 jelto@deploy2002: helmfile [staging] START helmfile.d/services/wikidata-query-gui: apply
- 13:35 jelto@deploy2002: helmfile [staging] DONE helmfile.d/services/wikidata-query-gui: apply
- 13:35 jelto@deploy2002: helmfile [staging] START helmfile.d/services/wikidata-query-gui: apply
- 13:35 mvolz@deploy2002: helmfile [codfw] DONE helmfile.d/services/citoid: apply
- 13:34 mvolz@deploy2002: helmfile [codfw] START helmfile.d/services/citoid: apply
- 13:34 mvolz@deploy2002: helmfile [eqiad] DONE helmfile.d/services/citoid: apply
- 13:33 mvolz@deploy2002: helmfile [eqiad] START helmfile.d/services/citoid: apply
- 13:31 jelto@deploy2002: helmfile [staging] DONE helmfile.d/services/wikidata-query-gui: apply
- 13:31 jelto@deploy2002: helmfile [staging] START helmfile.d/services/wikidata-query-gui: apply
- 13:31 mvolz@deploy2002: helmfile [staging] DONE helmfile.d/services/citoid: apply
- 13:30 mvolz@deploy2002: helmfile [staging] START helmfile.d/services/citoid: apply
- 13:28 jelto@deploy2002: helmfile [staging] DONE helmfile.d/services/wikidata-query-gui: apply
- 13:28 jelto@deploy2002: helmfile [staging] START helmfile.d/services/wikidata-query-gui: apply
- 13:27 btullis@cumin1002: START - Cookbook sre.presto.roll-restart-workers for Presto an-presto cluster: Roll restart of all Presto's jvm daemons.
- 13:26 topranks: stopping netbox service on netbox-next test server to restore new database backup from production
- 13:25 jelto@deploy2002: helmfile [staging] DONE helmfile.d/services/wikidata-query-gui: apply
- 13:25 jelto@deploy2002: helmfile [staging] START helmfile.d/services/wikidata-query-gui: apply
- 13:20 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-presto1018.eqiad.wmnet with OS bullseye
- 13:16 urbanecm: mwmaint2002: Run `extensions/GrowthExperiments/maintenance/refreshLinkRecommendations.php` at `testwiki` for a bunch of pages (P71064 is list of commands executed; T378983)
- 13:04 jelto@deploy2002: helmfile [staging] DONE helmfile.d/services/wikidata-query-gui: apply
- 13:03 jelto@deploy2002: helmfile [staging] START helmfile.d/services/wikidata-query-gui: apply
- 13:01 moritzm: removing ganeti1021 from active Ganeti nodes T378921
- 12:56 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1018.eqiad.wmnet with reason: host reimage
- 12:54 btullis@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1018.eqiad.wmnet with reason: host reimage
- 12:39 btullis@cumin1002: START - Cookbook sre.hosts.reimage for host an-presto1018.eqiad.wmnet with OS bullseye
- 12:38 btullis@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-presto1018.eqiad.wmnet with OS bullseye
- 12:38 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 12:37 kart_: Updated recommendation api to 2024-11-13-183159-production (T379592, T379037)
- 12:36 arnaudb@cumin1002: START - Cookbook sre.mysql.pool db2150 slowly with 10 steps - slow repool db2150 T380117
- 12:36 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
- 12:24 kartik@deploy2002: helmfile [ml-serve-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
- 12:22 btullis@cumin1002: START - Cookbook sre.hosts.reimage for host an-presto1018.eqiad.wmnet with OS bullseye
- 12:22 kartik@deploy2002: helmfile [ml-serve-eqiad] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
- 12:21 btullis@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-presto1018.eqiad.wmnet with OS bullseye
- 12:19 btullis@cumin1002: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid analytics cluster: Roll restart of Druid jvm daemons.
- 12:15 stevemunene@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-product: apply
- 12:14 stevemunene@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-product: apply
- 12:13 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-ulsfo
- 12:13 kartik@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
- 12:10 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 12:09 stevemunene@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-product: apply
- 12:08 btullis@cumin1002: START - Cookbook sre.hosts.reimage for host an-presto1018.eqiad.wmnet with OS bullseye
- 12:02 stevemunene@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-product: apply
- 12:00 stevemunene@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 11:59 elukey@cumin2002: START - Cookbook sre.hosts.provision for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 11:59 stevemunene@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
- 11:58 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 11:58 elukey@cumin2002: START - Cookbook sre.hosts.provision for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 11:47 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1021.eqiad.wmnet
- 11:45 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 11:45 elukey@cumin1002: START - Cookbook sre.hosts.provision for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 11:41 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 11:41 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2216.codfw.wmnet with reason: T380131 - table corruption
- 11:41 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2216.codfw.wmnet with reason: T380131 - table corruption
- 11:41 elukey@cumin2002: START - Cookbook sre.hosts.provision for host thanos-be1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 11:41 urbanecm: mwmaint2002: Run `extensions/GrowthExperiments/maintenance/refreshLinkRecommendations.php` at `testwiki` for a bunch of pages (P71064 is list of commands executed; T378983)
- 11:33 btullis@cumin1002: START - Cookbook sre.druid.roll-restart-workers for Druid analytics cluster: Roll restart of Druid jvm daemons.
- 11:25 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 11:25 elukey@cumin2002: START - Cookbook sre.hosts.provision for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 11:21 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 11:16 elukey@cumin2002: START - Cookbook sre.hosts.provision for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 10:50 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 10:50 elukey@cumin2002: START - Cookbook sre.hosts.provision for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 10:50 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 10:49 elukey@cumin2002: START - Cookbook sre.hosts.provision for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 10:46 dcausse@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rdf-streaming-updater: apply
- 10:46 dcausse@deploy2002: helmfile [eqiad] START helmfile.d/services/rdf-streaming-updater: apply
- 10:45 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 10:45 elukey@cumin2002: START - Cookbook sre.hosts.provision for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 10:43 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 10:43 elukey@cumin2002: START - Cookbook sre.hosts.provision for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 10:41 dcausse@deploy2002: helmfile [codfw] DONE helmfile.d/services/rdf-streaming-updater: apply
- 10:41 dcausse@deploy2002: helmfile [codfw] START helmfile.d/services/rdf-streaming-updater: apply
- 10:39 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 10:37 elukey@cumin2002: START - Cookbook sre.hosts.provision for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 10:27 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 10:27 elukey@cumin2002: START - Cookbook sre.hosts.provision for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 10:15 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 10:14 fabfur: upgrade haproxy on cp-ulsfo (T379891)
- 10:14 elukey@cumin2002: START - Cookbook sre.hosts.provision for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 10:14 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-ulsfo
- 10:13 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 10:13 elukey@cumin2002: START - Cookbook sre.hosts.provision for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 09:47 dcausse@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
- 09:47 dcausse@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
- 09:42 moritzm: restarting nginx on acmechief hosts to pick up openssl updates
- 09:24 moritzm: installing openssl security updates
- 09:18 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 09:17 elukey@cumin1002: START - Cookbook sre.hosts.provision for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 08:57 kartik@deploy2002: Finished scap sync-world: Backport for Enable the Contribute menu in 2nd group of Wikis (T375300) (duration: 11m 45s)
- 08:55 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 40850
- 08:55 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 40850
- 08:53 kartik@deploy2002: kartik: Continuing with sync
- 08:49 kartik@deploy2002: kartik: Backport for Enable the Contribute menu in 2nd group of Wikis (T375300) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 08:45 kartik@deploy2002: Started scap sync-world: Backport for Enable the Contribute menu in 2nd group of Wikis (T375300)
- 08:44 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on registry1004.eqiad.wmnet with reason: testing
- 08:44 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on registry1004.eqiad.wmnet with reason: testing
- 08:43 kartik@deploy2002: Finished scap sync-world: Backport for bjnwikiquote: Add local logo (T375054) (duration: 22m 55s)
- 08:31 kartik@deploy2002: kartik, hamishz: Continuing with sync
- 08:30 kartik@deploy2002: kartik, hamishz: Backport for bjnwikiquote: Add local logo (T375054) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 08:20 kartik@deploy2002: Started scap sync-world: Backport for bjnwikiquote: Add local logo (T375054)
- 08:07 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1021.eqiad.wmnet
- 08:07 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1021.eqiad.wmnet
- 08:05 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1021.eqiad.wmnet
- 08:03 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1021.eqiad.wmnet
- 08:01 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1021.eqiad.wmnet
- 08:01 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1021.eqiad.wmnet
- 07:56 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1021.eqiad.wmnet
- 07:54 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1020.eqiad.wmnet
- 07:52 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1020.eqiad.wmnet
- 07:51 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1020.eqiad.wmnet
- 07:47 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1020.eqiad.wmnet
- 07:46 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on pc1017.eqiad.wmnet with reason: T378068, host is not pooled
- 07:46 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 5 days, 0:00:00 on pc1017.eqiad.wmnet with reason: T378068, host is not pooled
- 07:46 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on pc1013.eqiad.wmnet with reason: T373037, host is not pooled
- 07:46 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on pc1013.eqiad.wmnet with reason: T373037, host is not pooled
- 06:31 kart_: Updated MinT to 2024-10-16-065051-production on eqiad
- 06:28 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply
- 06:19 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply
2024-11-17
- 16:41 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db2216.codfw.wmnet with reason: Sad
- 16:40 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db2216.codfw.wmnet with reason: Sad
- 16:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'db2216 sad', diff saved to https://phabricator.wikimedia.org/P71059 and previous config saved to /var/cache/conftool/dbconfig/20241117-163522-ladsgroup.json
2024-11-16
- 20:30 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kafka-jumbo1017.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 20:29 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kafka-jumbo1016.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 20:29 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kafka-jumbo1018.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 18:09 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 18:09 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for wikikube-worker - jclark@cumin1002"
- 18:08 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for wikikube-worker - jclark@cumin1002"
- 18:06 jclark@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1183.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 18:05 jclark@cumin1002: START - Cookbook sre.dns.netbox
- 18:01 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 17:59 jclark@cumin1002: START - Cookbook sre.dns.netbox
- 17:59 jclark@cumin1002: START - Cookbook sre.hosts.provision for host kafka-jumbo1018.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 17:56 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-jumbo1018.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 17:56 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 17:56 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for wikikube-worker - jclark@cumin1002"
- 17:56 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for wikikube-worker - jclark@cumin1002"
- 17:55 jclark@cumin1002: START - Cookbook sre.hosts.provision for host kafka-jumbo1016.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 17:55 jclark@cumin1002: START - Cookbook sre.hosts.provision for host kafka-jumbo1017.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 17:53 jclark@cumin1002: START - Cookbook sre.hosts.provision for host kafka-jumbo1018.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 17:52 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1313.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 17:52 jclark@cumin1002: START - Cookbook sre.dns.netbox
- 17:50 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 17:50 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for wikikube-worker - jclark@cumin1002"
- 17:50 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for wikikube-worker - jclark@cumin1002"
- 17:45 jclark@cumin1002: START - Cookbook sre.dns.netbox
- 17:14 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1323.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 17:11 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker1327.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 17:11 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1327.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 17:09 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 17:09 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for wikikube-worker - jclark@cumin1002"
- 17:09 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for wikikube-worker - jclark@cumin1002"
- 17:08 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1313.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 17:05 jclark@cumin1002: START - Cookbook sre.dns.netbox
- 17:05 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1327.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 17:01 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1326.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 16:57 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1321.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 16:55 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1324.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 16:54 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1322.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 16:54 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1320.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 16:53 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1325.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 16:52 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1319.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 16:52 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1316.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 16:51 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1318.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 16:50 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1315.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 16:49 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1317.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 16:49 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1314.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 16:42 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1326.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 16:42 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1327.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 16:36 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1323.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 16:36 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1324.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 16:36 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1322.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 16:36 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1321.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 16:36 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1320.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 16:35 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1325.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 16:32 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1318.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 16:32 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1317.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 16:32 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1316.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 16:31 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1315.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 16:31 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1314.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 16:31 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1319.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 16:30 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 16:30 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for wikikube-worker - jclark@cumin1002"
- 16:30 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for wikikube-worker - jclark@cumin1002"
- 16:27 jclark@cumin1002: START - Cookbook sre.dns.netbox
- 00:44 tzatziki: removing 103 files for legal compliance
2024-11-15
- 23:42 tzatziki: removing 1 file for legal compliance
- 23:19 tzatziki: removing 3 files for legal compliance
- 22:34 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host elastic2112.codfw.wmnet with OS bullseye
- 21:59 Dreamy_Jazz: Started MediaModeration scan on all wikis other than commonswiki attempting to scan all failed to be scanned images - https://wikitech.wikimedia.org/wiki/MediaModeration
- 21:59 Dreamy_Jazz: Started MediaModeration scan on commons wiki attempting to scan all failed to be scanned images - https://wikitech.wikimedia.org/wiki/MediaModeration
- 21:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2115.codfw.wmnet with OS bullseye
- 21:56 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 21:56 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 21:53 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2114.codfw.wmnet with OS bullseye
- 21:53 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 21:53 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 21:51 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2111.codfw.wmnet with OS bullseye
- 21:50 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 21:50 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 21:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2115.codfw.wmnet with reason: host reimage
- 21:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase2038.codfw.wmnet with OS bullseye
- 21:35 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 21:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2114.codfw.wmnet with reason: host reimage
- 21:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase2036.codfw.wmnet with OS bullseye
- 21:35 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 21:33 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 21:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2111.codfw.wmnet with reason: host reimage
- 21:30 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2115.codfw.wmnet with reason: host reimage
- 21:30 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2114.codfw.wmnet with reason: host reimage
- 21:30 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2111.codfw.wmnet with reason: host reimage
- 21:28 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 21:14 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2115.codfw.wmnet with OS bullseye
- 21:14 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2114.codfw.wmnet with OS bullseye
- 21:14 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2112.codfw.wmnet with OS bullseye
- 21:14 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2111.codfw.wmnet with OS bullseye
- 21:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on restbase2038.codfw.wmnet with reason: host reimage
- 21:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['elastic2115']
- 21:13 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic2115']
- 21:12 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['elastic2114']
- 21:12 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic2114']
- 21:12 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['elastic2112']
- 21:12 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic2112']
- 21:12 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['elastic2111']
- 21:12 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic2111']
- 21:11 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic2110']
- 21:11 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host elastic2113.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 21:10 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on restbase2036.codfw.wmnet with reason: host reimage
- 21:08 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic2114.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 21:08 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic2111.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 21:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on restbase2038.codfw.wmnet with reason: host reimage
- 21:07 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic2115.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 21:07 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic2112.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 21:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on restbase2036.codfw.wmnet with reason: host reimage
- 21:04 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host elastic2110.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 20:56 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host elastic2115.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 20:56 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host elastic2114.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 20:56 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host elastic2113.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 20:56 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host elastic2112.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 20:56 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host elastic2111.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 20:56 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host elastic2110.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 20:54 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 20:54 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding elastic2110 to codfw - jhancock@cumin2002"
- 20:54 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding elastic2110 to codfw - jhancock@cumin2002"
- 20:50 jhancock@cumin2002: START - Cookbook sre.dns.netbox
- 20:45 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host restbase2038.codfw.wmnet with OS bullseye
- 20:45 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host restbase2036.codfw.wmnet with OS bullseye
- 20:44 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['restbase2036']
- 20:44 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['restbase2038']
- 20:43 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['restbase2038']
- 20:43 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['restbase2036']
- 20:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host restbase2038.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 20:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host restbase2036.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 20:41 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host restbase2037
- 20:40 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host restbase2037
- 20:40 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host restbase2037.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 20:32 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host restbase2038.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 20:32 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host restbase2037.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 20:32 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host restbase2036.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 20:31 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 20:31 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding restbase2036 to codfw - jhancock@cumin2002"
- 20:31 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding restbase2036 to codfw - jhancock@cumin2002"
- 20:27 jhancock@cumin2002: START - Cookbook sre.dns.netbox
- 19:54 dancy@deploy2002: Finished scap sync-world: Testing T377883 (duration: 03m 06s)
- 19:51 dancy@deploy2002: Started scap sync-world: Testing T377883
- 19:50 dancy@deploy2002: Installation of scap version "4.124.0" completed for 206 hosts
- 19:46 dancy@deploy2002: Installing scap version "4.124.0" for 206 hosts
- 18:53 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 18:52 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
- 18:35 cjming@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic: apply
- 18:34 cjming@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic: apply
- 18:32 cjming@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply
- 18:31 cjming@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply
- 18:15 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 18:15 elukey@cumin1002: START - Cookbook sre.hosts.provision for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 18:09 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 18:08 elukey@cumin1002: START - Cookbook sre.hosts.provision for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 16:58 mfossati@deploy2002: Finished deploy [airflow-dags/platform_eng@82083c4]: image suggestions hotfix - section titles denylist dependency (duration: 01m 58s)
- 16:57 taavi: copy python3-flask-{keystone,oslolog} from bullseye-wikimedia to bookworm-wikimedia
- 16:56 mfossati@deploy2002: Started deploy [airflow-dags/platform_eng@82083c4]: image suggestions hotfix - section titles denylist dependency
- 16:27 herron@cumin2002: conftool action : set/pooled=yes; selector: name=aux-k8s-worker1005.eqiad.wmnet,cluster=aux-k8s,service=kubesvc
- 16:27 herron@cumin2002: conftool action : set/weight=10; selector: name=aux-k8s-worker1005.eqiad.wmnet,cluster=aux-k8s,service=kubesvc
- 16:22 herron@cumin2002: conftool action : set/pooled=yes; selector: name=aux-k8s-worker1004.eqiad.wmnet,cluster=aux-k8s,service=kubesvc
- 16:22 herron@cumin2002: conftool action : set/weight=10; selector: name=aux-k8s-worker1004.eqiad.wmnet,cluster=aux-k8s,service=kubesvc
- 16:09 sukhe@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp4043.ulsfo.wmnet [reason: ATS fixed]
- 16:08 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cp4043.ulsfo.wmnet
- 16:08 sukhe@cumin1002: START - Cookbook sre.hosts.remove-downtime for cp4043.ulsfo.wmnet
- 16:06 sukhe@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-ats (exit_code=0) Rolling upgrade/restart of Apache Traffic Server on P{cp4051*} and A:cp for 9.2.6-1wm2
- 16:03 sukhe@cumin1002: START - Cookbook sre.cdn.roll-upgrade-ats Rolling upgrade/restart of Apache Traffic Server on P{cp4051*} and A:cp for 9.2.6-1wm2
- 16:00 sukhe: reprepro -C main include bullseye-wikimedia trafficserver_9.2.6-1wm2_amd64.changes: T379797
- 15:47 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on db2230.codfw.wmnet,db1125.eqiad.wmnet with reason: testing stuff on test-s4
- 15:47 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on db2230.codfw.wmnet,db1125.eqiad.wmnet with reason: testing stuff on test-s4
- 15:42 arnaudb@cumin1002: END (PASS) - Cookbook sre.switchdc.databases.finalize (exit_code=0) for the switch from eqiad to codfw
- 15:41 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.finalize for the switch from eqiad to codfw
- 15:40 arnaudb@cumin1002: END (PASS) - Cookbook sre.switchdc.databases.finalize (exit_code=0) for the switch from codfw to eqiad
- 15:39 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.finalize for the switch from codfw to eqiad
- 15:39 arnaudb@cumin1002: END (PASS) - Cookbook sre.switchdc.databases.prepare (exit_code=0) for the switch from codfw to eqiad
- 15:38 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-platform-eng: apply
- 15:38 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.prepare for the switch from codfw to eqiad
- 15:37 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-platform-eng: apply
- 15:35 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 15:34 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
- 13:59 ayounsi@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 13:59 ayounsi@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove e8 lo0 IP - ayounsi@cumin1002"
- 13:59 ayounsi@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove e8 lo0 IP - ayounsi@cumin1002"
- 13:55 ayounsi@cumin1002: START - Cookbook sre.dns.netbox
- 13:55 ayounsi@cumin1002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
- 13:52 ayounsi@cumin1002: START - Cookbook sre.dns.netbox
- 13:41 XioNoX: test no-passwords on mr1-eqsin - T379464
- 13:31 ayounsi@cumin1002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts sretest1004.eqiad.wmnet
- 13:31 ayounsi@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 13:31 ayounsi@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: sretest1004.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - ayounsi@cumin1002"
- 13:31 ayounsi@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: sretest1004.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - ayounsi@cumin1002"
- 13:27 ayounsi@cumin1002: START - Cookbook sre.dns.netbox
- 13:24 cmooney@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1002.eqiad.wmnet with reason: Update homer wmf-plugin to export Netbox ipsec data - cmooney@cumin1002
- 13:23 ayounsi@cumin1002: START - Cookbook sre.hosts.decommission for hosts sretest1004.eqiad.wmnet
- 13:21 cmooney@cumin1002: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1002.eqiad.wmnet with reason: Update homer wmf-plugin to export Netbox ipsec data - cmooney@cumin1002
- 13:19 cmooney@cumin1002: END (FAIL) - Cookbook sre.deploy.python-code (exit_code=99) homer to cumin2002.codfw.wmnet,cumin1002.eqiad.wmnet with reason: Update homer wmf-plugin to export Netbox ipsec data - cmooney@cumin1002
- 13:17 cmooney@cumin1002: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1002.eqiad.wmnet with reason: Update homer wmf-plugin to export Netbox ipsec data - cmooney@cumin1002
- 13:01 moritzm: imported 8u432-b06-2~deb12u1 to component/jdk8 for bookworm (forward port of the latest Java 8 security fixes for Bookworm)
- 12:54 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host build2002.codfw.wmnet
- 12:54 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host build2002.codfw.wmnet with OS bookworm
- 12:36 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on build2002.codfw.wmnet with reason: host reimage
- 12:32 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on build2002.codfw.wmnet with reason: host reimage
- 12:27 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics: apply
- 12:26 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics: apply
- 12:19 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics: apply
- 12:18 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 12:17 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host build2002.codfw.wmnet with OS bookworm
- 12:17 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
- 12:16 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM build2002.codfw.wmnet - jmm@cumin2002"
- 12:15 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM build2002.codfw.wmnet - jmm@cumin2002"
- 12:15 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) build2002.codfw.wmnet on all recursors
- 12:15 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache build2002.codfw.wmnet on all recursors
- 12:15 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 12:15 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM build2002.codfw.wmnet - jmm@cumin2002"
- 12:11 cmooney@cumin1002: END (FAIL) - Cookbook sre.netbox.update-extras (exit_code=1) rolling restart_daemons on A:netbox
- 12:11 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM build2002.codfw.wmnet - jmm@cumin2002"
- 12:08 aokoth@cumin1002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab2002.wikimedia.org with reason: Security Update
- 12:03 jmm@cumin2002: START - Cookbook sre.dns.netbox
- 12:03 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host build2002.codfw.wmnet
- 12:01 cmooney@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
- 12:01 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.resource-report (exit_code=0)
- 12:01 jmm@cumin2002: START - Cookbook sre.ganeti.resource-report
- 12:00 cmooney@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary
- 11:58 cmooney@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary
- 11:38 mfossati@deploy2002: Finished deploy [airflow-dags/platform_eng@2c533d6]: hotfix image suggestions weekly snapshots (duration: 00m 57s)
- 11:37 mfossati@deploy2002: Started deploy [airflow-dags/platform_eng@2c533d6]: hotfix image suggestions weekly snapshots
- 11:27 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 11:24 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker[1305-1312].eqiad.wmnet
- 11:24 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker[1305-1312].eqiad.wmnet
- 11:22 claime: homer 'lsw1-f5-eqiad*' commit 'T377022'
- 11:22 claime: homer 'lsw1-f6-eqiad*' commit 'T377022'
- 11:22 elukey@cumin1002: START - Cookbook sre.hosts.provision for host thanos-be1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 11:21 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 11:21 claime: homer 'lsw1-f7-eqiad*' commit 'T377022'
- 11:21 elukey@cumin1002: START - Cookbook sre.hosts.provision for host thanos-be1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 11:20 claime: homer 'lsw1-e7-eqiad*' commit 'T377022'
- 11:20 claime: homer 'lsw1-e6-eqiad*' commit 'T377022'
- 11:19 claime: homer 'lsw1-e5-eqiad*' commit 'T377022'
- 11:15 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 11:14 elukey@cumin2002: START - Cookbook sre.hosts.provision for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 11:12 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 11:12 elukey@cumin2002: START - Cookbook sre.hosts.provision for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 11:06 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 11:06 elukey@cumin1002: START - Cookbook sre.hosts.provision for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 11:05 claime: homer 'cr*eqiad*' commit 'T377022'
- 10:36 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 10:36 elukey@cumin2002: START - Cookbook sre.hosts.provision for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 09:36 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 09:34 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on pc1013.eqiad.wmnet with reason: T373037, host is not pooled
- 09:34 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on pc1013.eqiad.wmnet with reason: T373037, host is not pooled
- 09:31 elukey@cumin2002: START - Cookbook sre.hosts.provision for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 09:28 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 09:28 elukey@cumin2002: START - Cookbook sre.hosts.provision for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 09:28 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.provision (exit_code=97) for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 09:27 elukey@cumin2002: START - Cookbook sre.hosts.provision for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 09:23 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 09:23 elukey@cumin1002: START - Cookbook sre.hosts.provision for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 09:22 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 09:21 elukey@cumin1002: START - Cookbook sre.hosts.provision for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 09:15 aokoth@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: Security Update
- 08:48 moritzm: installing Linux 6.1.115 kernel updates from Bookworm point release
- 04:54 rzl@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 12:00:00 on db1246.eqiad.wmnet with reason: depooled
- 04:54 rzl@cumin2002: START - Cookbook sre.hosts.downtime for 3 days, 12:00:00 on db1246.eqiad.wmnet with reason: depooled
- 04:51 rzl@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 12:00:00 on db1246.eqiad.wmnet with reason: depooled
- 04:50 rzl@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 12:00:00 on db1246.eqiad.wmnet with reason: depooled
- 04:47 rzl@cumin2002: dbctl commit (dc=all): 'db1246 depooled', diff saved to https://phabricator.wikimedia.org/P71052 and previous config saved to /var/cache/conftool/dbconfig/20241115-044705-rzl.json
- 03:44 ejegg: fundraising python tools upgraded from c6e2dbcc to b230f718
2024-11-14
- 23:17 eileen: civicrm upgraded from 2a53f697 to d49a064d
- 22:59 eileen: civicrm upgraded from 2ab8334a to 2a53f697
- 22:37 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on cp4043.ulsfo.wmnet with reason: ATS upgrade 9.2.6
- 22:37 brett@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on cp4043.ulsfo.wmnet with reason: ATS upgrade 9.2.6
- 22:30 ryankemper: T376150 Depooled `wdqs20[18-20]` in preparation of merging https://gerrit.wikimedia.org/r/c/operations/puppet/+/1088185
- 21:49 aqu@deploy2002: Finished deploy [airflow-dags/analytics@7a66849]: Stage Refine: fix Airflow skip (duration: 00m 59s)
- 21:48 aqu@deploy2002: Started deploy [airflow-dags/analytics@7a66849]: Stage Refine: fix Airflow skip
- 21:47 aqu@deploy2002: Finished deploy [airflow-dags/analytics_test@7a66849]: Stage Refine: fix Airflow skip (duration: 00m 14s)
- 21:47 aqu@deploy2002: Started deploy [airflow-dags/analytics_test@7a66849]: Stage Refine: fix Airflow skip
- 21:26 aqu@deploy2002: Finished deploy [airflow-dags/analytics_test@2220747]: Stage Refine test fix (duration: 00m 16s)
- 21:26 aqu@deploy2002: Started deploy [airflow-dags/analytics_test@2220747]: Stage Refine test fix
- 21:20 cjming: end of UTC late backport window
- 21:17 cjming@deploy2002: Finished scap sync-world: Backport for Redirect to wikis using subpages rather than namespaces too (T376923) (duration: 13m 44s)
- 21:13 cjming@deploy2002: cjming, pppery: Continuing with sync
- 21:08 cjming@deploy2002: cjming, pppery: Backport for Redirect to wikis using subpages rather than namespaces too (T376923) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 21:04 cjming@deploy2002: Started scap sync-world: Backport for Redirect to wikis using subpages rather than namespaces too (T376923)
- 20:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2139.codfw.wmnet with OS bookworm
- 20:47 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 20:38 bvibber@deploy2002: helmfile [codfw] DONE helmfile.d/services/chart-renderer: apply
- 20:37 bvibber@deploy2002: helmfile [codfw] START helmfile.d/services/chart-renderer: apply
- 20:37 bvibber@deploy2002: helmfile [eqiad] DONE helmfile.d/services/chart-renderer: apply
- 20:36 bvibber@deploy2002: helmfile [eqiad] START helmfile.d/services/chart-renderer: apply
- 20:35 bvibber@deploy2002: helmfile [staging] DONE helmfile.d/services/chart-renderer: apply
- 20:35 bvibber@deploy2002: helmfile [staging] START helmfile.d/services/chart-renderer: apply
- 20:29 swfrench@cumin2002: END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0)
- 20:28 swfrench@cumin2002: START - Cookbook sre.discovery.datacenter
- 20:24 bvibber@deploy2002: helmfile [codfw] DONE helmfile.d/services/chart-renderer: apply
- 20:24 bvibber@deploy2002: helmfile [codfw] START helmfile.d/services/chart-renderer: apply
- 20:24 bvibber@deploy2002: helmfile [eqiad] DONE helmfile.d/services/chart-renderer: apply
- 20:24 bvibber@deploy2002: helmfile [eqiad] START helmfile.d/services/chart-renderer: apply
- 20:23 bvibber@deploy2002: helmfile [staging] DONE helmfile.d/services/chart-renderer: apply
- 20:23 bvibber@deploy2002: helmfile [staging] START helmfile.d/services/chart-renderer: apply
- 20:23 swfrench@cumin2002: END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0) pool all active/active services in eqiad: Network maintenance complete - None
- 20:01 swfrench@cumin2002: START - Cookbook sre.discovery.datacenter pool all active/active services in eqiad: Network maintenance complete - None
- 19:55 brennen@deploy2002: rebuilt and synchronized wikiversions files: group2 to 1.44.0-wmf.3 refs T375662
- 19:40 eileen: tools upgraded from 68f64e43 to c6e2dbcc
- 19:37 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool site eqiad [reason: junos upgrade done, T364092]
- 19:37 sukhe@cumin1002: START - Cookbook sre.dns.admin DNS admin: pool site eqiad [reason: junos upgrade done, T364092]
- 19:20 James_F: Running `mwscript-k8s -f -- extensions/WikiLambda/maintenance/updateSecondaryTables.php --wiki=wikifunctionswiki --zType Z8 --report --verbose` for T375972, T367005, T373038, T358737
- 19:19 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.roll-restart-ntp (exit_code=0) rolling restart_daemons on A:dnsbox
- 19:14 swfrench@cumin2002: END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0)
- 19:14 swfrench@cumin2002: START - Cookbook sre.discovery.datacenter
- 19:14 swfrench-wmf: running sre.discovery.datacenter status all to test deployed fix
- 19:00 brennen: 1.44.0-wmf.3 train status (T375662): no current blockers, but holding for network maintenance.
- 18:20 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1312.eqiad.wmnet with OS bullseye
- 18:19 swfrench@cumin2002: END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0)
- 18:18 swfrench@cumin2002: START - Cookbook sre.discovery.datacenter
- 18:16 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1310.eqiad.wmnet with OS bullseye
- 18:13 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on cp4043.ulsfo.wmnet with reason: depooled, debugging
- 18:13 sukhe@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on cp4043.ulsfo.wmnet with reason: depooled, debugging
- 18:11 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 18:09 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1311.eqiad.wmnet with OS bullseye
- 18:05 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1308.eqiad.wmnet with OS bullseye
- 18:04 ladsgroup@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db1190 gradually with 4 steps - Maint over
- 18:02 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1309.eqiad.wmnet with OS bullseye
- 18:01 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1312.eqiad.wmnet with reason: host reimage
- 17:59 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1307.eqiad.wmnet with OS bullseye
- 17:57 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1310.eqiad.wmnet with reason: host reimage
- 17:53 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2139.codfw.wmnet with reason: host reimage
- 17:52 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1306.eqiad.wmnet with OS bullseye
- 17:49 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1311.eqiad.wmnet with reason: host reimage
- 17:46 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1308.eqiad.wmnet with reason: host reimage
- 17:45 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1312.eqiad.wmnet with reason: host reimage
- 17:45 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2139.codfw.wmnet with reason: host reimage
- 17:44 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1311.eqiad.wmnet with reason: host reimage
- 17:43 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1310.eqiad.wmnet with reason: host reimage
- 17:42 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1309.eqiad.wmnet with reason: host reimage
- 17:39 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1309.eqiad.wmnet with reason: host reimage
- 17:39 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1307.eqiad.wmnet with reason: host reimage
- 17:37 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1308.eqiad.wmnet with reason: host reimage
- 17:37 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1307.eqiad.wmnet with reason: host reimage
- 17:32 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1306.eqiad.wmnet with reason: host reimage
- 17:29 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1306.eqiad.wmnet with reason: host reimage
- 17:27 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2139.codfw.wmnet with OS bookworm
- 17:26 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1312.eqiad.wmnet with OS bullseye
- 17:25 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1311.eqiad.wmnet with OS bullseye
- 17:25 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1310.eqiad.wmnet with OS bullseye
- 17:24 swfrench@cumin2002: END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0) status all services in all: None - None
- 17:24 swfrench@cumin2002: START - Cookbook sre.discovery.datacenter status all services in all: None - None
- 17:21 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1309.eqiad.wmnet with OS bullseye
- 17:19 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1308.eqiad.wmnet with OS bullseye
- 17:19 ladsgroup@cumin1002: START - Cookbook sre.mysql.pool db1190 gradually with 4 steps - Maint over
- 17:18 swfrench@cumin2002: END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0) depool all active/active services in eqiad: Network maintenance - None
- 17:18 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1307.eqiad.wmnet with OS bullseye
- 17:15 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=4043.ulsfo.wmnet
- 17:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2139.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 17:13 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 17:13 elukey@cumin1002: START - Cookbook sre.hosts.provision for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 17:10 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1306.eqiad.wmnet with OS bullseye
- 16:59 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1305.eqiad.wmnet with OS bullseye
- 16:57 swfrench@cumin2002: START - Cookbook sre.discovery.datacenter depool all active/active services in eqiad: Network maintenance - None
- 16:52 mfossati@deploy2002: Finished deploy [airflow-dags/platform_eng@7c4873e]: decouple article-level image suggestions from section-level ones (duration: 00m 53s)
- 16:51 mfossati@deploy2002: Started deploy [airflow-dags/platform_eng@7c4873e]: decouple article-level image suggestions from section-level ones
- 16:45 swfrench@cumin2002: END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0) status all services in all: None - None
- 16:45 swfrench@cumin2002: START - Cookbook sre.discovery.datacenter status all services in all: None - None
- 16:40 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1305.eqiad.wmnet with reason: host reimage
- 16:38 swfrench@cumin2002: END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0)
- 16:37 swfrench@cumin2002: START - Cookbook sre.discovery.datacenter
- 16:36 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1305.eqiad.wmnet with reason: host reimage
- 16:36 swfrench@cumin2002: END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0)
- 16:36 swfrench@cumin2002: START - Cookbook sre.discovery.datacenter
- 16:33 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db1190.eqiad.wmnet with reason: Sad
- 16:33 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db1190.eqiad.wmnet with reason: Sad
- 16:33 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1190 sad', diff saved to https://phabricator.wikimedia.org/P71044 and previous config saved to /var/cache/conftool/dbconfig/20241114-163317-ladsgroup.json
- 16:31 klausman@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
- 16:31 klausman@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
- 16:18 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1305.eqiad.wmnet with OS bullseye
- 16:04 cmooney@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 151575
- 16:03 cmooney@cumin1002: START - Cookbook sre.network.peering with action 'configure' for AS: 151575
- 16:01 papaul: ongoing maintenance on cr1-eqiad
- 16:00 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2139.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 15:57 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cr1-eqiad,cr1-eqiad IPV6,re0.cr1-eqiad.mgmt with reason: router upgrade
- 15:57 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cr1-eqiad,cr1-eqiad IPV6,re0.cr1-eqiad.mgmt with reason: router upgrade
- 15:56 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on cp4043.ulsfo.wmnet with reason: depooled, debugging
- 15:56 sukhe@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on cp4043.ulsfo.wmnet with reason: depooled, debugging
- 15:55 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cr1-eqiad,cr1-eqiad IPV6,cr1-eqiad.mgmt with reason: router upgrade
- 15:55 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cr1-eqiad,cr1-eqiad IPV6,cr1-eqiad.mgmt with reason: router upgrade
- 15:49 moritzm: installing nss security updates
- 15:48 reedy@deploy2002: Synchronized wmf-config/CommonSettings.php: T379834 (duration: 08m 02s)
- 15:47 sukhe@puppetserver1001: conftool action : set/pooled=no; selector: name=cp4043.ulsfo.wmnet
- 15:47 sukhe@cumin1002: END (ERROR) - Cookbook sre.cdn.roll-upgrade-ats (exit_code=97) Rolling upgrade/restart of Apache Traffic Server on P{cp4043*,cp4051*} and A:cp for 9.2.6-1wm1
- 15:45 jayme@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for wikikube-ctrl2002.codfw.wmnet
- 15:45 jayme@cumin2002: START - Cookbook sre.hosts.remove-downtime for wikikube-ctrl2002.codfw.wmnet
- 15:45 jayme@cumin2002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-ctrl2002.codfw.wmnet
- 15:45 jayme@cumin2002: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-ctrl2002.codfw.wmnet
- 15:43 pt1979@cumin2002: END (PASS) - Cookbook sre.network.cf (exit_code=0)
- 15:43 pt1979@cumin2002: START - Cookbook sre.network.cf
- 15:42 sukhe@cumin1002: START - Cookbook sre.cdn.roll-upgrade-ats Rolling upgrade/restart of Apache Traffic Server on P{cp4043*,cp4051*} and A:cp for 9.2.6-1wm1
- 15:40 stevemunene@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-presto1016.eqiad.wmnet with OS bullseye
- 15:39 stevemunene@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-presto1020.eqiad.wmnet with OS bullseye
- 15:37 volans: installed spicerack v8.16.1 to cumin hosts
- 15:36 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: depool site eqiad [reason: junos upgrade, T364092]
- 15:36 sukhe@cumin1002: START - Cookbook sre.dns.admin DNS admin: depool site eqiad [reason: junos upgrade, T364092]
- 15:35 ladsgroup@deploy2002: Finished scap sync-world: Backport for Revert "mmv.js: Store comingFromHashChange as a class property" (T379835) (duration: 12m 10s)
- 15:33 sukhe: reprepro -C main include bullseye-wikimedia trafficserver_9.2.6-1wm1_amd64.changes: T379797
- 15:30 sukhe@cumin1002: START - Cookbook sre.dns.roll-restart-ntp rolling restart_daemons on A:dnsbox
- 15:29 jayme@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-ctrl2002.codfw.wmnet with reason: T379719
- 15:29 jayme@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-ctrl2002.codfw.wmnet with reason: T379719
- 15:28 jayme@cumin2002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-ctrl2002.codfw.wmnet
- 15:28 jayme@cumin2002: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-ctrl2002.codfw.wmnet
- 15:27 ladsgroup@deploy2002: ladsgroup: Continuing with sync
- 15:27 ladsgroup@deploy2002: ladsgroup: Backport for Revert "mmv.js: Store comingFromHashChange as a class property" (T379835) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 15:24 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 15:24 elukey@cumin1002: START - Cookbook sre.hosts.provision for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 15:24 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.roll-restart (exit_code=0) rolling restart_daemons on A:dnsbox and not A:magru and A:dnsbox
- 15:23 ladsgroup@deploy2002: Started scap sync-world: Backport for Revert "mmv.js: Store comingFromHashChange as a class property" (T379835)
- 15:16 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-search: apply
- 15:15 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-search: apply
- 15:07 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 15:07 sergi0: UTC afternoon deploys done
- 15:06 sgimeno@deploy2002: Finished scap sync-world: Backport for HomepageHooks: run metrics increment in deferred update (T379682) (duration: 11m 15s)
- 15:02 elukey@cumin1002: START - Cookbook sre.hosts.provision for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 15:02 sgimeno@deploy2002: sgimeno: Continuing with sync
- 14:59 sgimeno@deploy2002: sgimeno: Backport for HomepageHooks: run metrics increment in deferred update (T379682) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 14:55 sgimeno@deploy2002: Started scap sync-world: Backport for HomepageHooks: run metrics increment in deferred update (T379682)
- 14:53 volans: uploaded spicerack_8.16.1 to apt.wikimedia.org bullseye-wikimedia
- 14:50 sgimeno@deploy2002: Finished scap sync-world: Backport for GrowthExperiments: set experiment config only in pilot wikis (T379681) (duration: 13m 02s)
- 14:45 sgimeno@deploy2002: sgimeno: Continuing with sync
- 14:41 sgimeno@deploy2002: sgimeno: Backport for GrowthExperiments: set experiment config only in pilot wikis (T379681) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 14:37 sgimeno@deploy2002: Started scap sync-world: Backport for GrowthExperiments: set experiment config only in pilot wikis (T379681)
- 14:33 sukhe@cumin1002: START - Cookbook sre.dns.roll-restart rolling restart_daemons on A:dnsbox and not A:magru and A:dnsbox
- 14:30 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.roll-restart (exit_code=0) rolling restart_daemons on A:dnsbox and A:magru and A:dnsbox
- 14:27 kartik@deploy2002: Finished scap sync-world: Backport for CX3 Build 0.2.0+20241114 (duration: 13m 23s)
- 14:25 sukhe@cumin1002: START - Cookbook sre.dns.roll-restart rolling restart_daemons on A:dnsbox and A:magru and A:dnsbox
- 14:22 kartik@deploy2002: kartik: Continuing with sync
- 14:18 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.roll-restart-reboot-wikimedia-dns (exit_code=0) rolling restart_daemons on A:wikidough and A:wikidough
- 14:17 kartik@deploy2002: kartik: Backport for CX3 Build 0.2.0+20241114 synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 14:13 kartik@deploy2002: Started scap sync-world: Backport for CX3 Build 0.2.0+20241114
- 14:05 sukhe@cumin1002: START - Cookbook sre.dns.roll-restart-reboot-wikimedia-dns rolling restart_daemons on A:wikidough and A:wikidough
- 13:50 aqu@deploy2002: Finished deploy [airflow-dags/analytics@2220747]: Stage Refine parallelization improvment [airflow-dags@2220747d] (duration: 01m 08s)
- 13:49 aqu@deploy2002: Started deploy [airflow-dags/analytics@2220747]: Stage Refine parallelization improvment [airflow-dags@2220747d]
- 13:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti7004.magru.wmnet
- 13:36 aqu@deploy2002: Finished deploy [airflow-dags/analytics_test@2220747]: Stage Refine parallelization improvment [airflow-dags@2220747d] (duration: 00m 15s)
- 13:36 aqu@deploy2002: Started deploy [airflow-dags/analytics_test@2220747]: Stage Refine parallelization improvment [airflow-dags@2220747d]
- 13:30 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti7004.magru.wmnet
- 13:21 kcvelaga@deploy2002: Finished deploy [airflow-dags/analytics_product@c5ab766]: T379546 (duration: 00m 54s)
- 13:21 kcvelaga@deploy2002: Started deploy [airflow-dags/analytics_product@c5ab766]: T379546
- 13:19 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Fix search button height - oblivian@cumin1002"
- 13:18 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Fix search button height - oblivian@cumin1002
- 13:18 oblivian@cumin1002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Fix search button height - oblivian@cumin1002
- 13:18 oblivian@cumin1002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Fix search button height - oblivian@cumin1002"
- 13:05 jayme@cumin2002: END (PASS) - Cookbook sre.k8s.reimage-stacked-control-plane (exit_code=0) Reimaging k8s control planes of cluster wikikube-codfw: containerd migration
- 13:04 jayme@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-ctrl2003.codfw.wmnet with OS bookworm
- 12:54 jmm@cumin2002: END (PASS) - Cookbook sre.misc-clusters.roll-restart-reboot-eventschemas (exit_code=0) rolling restart_daemons on A:schema-eqiad
- 12:53 jmm@cumin2002: START - Cookbook sre.misc-clusters.roll-restart-reboot-eventschemas rolling restart_daemons on A:schema-eqiad
- 12:53 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti7004.magru.wmnet
- 12:52 moritzm: installing apache2 security updates
- 12:51 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti7004.magru.wmnet
- 12:51 dreamyjazz@deploy2002: Finished scap sync-world: Backport for Hide IP reveal tools on Special:AbuseLog and Special:GlobalBlockList (T379583) (duration: 09m 08s)
- 12:49 moritzm: failover ganeti master of magru02 to ganeti7002
- 12:46 dreamyjazz@deploy2002: dreamyjazz: Continuing with sync
- 12:45 dreamyjazz@deploy2002: dreamyjazz: Backport for Hide IP reveal tools on Special:AbuseLog and Special:GlobalBlockList (T379583) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 12:43 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti7002.magru.wmnet
- 12:42 jayme@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-ctrl2003.codfw.wmnet with reason: host reimage
- 12:41 dreamyjazz@deploy2002: Started scap sync-world: Backport for Hide IP reveal tools on Special:AbuseLog and Special:GlobalBlockList (T379583)
- 12:38 jayme@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-ctrl2003.codfw.wmnet with reason: host reimage
- 12:35 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti7002.magru.wmnet
- 12:29 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti7002.magru.wmnet
- 12:25 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti7002.magru.wmnet
- 12:22 jayme@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl2003.codfw.wmnet with OS bookworm
- 12:19 jmm@cumin2002: END (PASS) - Cookbook sre.misc-clusters.roll-restart-reboot-eventschemas (exit_code=0) rolling restart_daemons on A:schema-codfw
- 12:18 jmm@cumin2002: START - Cookbook sre.misc-clusters.roll-restart-reboot-eventschemas rolling restart_daemons on A:schema-codfw
- 12:17 jayme@cumin2002: START - Cookbook sre.k8s.reimage-stacked-control-plane Reimaging k8s control planes of cluster wikikube-codfw: containerd migration
- 12:10 jmm@cumin2002: END (PASS) - Cookbook sre.cdn.roll-restart-reboot-ncredir (exit_code=0) rolling restart_daemons on A:ncredir
- 12:00 jmm@cumin2002: START - Cookbook sre.cdn.roll-restart-reboot-ncredir rolling restart_daemons on A:ncredir
- 11:57 moritzm: restarting postfix on inbound/outbound servers to pick up openssl updates
- 11:17 moritzm: installing openssl security updates
- 11:08 jayme@cumin2002: END (PASS) - Cookbook sre.k8s.reimage-stacked-control-plane (exit_code=0) Reimaging k8s control planes of cluster wikikube-codfw: containerd migration
- 11:08 jayme@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-ctrl2001.codfw.wmnet with OS bookworm
- 10:47 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub: sync on production
- 10:45 jayme@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-ctrl2001.codfw.wmnet with reason: host reimage
- 10:44 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub: apply on production
- 10:42 jayme@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-ctrl2001.codfw.wmnet with reason: host reimage
- 10:16 moritzm: remove ganeti2017 from active ganeti nodes T376594
- 10:15 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2017.codfw.wmnet
- 10:11 jayme@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl2001.codfw.wmnet with OS bookworm
- 10:07 jnuche@deploy2002: Finished deploy [releng/jenkins-deploy@34b35a5] (releasing): (no justification provided) (duration: 00m 47s)
- 10:06 jayme@cumin2002: START - Cookbook sre.k8s.reimage-stacked-control-plane Reimaging k8s control planes of cluster wikikube-codfw: containerd migration
- 10:06 jnuche@deploy2002: Started deploy [releng/jenkins-deploy@34b35a5] (releasing): (no justification provided)
- 10:03 jnuche@deploy2002: Finished deploy [releng/jenkins-deploy@34b35a5] (releasing): (no justification provided) (duration: 00m 21s)
- 10:03 jnuche@deploy2002: Started deploy [releng/jenkins-deploy@34b35a5] (releasing): (no justification provided)
- 09:43 kart_: Done: UTC morning backport window
- 09:37 kartik@deploy2002: Finished scap sync-world: Backport for Correction to virtual-globaljsonlinks mapping (T374746) (duration: 10m 03s)
- 09:37 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 09:36 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
- 09:35 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 09:34 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
- 09:32 kartik@deploy2002: bvibber, kartik: Continuing with sync
- 09:31 kartik@deploy2002: bvibber, kartik: Backport for Correction to virtual-globaljsonlinks mapping (T374746) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 09:27 kartik@deploy2002: Started scap sync-world: Backport for Correction to virtual-globaljsonlinks mapping (T374746)
- 09:25 kartik@deploy2002: Finished scap sync-world: Backport for CX3 Build 0.2.0+20241113 (T368718 T374567) (duration: 29m 40s)
- 09:21 kartik@deploy2002: kartik: Continuing with sync
- 09:17 volans: installed spicerack v8.16.0 on cumin2002
- 09:08 vgutierrez@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on P{cp4044.ulsfo.wmnet,cp4052.ulsfo.wmnet} and A:cp
- 09:04 vgutierrez@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on P{cp4044.ulsfo.wmnet,cp4052.ulsfo.wmnet} and A:cp
- 09:00 kartik@deploy2002: kartik: Backport for CX3 Build 0.2.0+20241113 (T368718 T374567) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 08:56 kartik@deploy2002: Started scap sync-world: Backport for CX3 Build 0.2.0+20241113 (T368718 T374567)
- 08:55 vgutierrez: import haproxy 2.8.12 to thirtdparty/haproxy28 component for bullseye-wikimedia (apt.wm.o) - T379891
- 08:54 kartik@deploy2002: Finished scap sync-world: Backport for Allow Wikidata bureaucrats to remove admin rights (T379635) (duration: 11m 49s)
- 08:49 kartik@deploy2002: dreamrimmer, kartik: Continuing with sync
- 08:47 kartik@deploy2002: dreamrimmer, kartik: Backport for Allow Wikidata bureaucrats to remove admin rights (T379635) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 08:42 kartik@deploy2002: Started scap sync-world: Backport for Allow Wikidata bureaucrats to remove admin rights (T379635)
- 08:38 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 26744
- 08:37 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 26744
- 08:35 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 141082
- 08:35 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 141082
- 08:34 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 9299
- 08:33 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 9299
- 08:33 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 140407
- 08:33 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 140407
- 08:28 kartik@deploy2002: Finished scap sync-world: Backport for Update stream registration and config for MinT for Readers (T378565) (duration: 24m 50s)
- 08:23 kartik@deploy2002: kcvelaga, kartik: Continuing with sync
- 08:08 kartik@deploy2002: kcvelaga, kartik: Backport for Update stream registration and config for MinT for Readers (T378565) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 08:03 kartik@deploy2002: Started scap sync-world: Backport for Update stream registration and config for MinT for Readers (T378565)
- 07:42 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2017.codfw.wmnet
- 07:41 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2017.codfw.wmnet
- 07:34 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2017.codfw.wmnet
- 07:34 ayounsi@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 07:34 ayounsi@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove office link dns records - ayounsi@cumin1002"
- 07:34 ayounsi@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove office link dns records - ayounsi@cumin1002"
- 07:30 ayounsi@cumin1002: START - Cookbook sre.dns.netbox
- 07:06 XioNoX: delete office interco IP/prefixes/vlan in ulsfo - T379778
- 04:34 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye
- 04:11 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage
- 04:09 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage
- 03:56 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye
- 02:32 eileen: config revision changed from 7af5769b to fbddc1f5
- 02:29 eileen: civicrm upgraded from 7b300007 to 2ab8334a
- 00:14 eileen: config revision changed from 2b08b881 to 7af5769b
- 00:13 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host es1046.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 00:13 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host es1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 00:12 eileen: civicrm upgraded from 23e08fc2 to 7b300007
- 00:05 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host es1043.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 00:05 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host es1042.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 00:05 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host es1045.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 00:05 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host es1041.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
2024-11-13
- 23:45 jclark@cumin1002: START - Cookbook sre.hosts.provision for host es1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 23:43 jclark@cumin1002: START - Cookbook sre.hosts.provision for host es1042.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 23:43 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host es1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 23:43 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host es1042.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 23:42 jclark@cumin1002: START - Cookbook sre.hosts.provision for host es1046.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 23:42 jclark@cumin1002: START - Cookbook sre.hosts.provision for host es1045.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 23:42 jclark@cumin1002: START - Cookbook sre.hosts.provision for host es1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 23:42 jclark@cumin1002: START - Cookbook sre.hosts.provision for host es1043.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 23:42 jclark@cumin1002: START - Cookbook sre.hosts.provision for host es1042.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 23:42 jclark@cumin1002: START - Cookbook sre.hosts.provision for host es1041.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 23:41 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 23:41 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for es104 - jclark@cumin1002"
- 23:41 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for es104 - jclark@cumin1002"
- 23:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wdqs1027.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 23:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wdqs1026.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 23:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wdqs1025.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 23:37 jclark@cumin1002: START - Cookbook sre.dns.netbox
- 23:20 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bookworm
- 23:04 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 23:04 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for wikikube-worker - jclark@cumin1002"
- 23:04 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for wikikube-worker - jclark@cumin1002"
- 22:59 jclark@cumin1002: START - Cookbook sre.dns.netbox
- 22:58 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wdqs1025.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 22:58 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wdqs1026.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 22:58 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wdqs1027.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 22:57 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage
- 22:55 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage
- 22:33 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 22:33 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 22:30 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 22:25 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bookworm
- 22:21 jforrester@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
- 22:20 jforrester@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
- 22:20 jforrester@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
- 22:19 jforrester@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
- 22:18 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
- 22:17 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
- 22:14 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 22:11 jforrester@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
- 22:11 jforrester@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
- 22:10 jforrester@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
- 22:10 jforrester@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
- 22:09 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 22:04 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
- 22:03 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
- 22:00 tchanders@deploy2002: Finished scap sync-world: Backport for Revert "Disallow AbuseFilter protected variables use on non-temp-user wikis" (T379503) (duration: 09m 03s)
- 21:55 tchanders@deploy2002: tchanders: Continuing with sync
- 21:55 tchanders@deploy2002: tchanders: Backport for Revert "Disallow AbuseFilter protected variables use on non-temp-user wikis" (T379503) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 21:51 tchanders@deploy2002: Started scap sync-world: Backport for Revert "Disallow AbuseFilter protected variables use on non-temp-user wikis" (T379503)
- 21:48 cjming@deploy2002: Finished scap sync-world: Backport for Enable autocreateaccount on testcommonswiki (T378216) (duration: 12m 59s)
- 21:44 cjming@deploy2002: aude, cjming: Continuing with sync
- 21:40 cjming@deploy2002: aude, cjming: Backport for Enable autocreateaccount on testcommonswiki (T378216) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 21:36 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bookworm
- 21:36 cjming@deploy2002: Started scap sync-world: Backport for Enable autocreateaccount on testcommonswiki (T378216)
- 21:34 cjming@deploy2002: Finished scap sync-world: Backport for GlobalJsonLinksCachePurgeJob to actually invalidate caches (T374746) (duration: 13m 27s)
- 21:27 cjming@deploy2002: cjming, bvibber: Continuing with sync
- 21:27 cjming@deploy2002: cjming, bvibber: Backport for GlobalJsonLinksCachePurgeJob to actually invalidate caches (T374746) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 21:21 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 21:21 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 21:21 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 21:20 cjming@deploy2002: Started scap sync-world: Backport for GlobalJsonLinksCachePurgeJob to actually invalidate caches (T374746)
- 21:19 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage
- 21:16 jclark@cumin1002: START - Cookbook sre.hosts.provision for host thanos-be1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 21:15 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage
- 21:09 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 21:09 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 21:09 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 21:09 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 21:07 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 21:07 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 21:07 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host thanos-be2005
- 21:07 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host thanos-be2005
- 21:05 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 21:05 jclark@cumin1002: START - Cookbook sre.hosts.provision for host thanos-be1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 21:01 aqu@deploy2002: Finished deploy [airflow-dags/analytics@3487da3]: Stage Refine [airflow-dags@3487da3a] (duration: 01m 22s)
- 21:00 aqu@deploy2002: Started deploy [airflow-dags/analytics@3487da3]: Stage Refine [airflow-dags@3487da3a]
- 20:56 aqu@deploy2002: Finished deploy [airflow-dags/analytics@3fc12d6]: Stage Refine [airflow-dags@3fc12d60] (duration: 01m 14s)
- 20:56 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 20:56 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 20:55 aqu@deploy2002: Started deploy [airflow-dags/analytics@3fc12d6]: Stage Refine [airflow-dags@3fc12d60]
- 20:49 cdanis@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/aus-k8s-eqiad-services/jaeger: apply
- 20:49 cdanis@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/aus-k8s-eqiad-services/jaeger: apply
- 20:48 swfrench-wmf: deployed changeprop to clear no-op chart version diffs from CR 1089313
- 20:47 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop: apply
- 20:47 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop: apply
- 20:46 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bookworm
- 20:39 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bookworm
- 20:37 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply
- 20:37 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop: apply
- 20:36 cdanis@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/aus-k8s-eqiad-services/jaeger: apply
- 20:36 cdanis@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/aus-k8s-eqiad-services/jaeger: apply
- 20:35 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop: apply
- 20:34 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/changeprop: apply
- 20:34 aqu@deploy2002: Finished deploy [airflow-dags/analytics_test@3fc12d6]: Stage Refine [airflow-dags@3fc12d60] (duration: 00m 15s)
- 20:34 aqu@deploy2002: Started deploy [airflow-dags/analytics_test@3fc12d6]: Stage Refine [airflow-dags@3fc12d60]
- 20:31 cdanis@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/aus-k8s-eqiad-services/jaeger: apply
- 20:31 cdanis@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/aus-k8s-eqiad-services/jaeger: apply
- 20:28 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 20:28 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 20:16 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage
- 20:14 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage
- 20:02 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 20:02 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 19:59 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 19:59 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 19:59 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host thanos-be2005
- 19:59 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host thanos-be2005
- 19:58 cdanis@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/aus-k8s-eqiad-services/jaeger: apply
- 19:58 cdanis@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/aus-k8s-eqiad-services/jaeger: apply
- 19:58 brennen@deploy2002: Finished scap sync-world: testwikis to 1.44.0-wmf.3 refs T375662 (duration: 31m 07s)
- 19:57 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 19:55 cdanis@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/aus-k8s-eqiad-services/jaeger: apply
- 19:55 cdanis@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/aus-k8s-eqiad-services/jaeger: apply
- 19:52 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 19:51 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 19:51 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding thanos-be2005 to codfw - jhancock@cumin2002"
- 19:51 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding thanos-be2005 to codfw - jhancock@cumin2002"
- 19:47 jhancock@cumin2002: START - Cookbook sre.dns.netbox
- 19:47 cdanis@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/aus-k8s-eqiad-services/jaeger: apply
- 19:46 cdanis@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/aus-k8s-eqiad-services/jaeger: apply
- 19:44 aokoth@cumin1002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1003.wikimedia.org with reason: Security Update
- 19:37 aokoth@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: Security Update
- 19:36 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bookworm
- 19:35 aokoth@cumin1002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1004.wikimedia.org with reason: Security Update
- 19:27 brennen@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.3 refs T375662
- 19:26 brennen@deploy2002: rebuilt and synchronized wikiversions files: group1 to 1.44.0-wmf.3 refs T375662
- 19:21 aokoth@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: Security Update
- 19:13 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host thanos-be1005.eqiad.wmnet with OS bullseye
- 19:11 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 19:10 jclark@cumin1002: START - Cookbook sre.hosts.provision for host thanos-be1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 19:10 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 19:10 jclark@cumin1002: START - Cookbook sre.hosts.provision for host thanos-be1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 19:09 brennen: 1.44.0-wmf.3 train status (T375662): no current blockers, rolling to group1.
- 19:08 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/hdfs-synchronizer: apply
- 19:03 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 19:03 jclark@cumin1002: START - Cookbook sre.hosts.provision for host thanos-be1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 19:02 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 19:02 jclark@cumin1002: START - Cookbook sre.hosts.provision for host thanos-be1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 19:01 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 19:01 jclark@cumin1002: START - Cookbook sre.hosts.provision for host thanos-be1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 19:00 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 19:00 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for thanos-be1005 - jclark@cumin1002"
- 19:00 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for thanos-be1005 - jclark@cumin1002"
- 18:58 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/hdfs-synchronizer: apply
- 18:56 jclark@cumin1002: START - Cookbook sre.dns.netbox
- 18:50 swfrench@deploy2002: Finished scap sync-world: Deployment to switch mwdebug-next to publish-81 - T372604 (duration: 01m 53s)
- 18:48 swfrench@deploy2002: Started scap sync-world: Deployment to switch mwdebug-next to publish-81 - T372604
- 18:36 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
- 18:33 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
- 18:32 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
- 18:30 cdanis@deploy2002: Finished deploy [docker-pkg/deploy@3499887]: I really hope this works this time (duration: 00m 34s)
- 18:29 cdanis@deploy2002: Started deploy [docker-pkg/deploy@3499887]: I really hope this works this time
- 18:29 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
- 18:26 cdanis@deploy2002: Finished deploy [docker-pkg/deploy@9d71ac3]: (no justification provided) (duration: 00m 18s)
- 18:26 cdanis@deploy2002: Started deploy [docker-pkg/deploy@9d71ac3]: (no justification provided)
- 18:22 cdanis@deploy2002: Finished deploy [docker-pkg/deploy@9d71ac3]: (no justification provided) (duration: 00m 40s)
- 18:21 cdanis@deploy2002: Started deploy [docker-pkg/deploy@9d71ac3]: (no justification provided)
- 18:21 cdanis@deploy2002: Finished deploy [docker-pkg/deploy@9d71ac3]: deploy 4.0.2 for realsies (duration: 02m 41s)
- 18:18 cdanis@deploy2002: Started deploy [docker-pkg/deploy@9d71ac3]: deploy 4.0.2 for realsies
- 18:13 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on ms-be2082.codfw.wmnet with reason: T371400
- 18:13 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 3:00:00 on ms-be2082.codfw.wmnet with reason: T371400
- 18:11 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye
- 17:54 urbanecm: mwmaint2002: foreachwikiindblist growthexperiments extensions/GrowthExperiments/maintenance/fixLinkRecommendationData.php --search-index --verbose --random # T379057
- 17:49 cdanis@deploy2002: Finished deploy [docker-pkg/deploy@38eb04d]: ship upstream_version helper (duration: 00m 32s)
- 17:49 cdanis@deploy2002: Started deploy [docker-pkg/deploy@38eb04d]: ship upstream_version helper
- 17:49 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage
- 17:47 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
- 17:46 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage
- 17:45 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
- 17:40 jayme@cumin1002: conftool action : set/pooled=yes; selector: name=wikikube-ctrl2002.codfw.wmnet
- 17:39 jayme@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for wikikube-ctrl2002.codfw.wmnet
- 17:39 jayme@cumin2002: START - Cookbook sre.hosts.remove-downtime for wikikube-ctrl2002.codfw.wmnet
- 17:38 jayme@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-ctrl2002.codfw.wmnet with OS bookworm
- 17:37 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
- 17:35 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
- 17:33 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
- 17:32 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
- 17:23 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker[2128-2135].codfw.wmnet
- 17:23 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker[2128-2135].codfw.wmnet
- 17:20 claime: homer 'lsw1-d2-codfw*' commit 'T377008'
- 17:18 claime: homer 'lsw1-c2-codfw*' commit 'T377008'
- 17:18 claime: homer 'lsw1-d4-codfw*' commit 'T377008'
- 17:17 claime: homer 'lsw1-c4-codfw*' commit 'T377008'
- 17:15 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye
- 17:14 jayme@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-ctrl2002.codfw.wmnet with reason: host reimage
- 17:11 jayme@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-ctrl2002.codfw.wmnet with reason: host reimage
- 17:03 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2082.codfw.wmnet with OS bullseye
- 17:02 claime: homer 'cr*codfw*' commit T377008
- 17:01 claime: homer 'lsw1-b4-codfw*' commit T377008
- 17:01 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye
- 16:58 claime: homer 'lsw1-b2-codfw*' commit T377008
- 16:53 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/growthbook: apply
- 16:53 jayme@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-ctrl2002
- 16:53 jayme@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-ctrl2002
- 16:53 jayme@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-ctrl2002
- 16:53 jayme@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-ctrl2002.codfw.wmnet 76.32.192.10.in-addr.arpa 6.7.0.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
- 16:53 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/growthbook: apply
- 16:53 jayme@cumin2002: START - Cookbook sre.dns.wipe-cache wikikube-ctrl2002.codfw.wmnet 76.32.192.10.in-addr.arpa 6.7.0.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
- 16:53 jayme@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 16:53 jayme@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-ctrl2002 - jayme@cumin2002"
- 16:53 jayme@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-ctrl2002 - jayme@cumin2002"
- 16:50 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2135.codfw.wmnet with OS bookworm
- 16:49 jayme@cumin2002: START - Cookbook sre.dns.netbox
- 16:48 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2134.codfw.wmnet with OS bookworm
- 16:47 jayme@cumin2002: START - Cookbook sre.hosts.move-vlan for host wikikube-ctrl2002
- 16:47 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/growthbook: apply
- 16:47 jayme@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl2002.codfw.wmnet with OS bookworm
- 16:47 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/growthbook: apply
- 16:41 jayme@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on wikikube-ctrl2002.codfw.wmnet with reason: reimage
- 16:40 jayme@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on wikikube-ctrl2002.codfw.wmnet with reason: reimage
- 16:37 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti7003.magru.wmnet
- 16:31 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2135.codfw.wmnet with reason: host reimage
- 16:31 jayme@cumin2002: conftool action : set/pooled=inactive; selector: name=wikikube-ctrl2002.codfw.wmnet
- 16:30 elukey: reload nginx on registry* to pick up logging changes (log of X-Client-IP from the CDN)
- 16:30 XioNoX: shutdown old office link interface - T379778
- 16:29 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2133.codfw.wmnet with OS bookworm
- 16:29 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2134.codfw.wmnet with reason: host reimage
- 16:29 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti7003.magru.wmnet
- 16:26 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2135.codfw.wmnet with reason: host reimage
- 16:25 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2134.codfw.wmnet with reason: host reimage
- 16:24 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2132.codfw.wmnet with OS bookworm
- 16:15 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti7003.magru.wmnet
- 16:14 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti7003.magru.wmnet
- 16:08 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2133.codfw.wmnet with reason: host reimage
- 16:08 sukhe: running agent on A:ulsfo and A:lvs
- 16:07 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2135.codfw.wmnet with OS bookworm
- 16:06 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2134.codfw.wmnet with OS bookworm
- 16:05 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2132.codfw.wmnet with reason: host reimage
- 16:04 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2133.codfw.wmnet with reason: host reimage
- 16:02 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2132.codfw.wmnet with reason: host reimage
- 15:56 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2131.codfw.wmnet with OS bookworm
- 15:53 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2130.codfw.wmnet with OS bookworm
- 15:47 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on pc1017.eqiad.wmnet with reason: T378068, host is not pooled
- 15:47 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 5 days, 0:00:00 on pc1017.eqiad.wmnet with reason: T378068, host is not pooled
- 15:45 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/hdfs-synchronizer: apply
- 15:45 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2133.codfw.wmnet with OS bookworm
- 15:42 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2132.codfw.wmnet with OS bookworm
- 15:37 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2129.codfw.wmnet with OS bookworm
- 15:37 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2131.codfw.wmnet with reason: host reimage
- 15:36 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 15:35 moritzm: failover ganeti master of magru01 to ganeti7001
- 15:34 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2130.codfw.wmnet with reason: host reimage
- 15:33 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2131.codfw.wmnet with reason: host reimage
- 15:33 cmooney@cumin1002: START - Cookbook sre.dns.netbox
- 15:33 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 15:30 cmooney@cumin1002: START - Cookbook sre.dns.netbox
- 15:30 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 15:30 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns records for IPs moving from old to new fundraising firewalls - cmooney@cumin1002"
- 15:30 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns records for IPs moving from old to new fundraising firewalls - cmooney@cumin1002"
- 15:30 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2130.codfw.wmnet with reason: host reimage
- 15:26 cmooney@cumin1002: START - Cookbook sre.dns.netbox
- 15:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti7001.magru.wmnet
- 15:18 moritzm: installing apache2 security updates
- 15:18 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2129.codfw.wmnet with reason: host reimage
- 15:15 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2131.codfw.wmnet with OS bookworm
- 15:15 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2129.codfw.wmnet with reason: host reimage
- 15:15 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti7001.magru.wmnet
- 15:14 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2128.codfw.wmnet with OS bookworm
- 15:12 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2130.codfw.wmnet with OS bookworm
- 14:59 volans: uploaded spicerack_8.16.0 to apt.wikimedia.org bullseye-wikimedia
- 14:57 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2129.codfw.wmnet with OS bookworm
- 14:56 aqu@deploy2002: Finished deploy [airflow-dags/analytics_test@2eb8320]: Stage Refine [airflow-dags@2eb8320d] (duration: 00m 14s)
- 14:55 aqu@deploy2002: Started deploy [airflow-dags/analytics_test@2eb8320]: Stage Refine [airflow-dags@2eb8320d]
- 14:55 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2128.codfw.wmnet with reason: host reimage
- 14:51 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2128.codfw.wmnet with reason: host reimage
- 14:51 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti7001.magru.wmnet
- 14:50 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti7001.magru.wmnet
- 14:37 moritzm: installing openssl security updates
- 14:36 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2131.codfw.wmnet with OS bookworm
- 14:36 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2130.codfw.wmnet with OS bookworm
- 14:35 Lucas_WMDE: UTC afternoon backport+config window done
- 14:33 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2128.codfw.wmnet with OS bookworm
- 14:32 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for TimedMediahandler: reenable shellbox-video for commons (T356241) (duration: 07m 28s)
- 14:30 btullis@cumin1002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-jumbo-eqiad
- 14:27 lucaswerkmeister-wmde@deploy2002: hnowlan, lucaswerkmeister-wmde: Continuing with sync
- 14:27 lucaswerkmeister-wmde@deploy2002: hnowlan, lucaswerkmeister-wmde: Backport for TimedMediahandler: reenable shellbox-video for commons (T356241) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 14:26 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-research: apply
- 14:25 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-research: apply
- 14:24 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for TimedMediahandler: reenable shellbox-video for commons (T356241)
- 14:21 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-research: apply
- 14:21 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-research: apply
- 14:15 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2128.codfw.wmnet with OS bookworm
- 14:14 tchanders@deploy2002: Finished scap sync-world: Backport for Disallow AbuseFilter protected variables use on non-temp-user wikis (T379503) (duration: 11m 28s)
- 14:12 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-research: apply
- 14:10 tchanders@deploy2002: tchanders: Continuing with sync
- 14:09 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-research: apply
- 14:07 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/ipoid: apply
- 14:07 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1052.eqiad.wmnet to cluster eqiad and group D
- 14:07 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/ipoid: apply
- 14:06 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1052.eqiad.wmnet to cluster eqiad and group D
- 14:06 tchanders@deploy2002: tchanders: Backport for Disallow AbuseFilter protected variables use on non-temp-user wikis (T379503) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 14:03 tchanders@deploy2002: Started scap sync-world: Backport for Disallow AbuseFilter protected variables use on non-temp-user wikis (T379503)
- 14:03 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/ipoid: apply
- 14:02 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/ipoid: apply
- 14:01 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/ipoid: apply
- 14:01 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/ipoid: apply
- 14:00 stevemunene@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-wmde: apply
- 13:59 stevemunene@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-wmde: apply
- 13:32 btullis@cumin1002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-jumbo-eqiad
- 13:21 stevemunene@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-wmde: apply
- 13:20 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
- 13:18 moritzm: installing python-cryptography security updates
- 13:18 stevemunene@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-wmde: apply
- 13:18 btullis@cumin1002: END (PASS) - Cookbook sre.hadoop.roll-restart-masters (exit_code=0) restart masters for Hadoop test cluster: Restart of jvm daemons.
- 13:17 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/thumbor: apply
- 13:14 stevemunene@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-wmde: apply
- 13:13 stevemunene@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-wmde: apply
- 13:12 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 13:11 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
- 13:09 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 13:08 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
- 13:08 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
- 13:07 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
- 13:06 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
- 13:06 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
- 13:05 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 13:05 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 13:03 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/thumbor: apply
- 12:59 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2129.codfw.wmnet with OS bookworm
- 12:56 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/thumbor: apply
- 12:56 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/thumbor: apply
- 12:55 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2128.codfw.wmnet with OS bookworm
- 12:54 cgoubert@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wikikube-worker2128.codfw.wmnet with OS bookworm
- 12:45 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2128.codfw.wmnet with OS bookworm
- 12:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1022 (T376905)', diff saved to https://phabricator.wikimedia.org/P71030 and previous config saved to /var/cache/conftool/dbconfig/20241113-124504-ladsgroup.json
- 12:44 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2128.codfw.wmnet with OS bookworm
- 12:33 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1051.eqiad.wmnet to cluster eqiad and group D
- 12:32 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2131.codfw.wmnet with OS bookworm
- 12:32 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1051.eqiad.wmnet to cluster eqiad and group D
- 12:31 stevemunene@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-wmde: apply
- 12:31 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2130.codfw.wmnet with OS bookworm
- 12:30 stevemunene@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-wmde: apply
- 12:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1022', diff saved to https://phabricator.wikimedia.org/P71029 and previous config saved to /var/cache/conftool/dbconfig/20241113-122957-ladsgroup.json
- 12:29 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2129.codfw.wmnet with OS bookworm
- 12:29 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp5017.eqsin.wmnet
- 12:28 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2128.codfw.wmnet with OS bookworm
- 12:28 btullis@cumin1002: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid test cluster: Roll restart of Druid jvm daemons.
- 12:18 btullis@cumin1002: START - Cookbook sre.druid.roll-restart-workers for Druid test cluster: Roll restart of Druid jvm daemons.
- 12:15 mvolz@deploy2002: helmfile [eqiad] DONE helmfile.d/services/zotero: apply
- 12:15 mvolz@deploy2002: helmfile [eqiad] START helmfile.d/services/zotero: apply
- 12:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1022', diff saved to https://phabricator.wikimedia.org/P71028 and previous config saved to /var/cache/conftool/dbconfig/20241113-121450-ladsgroup.json
- 12:14 mvolz@deploy2002: helmfile [codfw] DONE helmfile.d/services/zotero: apply
- 12:14 mvolz@deploy2002: helmfile [codfw] START helmfile.d/services/zotero: apply
- 12:13 mvolz@deploy2002: helmfile [staging] DONE helmfile.d/services/zotero: apply
- 12:13 mvolz@deploy2002: helmfile [staging] START helmfile.d/services/zotero: apply
- 12:11 mvolz@deploy2002: helmfile [staging] DONE helmfile.d/services/zotero: apply
- 12:11 mvolz@deploy2002: helmfile [staging] START helmfile.d/services/zotero: apply
- 12:06 mvolz@deploy2002: helmfile [eqiad] DONE helmfile.d/services/citoid: apply
- 12:06 mvolz@deploy2002: helmfile [eqiad] START helmfile.d/services/citoid: apply
- 12:06 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1052.eqiad.wmnet
- 12:03 mvolz@deploy2002: helmfile [codfw] DONE helmfile.d/services/citoid: apply
- 12:03 mvolz@deploy2002: helmfile [codfw] START helmfile.d/services/citoid: apply
- 12:02 mvolz@deploy2002: helmfile [staging] DONE helmfile.d/services/citoid: apply
- 12:01 mvolz@deploy2002: helmfile [staging] START helmfile.d/services/citoid: apply
- 11:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1022 (T376905)', diff saved to https://phabricator.wikimedia.org/P71027 and previous config saved to /var/cache/conftool/dbconfig/20241113-115943-ladsgroup.json
- 11:57 jiji@deploy2002: helmfile [codfw] DONE helmfile.d/services/ipoid: apply
- 11:57 jiji@deploy2002: helmfile [codfw] START helmfile.d/services/ipoid: apply
- 11:57 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/services/ipoid: apply
- 11:57 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1052.eqiad.wmnet
- 11:57 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/ipoid: apply
- 11:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1051.eqiad.wmnet
- 11:55 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1052
- 11:54 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1052
- 11:52 stevemunene@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-wmde: apply
- 11:51 stevemunene@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-wmde: apply
- 11:51 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
- 11:50 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
- 11:49 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
- 11:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1022 (T376905)', diff saved to https://phabricator.wikimedia.org/P71026 and previous config saved to /var/cache/conftool/dbconfig/20241113-114913-ladsgroup.json
- 11:49 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1051.eqiad.wmnet
- 11:49 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1022.eqiad.wmnet with reason: Maintenance
- 11:48 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1022.eqiad.wmnet with reason: Maintenance
- 11:48 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
- 11:47 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1051
- 11:46 stevemunene@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 11:46 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1051
- 11:45 stevemunene@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
- 11:41 jayme@cumin2002: END (PASS) - Cookbook sre.k8s.reimage-stacked-control-plane (exit_code=0) Reimaging k8s control planes of cluster wikikube-eqiad: containerd migration
- 11:41 jayme@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-ctrl1003.eqiad.wmnet with OS bookworm
- 11:34 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
- 11:34 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
- 11:26 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on wikikube-worker1256.eqiad.wmnet with reason: Degraded RAID
- 11:26 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on wikikube-worker1256.eqiad.wmnet with reason: Degraded RAID
- 11:25 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker1256.eqiad.wmnet
- 11:25 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker1256.eqiad.wmnet
- 11:19 btullis@cumin1002: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid test cluster: Roll restart of Druid jvm daemons.
- 11:18 btullis@cumin1002: START - Cookbook sre.hadoop.roll-restart-masters restart masters for Hadoop test cluster: Restart of jvm daemons.
- 11:17 jayme@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-ctrl1003.eqiad.wmnet with reason: host reimage
- 11:14 jayme@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-ctrl1003.eqiad.wmnet with reason: host reimage
- 11:10 btullis@cumin1002: START - Cookbook sre.druid.roll-restart-workers for Druid test cluster: Roll restart of Druid jvm daemons.
- 11:09 btullis@cumin1002: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid public cluster: Roll restart of Druid jvm daemons.
- 10:42 ladsgroup@deploy2002: Finished scap sync-world: Backport for Set the ratio of the new ParserCache keys to 100 for prod (T373037) (duration: 07m 32s)
- 10:37 ladsgroup@deploy2002: ladsgroup: Continuing with sync
- 10:36 ladsgroup@deploy2002: ladsgroup: Backport for Set the ratio of the new ParserCache keys to 100 for prod (T373037) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 10:35 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp4037.ulsfo.wmnet
- 10:34 ladsgroup@deploy2002: Started scap sync-world: Backport for Set the ratio of the new ParserCache keys to 100 for prod (T373037)
- 10:32 btullis@cumin1002: END (PASS) - Cookbook sre.hadoop.roll-restart-workers (exit_code=0) restart workers for Hadoop test cluster: Roll restart of jvm daemons for openjdk upgrade.
- 10:27 jayme@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl1003.eqiad.wmnet with OS bookworm
- 10:26 ladsgroup@deploy2002: ladsgroup: Continuing with sync
- 10:26 jayme@cumin2002: START - Cookbook sre.k8s.reimage-stacked-control-plane Reimaging k8s control planes of cluster wikikube-eqiad: containerd migration
- 10:24 jayme@cumin2002: END (PASS) - Cookbook sre.k8s.reimage-stacked-control-plane (exit_code=0) Reimaging k8s control planes of cluster wikikube-eqiad: containerd migration
- 10:24 jayme@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-ctrl1002.eqiad.wmnet with OS bookworm
- 10:21 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp4037.ulsfo.wmnet
- 10:20 btullis@cumin1002: START - Cookbook sre.hadoop.roll-restart-workers restart workers for Hadoop test cluster: Roll restart of jvm daemons for openjdk upgrade.
- 10:20 ladsgroup@deploy2002: ladsgroup: Backport for Set the ratio of the new ParserCache keys to 100 for prod (T373037) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 10:18 btullis@cumin1002: START - Cookbook sre.druid.roll-restart-workers for Druid public cluster: Roll restart of Druid jvm daemons.
- 10:17 ladsgroup@deploy2002: Started scap sync-world: Backport for Set the ratio of the new ParserCache keys to 100 for prod (T373037)
- 10:09 elukey: disallow calls to /v2/_catalog from the outside internet on Docker Registry hosts - T378618
- 10:04 claime: Manual restart of dump_cloud_ip_ranges.service on 'A:puppetserver or A:puppetmaster'
- 10:01 jayme@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-ctrl1002.eqiad.wmnet with reason: host reimage
- 10:01 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2088.codfw.wmnet with OS bullseye
- 10:00 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002"
- 10:00 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002"
- 09:55 jayme@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-ctrl1002.eqiad.wmnet with reason: host reimage
- 09:41 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2088.codfw.wmnet with reason: host reimage
- 09:38 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2088.codfw.wmnet with reason: host reimage
- 09:25 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2088.codfw.wmnet with OS bullseye
- 09:20 jayme@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl1002.eqiad.wmnet with OS bookworm
- 09:20 jayme@cumin2002: START - Cookbook sre.k8s.reimage-stacked-control-plane Reimaging k8s control planes of cluster wikikube-eqiad: containerd migration
- 09:11 elukey@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2088.codfw.wmnet with OS bullseye
- 09:01 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2088.codfw.wmnet with OS bullseye
- 08:54 kart_: Updated recommedation-api to 2024-11-08-142328-production and fix wikidata host header (T379592)
- 08:49 kartik@deploy2002: helmfile [ml-serve-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
- 08:49 elukey@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2088.codfw.wmnet with OS bullseye
- 08:46 kartik@deploy2002: helmfile [ml-serve-eqiad] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
- 08:33 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2088.codfw.wmnet with reason: host reimage
- 08:27 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2088.codfw.wmnet with reason: host reimage
- 08:14 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2088.codfw.wmnet with OS bullseye
- 08:13 ladsgroup@deploy2002: Finished scap sync-world: Backport for Revert "cswiki: Add celebration logo" (duration: 09m 18s)
- 08:08 ladsgroup@deploy2002: ladsgroup, hamishz: Continuing with sync
- 08:07 ladsgroup@deploy2002: ladsgroup, hamishz: Backport for Revert "cswiki: Add celebration logo" synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 08:06 kartik@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
- 08:04 ladsgroup@deploy2002: Started scap sync-world: Backport for Revert "cswiki: Add celebration logo"
- 07:47 Amir1: running extensions/Echo/maintenance/removeOrphanedEvents.php --force on all wikis (T308084)
- 05:17 eileen: civicrm upgraded from ad008134 to 23e08fc2
- 02:56 tchin@deploy2002: Finished deploy [airflow-dags/analytics@58d7b82]: (no justification provided) (duration: 00m 10s)
- 02:56 tchin@deploy2002: Started deploy [airflow-dags/analytics@58d7b82]: (no justification provided)
- 02:55 tchin@deploy2002: deploy aborted: failedpythonlol (duration: 00m 05s)
- 02:55 tchin@deploy2002: Started deploy [airflow-dags/analytics@58d7b82]: failedpythonlol
- 00:54 tchin@deploy2002: Started deploy [airflow-dags/analytics@58d7b82]: (no justification provided)
- 00:35 ejegg: payments-wiki upgraded from 7d24a942 to 459f259b
2024-11-12
- 23:28 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye
- 23:11 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage
- 23:08 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage
- 22:35 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye
- 22:11 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye
- 21:55 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage
- 21:55 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage
- 21:28 ebysans@deploy2002: Finished deploy [airflow-dags/analytics@58d7b82]: (no justification provided) (duration: 03m 50s)
- 21:27 SandraEbele_: deploying airflow as part of weekly deployment train
- 21:27 urbanecm@deploy2002: Finished scap sync-world: Backport for Fix warning about missing central account for temp users (T378289), Check session provider when autocreating (T378289) (duration: 16m 11s)
- 21:25 ebysans@deploy2002: Started deploy [airflow-dags/analytics@58d7b82]: (no justification provided)
- 21:23 SandraEbele_: Deployed refinery using scap, then deployed onto hdfs
- 21:22 urbanecm@deploy2002: urbanecm, tgr: Continuing with sync
- 21:22 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye
- 21:13 urbanecm@deploy2002: urbanecm, tgr: Backport for Fix warning about missing central account for temp users (T378289), Check session provider when autocreating (T378289) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 21:11 urbanecm@deploy2002: Started scap sync-world: Backport for Fix warning about missing central account for temp users (T378289), Check session provider when autocreating (T378289)
- 21:09 urbanecm@deploy2002: Finished scap sync-world: Backport for Revert^2 "[CirrusSearch] testwiki: enable offloading weighted tags via EventBus" (T378983) (duration: 07m 18s)
- 21:04 ebysans@deploy2002: Finished deploy [analytics/refinery@113ea5a] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@113ea5ac] (duration: 04m 09s)
- 21:02 urbanecm@deploy2002: Started scap sync-world: Backport for Revert^2 "[CirrusSearch] testwiki: enable offloading weighted tags via EventBus" (T378983)
- 20:59 ebysans@deploy2002: Started deploy [analytics/refinery@113ea5a] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@113ea5ac]
- 20:59 ebysans@deploy2002: Finished deploy [analytics/refinery@113ea5a] (thin): Regular analytics weekly train THIN [analytics/refinery@113ea5ac] (duration: 04m 54s)
- 20:54 ebysans@deploy2002: Started deploy [analytics/refinery@113ea5a] (thin): Regular analytics weekly train THIN [analytics/refinery@113ea5ac]
- 20:53 ebysans@deploy2002: Finished deploy [analytics/refinery@113ea5a]: Regular analytics weekly train [analytics/refinery@113ea5ac] (duration: 07m 37s)
- 20:49 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
- 20:46 ebysans@deploy2002: Started deploy [analytics/refinery@113ea5a]: Regular analytics weekly train [analytics/refinery@113ea5ac]
- 19:42 jayme@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for wikikube-ctrl1001.eqiad.wmnet
- 19:42 jayme@cumin2002: START - Cookbook sre.hosts.remove-downtime for wikikube-ctrl1001.eqiad.wmnet
- 19:42 jayme@cumin2002: conftool action : set/pooled=yes; selector: name=wikikube-ctrl1001.*
- 19:40 jayme@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-ctrl1001.eqiad.wmnet with OS bookworm
- 19:16 jayme@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-ctrl1001.eqiad.wmnet with reason: host reimage
- 19:14 brennen@deploy2002: rebuilt and synchronized wikiversions files: group0 to 1.44.0-wmf.3 refs T375662
- 19:13 jayme@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-ctrl1001.eqiad.wmnet with reason: host reimage
- 19:06 brennen: 1.44.0-wmf.3 train status (T375662): no current blockers, rolling to group0.
- 18:55 moritzm: installing libarchive security updates
- 18:55 jayme@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl1001.eqiad.wmnet with OS bookworm
- 18:31 swfrench@deploy2002: Finished scap sync-world: Backport for Add title-case mapping to support migration to PHP 8.1 (T372603) (duration: 18m 48s)
- 18:25 swfrench@deploy2002: swfrench: Continuing with sync
- 18:24 swfrench-wmf: verified consistent 7.4-like title-case behavior in 7.4- and 8.1-based images, verified expected treatment of eszett in mwdebug - T372603
- 18:19 swfrench@deploy2002: swfrench: Backport for Add title-case mapping to support migration to PHP 8.1 (T372603) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 18:12 swfrench@deploy2002: Started scap sync-world: Backport for Add title-case mapping to support migration to PHP 8.1 (T372603)
- 18:08 jayme@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-ctrl1001.eqiad.wmnet with OS bookworm
- 18:01 moritzm: remove ganeti1012 from active ganeti nodes T378921
- 17:59 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
- 17:57 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
- 17:57 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
- 17:56 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
- 17:35 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
- 17:34 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
- 17:26 brennen@deploy2002: Finished scap sync-world: testwikis to 1.44.0-wmf.3 refs T375662 (duration: 45m 29s)
- 16:55 jgiannelos@deploy2002: helmfile [codfw] DONE helmfile.d/services/push-notifications: apply
- 16:54 jgiannelos@deploy2002: helmfile [codfw] START helmfile.d/services/push-notifications: apply
- 16:54 jgiannelos@deploy2002: helmfile [eqiad] DONE helmfile.d/services/push-notifications: apply
- 16:53 jgiannelos@deploy2002: helmfile [eqiad] START helmfile.d/services/push-notifications: apply
- 16:48 jayme@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl1001.eqiad.wmnet with OS bookworm
- 16:47 jayme@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl1001.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
- 16:40 brennen@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.3 refs T375662
- 16:39 jayme@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-ctrl1001.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
- 16:37 jayme@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-ctrl1001.eqiad.wmnet with OS bookworm
- 16:34 dancy@deploy2002: Installation of scap version "4.123.0" completed for 209 hosts
- 16:30 dancy@deploy2002: Installing scap version "4.123.0" for 209 hosts
- 16:18 jgiannelos@deploy2002: helmfile [eqiad] DONE helmfile.d/services/push-notifications: apply
- 16:18 jgiannelos@deploy2002: helmfile [eqiad] START helmfile.d/services/push-notifications: apply
- 16:17 jgiannelos@deploy2002: helmfile [codfw] DONE helmfile.d/services/push-notifications: apply
- 16:17 jgiannelos@deploy2002: helmfile [codfw] START helmfile.d/services/push-notifications: apply
- 16:16 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/push-notifications: apply
- 16:15 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/push-notifications: apply
- 16:13 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cr[1-2]-eqiad
- 16:13 cmooney@cumin1002: START - Cookbook sre.hosts.remove-downtime for cr[1-2]-eqiad
- 16:08 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
- 16:07 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
- 15:57 jayme@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl1001.eqiad.wmnet with OS bookworm
- 15:56 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
- 15:55 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
- 15:52 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
- 15:52 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
- 15:47 jayme@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-ctrl1001.eqiad.wmnet with OS bookworm
- 15:42 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 15:42 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns records for IPs moving from old to new fundraising firewalls - cmooney@cumin1002"
- 15:35 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns records for IPs moving from old to new fundraising firewalls - cmooney@cumin1002"
- 15:27 cmooney@cumin1002: START - Cookbook sre.dns.netbox
- 15:19 jayme@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl1001.eqiad.wmnet with OS bookworm
- 15:16 jayme@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for wikikube-ctrl1002.eqiad.wmnet
- 15:16 jayme@cumin2002: START - Cookbook sre.hosts.remove-downtime for wikikube-ctrl1002.eqiad.wmnet
- 15:16 topranks: moving fundraising links in eqiad from old to new firewall cluster and switches (T377381)
- 15:14 jayme@cumin2002: START - Cookbook sre.k8s.reimage-stacked-control-plane Reimaging k8s control planes of cluster wikikube-eqiad: containerd migration
- 15:13 jayme@cumin2002: END (FAIL) - Cookbook sre.k8s.reimage-stacked-control-plane (exit_code=99) Reimaging k8s control planes of cluster wikikube-eqiad: containerd migration
- 15:10 jayme@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-ctrl1001.eqiad.wmnet with OS bookworm
- 15:04 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on cr[1-2]-eqiad,pfw3-eqiad with reason: fundraising tech migration to new equipment
- 15:04 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on cr[1-2]-eqiad,pfw3-eqiad with reason: fundraising tech migration to new equipment
- 15:02 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1012.eqiad.wmnet
- 14:30 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on fasw-c-eqiad with reason: fundraising tech migration to new equipment
- 14:30 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on fasw-c-eqiad with reason: fundraising tech migration to new equipment
- 14:28 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 14:28 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns records for IPs moving from old to new fundraising firewalls - cmooney@cumin1002"
- 14:28 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns records for IPs moving from old to new fundraising firewalls - cmooney@cumin1002"
- 14:26 moritzm: installing apache2 security updates
- 14:23 cmooney@cumin1002: START - Cookbook sre.dns.netbox
- 14:08 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 14:08 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
- 14:03 urbanecm@deploy2002: Started scap sync-world: Backport for [CirrusSearch] testwiki: enable offloading weighted tags via EventBus (T378983)
- 13:58 urbanecm@deploy2002: Started scap sync-world: Backport for [CirrusSearch] testwiki: enable offloading weighted tags via EventBus (T378983)
- 13:48 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 13:47 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
- 13:43 jnuche@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.3 refs T375662
- 13:37 jnuche@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.3 refs T375662
- 13:21 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1012.eqiad.wmnet
- 13:15 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of dse-k8s-etcd1003.eqiad.wmnet to plain
- 13:14 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of dse-k8s-etcd1003.eqiad.wmnet to plain
- 13:11 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1012.eqiad.wmnet
- 13:11 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1012.eqiad.wmnet
- 13:10 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 13:10 jayme@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl1001.eqiad.wmnet with OS bookworm
- 13:09 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of dse-k8s-etcd1003.eqiad.wmnet to drbd
- 13:09 jayme@cumin2002: START - Cookbook sre.k8s.reimage-stacked-control-plane Reimaging k8s control planes of cluster wikikube-eqiad: containerd migration
- 13:09 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 12:59 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of dse-k8s-etcd1003.eqiad.wmnet to drbd
- 12:54 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1002.eqiad.wmnet to plain
- 12:53 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1002.eqiad.wmnet to plain
- 12:53 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1012.eqiad.wmnet
- 12:52 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1012.eqiad.wmnet
- 12:45 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1002.eqiad.wmnet to drbd
- 12:35 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1002.eqiad.wmnet to drbd
- 12:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1012.eqiad.wmnet
- 12:28 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db2236 slowly with 10 steps - slow repool T373579
- 12:25 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1012.eqiad.wmnet
- 12:09 moritzm: remove ganeti1015 from active ganeti nodes T378921
- 12:08 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti1010.eqiad.wmnet
- 12:08 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 12:08 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti1010.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
- 12:04 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti1010.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
- 11:54 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1015.eqiad.wmnet
- 11:54 jmm@cumin2002: START - Cookbook sre.dns.netbox
- 11:52 elukey@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
- 11:48 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp5017.eqsin.wmnet
- 11:47 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti1010.eqiad.wmnet
- 11:42 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti1013.eqiad.wmnet
- 11:42 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 11:42 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti1013.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
- 11:40 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti1013.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
- 11:37 jmm@cumin2002: START - Cookbook sre.dns.netbox
- 11:27 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti1013.eqiad.wmnet
- 11:23 btullis@cumin1002: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid analytics cluster: Roll restart of Druid jvm daemons.
- 11:01 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 11:01 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
- 10:45 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db2217 gradually with 4 steps - T379491
- 10:37 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
- 10:37 btullis@cumin1002: START - Cookbook sre.druid.roll-restart-workers for Druid analytics cluster: Roll restart of Druid jvm daemons.
- 10:36 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
- 10:36 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 10:36 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 10:12 arnaudb@cumin1002: START - Cookbook sre.mysql.pool db2236 slowly with 10 steps - slow repool T373579
- 09:59 arnaudb@cumin1002: START - Cookbook sre.mysql.pool db2217 gradually with 4 steps - T379491
- 09:48 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T367781)', diff saved to https://phabricator.wikimedia.org/P71006 and previous config saved to /var/cache/conftool/dbconfig/20241112-094851-arnaudb.json
- 09:41 moritzm: update d-i netboot image for 12.8 point release T379600
- 09:33 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P71005 and previous config saved to /var/cache/conftool/dbconfig/20241112-093343-arnaudb.json
- 09:18 urbanecm@deploy2002: Finished scap sync-world: Backport for Revert "CirrusSearch: re-enable offloading weighted tags via EventBus" (duration: 06m 46s)
- 09:18 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P71004 and previous config saved to /var/cache/conftool/dbconfig/20241112-091836-arnaudb.json
- 09:17 elukey@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
- 09:14 urbanecm@deploy2002: trainbranchbot, urbanecm: Continuing with sync
- 09:14 urbanecm@deploy2002: trainbranchbot, urbanecm: Backport for Revert "CirrusSearch: re-enable offloading weighted tags via EventBus" synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 09:11 urbanecm@deploy2002: Started scap sync-world: Backport for Revert "CirrusSearch: re-enable offloading weighted tags via EventBus"
- 09:10 urbanecm@deploy2002: Sync cancelled.
- 09:03 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T367781)', diff saved to https://phabricator.wikimedia.org/P71002 and previous config saved to /var/cache/conftool/dbconfig/20241112-090329-arnaudb.json
- 08:38 urbanecm@deploy2002: pfischer, urbanecm: Backport for CirrusSearch: re-enable offloading weighted tags via EventBus (T378983) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 08:36 urbanecm@deploy2002: Started scap sync-world: Backport for CirrusSearch: re-enable offloading weighted tags via EventBus (T378983)
- 08:32 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1015.eqiad.wmnet
- 08:31 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1015.eqiad.wmnet
- 08:28 urbanecm@deploy2002: Finished scap sync-world: Backport for Fix WeightedTagsUpdater (T378664 T378983) (duration: 06m 59s)
- 08:25 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1015.eqiad.wmnet
- 08:21 urbanecm@deploy2002: Started scap sync-world: Backport for Fix WeightedTagsUpdater (T378664 T378983)
- 08:19 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1009.eqiad.wmnet
- 08:17 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1009.eqiad.wmnet
- 08:04 moritzm: installing apache security updates
- 08:03 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2155 (T367781)', diff saved to https://phabricator.wikimedia.org/P71001 and previous config saved to /var/cache/conftool/dbconfig/20241112-080303-arnaudb.json
- 08:02 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2187.codfw.wmnet with reason: Maintenance
- 08:02 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2187.codfw.wmnet with reason: Maintenance
- 08:02 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2155.codfw.wmnet with reason: Maintenance
- 08:02 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2155.codfw.wmnet with reason: Maintenance
- 07:53 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti-test2003
- 07:53 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti-test2003
- 07:52 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on pc1017.eqiad.wmnet with reason: T378068, host is not pooled
- 07:52 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 5 days, 0:00:00 on pc1017.eqiad.wmnet with reason: T378068, host is not pooled
- 05:01 mwpresync@deploy2002: Pruned MediaWiki: 1.43.0-wmf.28 (duration: 01m 52s)
2024-11-11
- away: UTC late deploys done
- 23:08 tgr@deploy2002: scap failed: <CalledProcessError> Command '['sudo', '-u', 'mwbuilder', '-n', '--', '/usr/bin/scap', 'mwscript', '--no-local-config', '--directory', '/srv/mediawiki-staging', '--user', 'www-data', '--network', '--', 'purgeMessageBlobStore.php']' returned non-zero exit status 1. (scap version: 4.122.0) (duration: 11m 44s)
- 23:02 tgr@deploy2002: d3r1ck01, tgr: Continuing with sync
- 22:59 tgr@deploy2002: d3r1ck01, tgr: Backport for PageUpdater: restore call to RevisionFromEditComplete (T379152) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 22:56 tgr@deploy2002: Started scap sync-world: Backport for PageUpdater: restore call to RevisionFromEditComplete (T379152)
- 22:30 tgr@deploy2002: Finished scap sync-world: Backport for contactpage: Update AffCom contact form messages (Resubmit) (T375392) (duration: 25m 48s)
- 22:21 tgr@deploy2002: tgr: Continuing with sync
- 22:19 tgr@deploy2002: tgr: Backport for contactpage: Update AffCom contact form messages (Resubmit) (T375392) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 22:13 eileen: civicrm upgraded from 4330588d to bcd072a1
- 22:05 tgr@deploy2002: Started scap sync-world: Backport for contactpage: Update AffCom contact form messages (Resubmit) (T375392)
- 21:38 tgr@deploy2002: Finished scap sync-world: Backport for contactpages: Update Affcom UserGroup application form (T375392) (duration: 28m 07s)
- 21:33 tgr@deploy2002: ammarpad, tgr: Continuing with sync
- 21:12 tgr@deploy2002: ammarpad, tgr: Backport for contactpages: Update Affcom UserGroup application form (T375392) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 21:10 tgr@deploy2002: Started scap sync-world: Backport for contactpages: Update Affcom UserGroup application form (T375392)
- 20:21 eileen: civicrm upgraded from 65a8de90 to 4330588d
- 17:55 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Add superset links - oblivian@cumin1002 - T379567"
- 17:55 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Add superset links - oblivian@cumin1002 - T379567
- 17:54 oblivian@cumin1002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Add superset links - oblivian@cumin1002 - T379567
- 17:54 oblivian@cumin1002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Add superset links - oblivian@cumin1002 - T379567"
- 16:19 elukey: restart pybal on lvs2013 (primary) to pick up new kartotherian-k8s-ssl service
- 16:17 elukey: restart pybal on lvs2014 (secondary) to pick up new kartotherian-k8s-ssl service
- 16:10 elukey: restart pybal on lvs1019 (primary) to pick up new kartotherian-k8s-ssl service
- 16:09 elukey: restart pybal on lvs1020 (secondary) to pick up new kartotherian-k8s-ssl service
- 16:09 moritzm: installing libarchive security updates
- 15:55 elukey@puppetserver1001: conftool action : set/pooled=yes:weight=10; selector: dc=codfw,cluster=maps,service=kartotherian-k8s-ssl
- 15:55 elukey@puppetserver1001: conftool action : set/pooled=yes:weight=10; selector: dc=eqiad,cluster=maps,service=kartotherian-k8s-ssl
- 15:54 elukey@puppetserver1001: conftool action : set/pooled=yes:weight=1; selector: cluster=codfw,service=kartotherian-k8s-ssl
- 15:04 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1311.eqiad.wmnet with OS bookworm
- 15:04 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 15:04 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 15:03 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1309.eqiad.wmnet with OS bookworm
- 15:03 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 15:00 Lucas_WMDE: UTC afternoon backport+config window done
- 15:00 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for wikipedias: clear link-recommendations on page save (T379522) (duration: 10m 59s)
- 14:58 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 14:56 lucaswerkmeister-wmde@deploy2002: migr, lucaswerkmeister-wmde: Continuing with sync
- 14:51 lucaswerkmeister-wmde@deploy2002: migr, lucaswerkmeister-wmde: Backport for wikipedias: clear link-recommendations on page save (T379522) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 14:49 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for wikipedias: clear link-recommendations on page save (T379522)
- 14:44 btullis@cumin1002: END (FAIL) - Cookbook sre.presto.roll-restart-workers (exit_code=99) for Presto an-presto cluster: Roll restart of all Presto's jvm daemons.
- 14:37 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1310.eqiad.wmnet with OS bookworm
- 14:37 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 14:36 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 14:35 elukey@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2088.codfw.wmnet with OS bullseye
- 14:33 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1312.eqiad.wmnet with OS bookworm
- 14:33 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 14:32 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 14:32 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1306.eqiad.wmnet with OS bookworm
- 14:32 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 14:32 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 14:28 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1308.eqiad.wmnet with OS bookworm
- 14:28 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 14:28 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 14:27 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2088.codfw.wmnet with OS bullseye
- 14:27 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1309.eqiad.wmnet with reason: host reimage
- 14:26 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1307.eqiad.wmnet with OS bookworm
- 14:26 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 14:25 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 14:22 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1311.eqiad.wmnet with reason: host reimage
- 14:22 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1305.eqiad.wmnet with OS bookworm
- 14:22 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 14:21 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 14:20 zabe@deploy2002: Finished scap sync-world: Backport for zhwiki: Allow event-organizer self remove usergroup (T376061) (duration: 10m 40s)
- 14:20 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2088.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 14:19 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1310.eqiad.wmnet with reason: host reimage
- 14:16 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1306.eqiad.wmnet with reason: host reimage
- 14:15 zabe@deploy2002: zabe, zhaofjx: Continuing with sync
- 14:13 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1312.eqiad.wmnet with reason: host reimage
- 14:12 zabe@deploy2002: zabe, zhaofjx: Backport for zhwiki: Allow event-organizer self remove usergroup (T376061) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 14:10 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1308.eqiad.wmnet with reason: host reimage
- 14:09 zabe@deploy2002: Started scap sync-world: Backport for zhwiki: Allow event-organizer self remove usergroup (T376061)
- 14:07 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2088.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 14:07 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1307.eqiad.wmnet with reason: host reimage
- 14:06 btullis@cumin1002: START - Cookbook sre.presto.roll-restart-workers for Presto an-presto cluster: Roll restart of all Presto's jvm daemons.
- 14:05 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts irc2002.wikimedia.org
- 14:05 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 14:05 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: irc2002.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
- 14:05 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: irc2002.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
- 14:04 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1312.eqiad.wmnet with reason: host reimage
- 14:04 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1308.eqiad.wmnet with reason: host reimage
- 14:04 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1309.eqiad.wmnet with reason: host reimage
- 14:04 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1311.eqiad.wmnet with reason: host reimage
- 14:04 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1305.eqiad.wmnet with reason: host reimage
- 14:04 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1310.eqiad.wmnet with reason: host reimage
- 14:03 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1307.eqiad.wmnet with reason: host reimage
- 14:03 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1306.eqiad.wmnet with reason: host reimage
- 14:00 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1305.eqiad.wmnet with reason: host reimage
- 13:55 moritzm: powercycled ganeti2031
- 13:44 jmm@cumin2002: START - Cookbook sre.dns.netbox
- 13:39 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts irc2002.wikimedia.org
- 13:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts irc1002.wikimedia.org
- 13:38 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 13:38 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: irc1002.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
- 13:34 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1312.eqiad.wmnet with OS bookworm
- 13:34 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1311.eqiad.wmnet with OS bookworm
- 13:34 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: irc1002.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
- 13:34 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1311.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 13:33 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1312.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 13:33 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1310.eqiad.wmnet with OS bookworm
- 13:32 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1309.eqiad.wmnet with OS bookworm
- 13:32 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1308.eqiad.wmnet with OS bookworm
- 13:32 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1307.eqiad.wmnet with OS bookworm
- 13:32 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1306.eqiad.wmnet with OS bookworm
- 13:31 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1306.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 13:31 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1305.eqiad.wmnet with OS bookworm
- 13:30 jmm@cumin2002: START - Cookbook sre.dns.netbox
- 13:29 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1307.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 13:29 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1309.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 13:29 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1310.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 13:29 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1308.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 13:29 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1305.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 13:25 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts irc1002.wikimedia.org
- 13:22 jynus: reverting deleted rows on db1176 (mailman3) T379519
- 13:16 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1312.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 13:15 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1311.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 13:12 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1050.eqiad.wmnet to cluster eqiad and group D
- 13:12 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1306.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 13:11 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1050.eqiad.wmnet to cluster eqiad and group D
- 13:11 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1310.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 13:11 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker1306.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 13:11 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1309.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 13:11 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1308.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 13:11 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1307.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 13:10 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1306.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 13:10 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1305.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 13:10 dreamyjazz@deploy2002: Finished scap sync-world: Backport for Exclude temp account viewer autopromotions from RC (T377829) (duration: 07m 07s)
- 13:08 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 13:08 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for wikikube-worker - jclark@cumin1002"
- 13:08 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for wikikube-worker - jclark@cumin1002"
- 13:05 dreamyjazz@deploy2002: mszabo, dreamyjazz: Continuing with sync
- 13:05 dreamyjazz@deploy2002: mszabo, dreamyjazz: Backport for Exclude temp account viewer autopromotions from RC (T377829) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 13:05 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Fix bug in requestctl commit - oblivian@cumin1002"
- 13:05 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Fix bug in requestctl commit - oblivian@cumin1002
- 13:04 oblivian@cumin1002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Fix bug in requestctl commit - oblivian@cumin1002
- 13:04 oblivian@cumin1002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Fix bug in requestctl commit - oblivian@cumin1002"
- 13:04 jclark@cumin1002: START - Cookbook sre.dns.netbox
- 13:03 dreamyjazz@deploy2002: Started scap sync-world: Backport for Exclude temp account viewer autopromotions from RC (T377829)
- 13:00 btullis@cumin1002: END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0) for Zookeeper A:zookeeper-druid-analytics cluster: Roll restart of jvm daemons.
- 12:54 btullis@cumin1002: START - Cookbook sre.zookeeper.roll-restart-zookeeper for Zookeeper A:zookeeper-druid-analytics cluster: Roll restart of jvm daemons.
- 12:48 btullis@cumin1002: END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0) for Zookeeper A:zookeeper-druid-public cluster: Roll restart of jvm daemons.
- 12:42 btullis@cumin1002: START - Cookbook sre.zookeeper.roll-restart-zookeeper for Zookeeper A:zookeeper-druid-public cluster: Roll restart of jvm daemons.
- 12:41 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1049.eqiad.wmnet to cluster eqiad and group D
- 12:40 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1049.eqiad.wmnet to cluster eqiad and group D
- 12:36 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1050.eqiad.wmnet
- 12:29 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1050.eqiad.wmnet
- 12:28 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1049.eqiad.wmnet
- 12:23 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2083.codfw.wmnet with OS bullseye
- 12:21 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1049.eqiad.wmnet
- 12:18 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1050
- 12:16 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1050
- 12:16 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1049
- 12:15 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1049
- 12:13 btullis@cumin1002: END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0) for Zookeeper A:zookeeper-analytics cluster: Roll restart of jvm daemons.
- 12:06 btullis@cumin1002: START - Cookbook sre.zookeeper.roll-restart-zookeeper for Zookeeper A:zookeeper-analytics cluster: Roll restart of jvm daemons.
- 12:01 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage
- 11:56 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage
- 11:56 btullis@cumin1002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host an-redacteddb1001.eqiad.wmnet
- 11:54 btullis@cumin1002: END (PASS) - Cookbook sre.opensearch.roll-restart-reboot (exit_code=0) rolling restart_daemons on A:datahubsearch
- 11:46 btullis@cumin1002: START - Cookbook sre.opensearch.roll-restart-reboot rolling restart_daemons on A:datahubsearch
- 11:44 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye
- 11:43 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-redacteddb1001.eqiad.wmnet
- 11:43 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye
- 11:43 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye
- 11:30 elukey@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
- 11:06 elukey@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
- 11:04 btullis@cumin1002: END (PASS) - Cookbook sre.wikireplicas.update-views (exit_code=0)
- 10:57 btullis@cumin1002: START - Cookbook sre.wikireplicas.update-views
- 10:55 elukey@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
- 10:01 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Update to latest - oblivian@cumin1002"
- 10:01 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Update to latest - oblivian@cumin1002
- 10:00 oblivian@cumin1002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Update to latest - oblivian@cumin1002
- 10:00 oblivian@cumin1002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Update to latest - oblivian@cumin1002"
- 09:10 moritzm: remove ganeti1011 from active ganeti nodes T378921
- 09:02 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1011.eqiad.wmnet
- 08:40 urbanecm@deploy2002: Finished scap sync-world: Backport for Update Wikimedia Foundation primary address. (T379417), Update Office Wiki favicon to use wmf.ico and also delete now unused office.ico file. (T378026) (duration: 07m 15s)
- 08:35 urbanecm@deploy2002: urbanecm, varnent: Continuing with sync
- 08:35 urbanecm@deploy2002: urbanecm, varnent: Backport for Update Wikimedia Foundation primary address. (T379417), Update Office Wiki favicon to use wmf.ico and also delete now unused office.ico file. (T378026) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 08:32 urbanecm@deploy2002: Started scap sync-world: Backport for Update Wikimedia Foundation primary address. (T379417), Update Office Wiki favicon to use wmf.ico and also delete now unused office.ico file. (T378026)
- 08:32 urbanecm@deploy2002: Finished scap sync-world: Backport for Allow wgGroupsRemoveFromSelf for templateeditor, confirmed, and abusefilter-helper in zhwiki (T379500) (duration: 20m 59s)
- 08:24 urbanecm@deploy2002: urbanecm, hamishz: Continuing with sync
- 08:22 urbanecm@deploy2002: urbanecm, hamishz: Backport for Allow wgGroupsRemoveFromSelf for templateeditor, confirmed, and abusefilter-helper in zhwiki (T379500) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 08:18 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Update to latest - oblivian@cumin1002"
- 08:18 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Update to latest - oblivian@cumin1002
- 08:17 oblivian@cumin1002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Update to latest - oblivian@cumin1002
- 08:17 oblivian@cumin1002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Update to latest - oblivian@cumin1002"
- 08:11 urbanecm@deploy2002: Started scap sync-world: Backport for Allow wgGroupsRemoveFromSelf for templateeditor, confirmed, and abusefilter-helper in zhwiki (T379500)
- 07:51 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1011.eqiad.wmnet
- 07:49 _joe_: installing conftool 4.1.0 on puppetservers
- 07:15 kartik@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
2024-11-10
- 23:43 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye
- 23:17 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage
- 23:14 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage
- 22:51 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye
- 22:29 jhathaway: re-imaging ms-be2082 to test efi boot order
- 12:32 elukey: optimize table `archive` on db2217 - frwiki db - corrupt index error (host already depooled)
- 12:26 slyngshede@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db2217.codfw.wmnet with reason: Corrupt Index
- 12:26 slyngshede@cumin1002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db2217.codfw.wmnet with reason: Corrupt Index
- 12:25 slyngshede@cumin1002: dbctl commit (dc=all): 'Depool db2217', diff saved to https://phabricator.wikimedia.org/P70997 and previous config saved to /var/cache/conftool/dbconfig/20241110-122532-slyngshede.json
2024-11-09
- 14:49 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
- 14:49 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply
- 14:48 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
- 14:48 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
- 14:48 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
- 14:48 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply
2024-11-08
- 23:35 zabe: attach Sotiale's local accounts on newly created wikis
- 23:16 Reedy: ran `delete from oathauth_devices where oad_id=4506;` on centralauth for T379398 because oad_user=0
- 23:07 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye
- 22:54 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
- 22:54 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply
- 22:54 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
- 22:54 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
- 22:54 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
- 22:54 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply
- 22:52 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
- 22:51 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply
- 22:51 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
- 22:51 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
- 22:51 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
- 22:51 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply
- 22:44 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage
- 22:41 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage
- 22:39 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
- 22:39 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply
- 22:39 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
- 22:38 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
- 22:38 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
- 22:38 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply
- 22:29 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye
- 22:28 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bullseye
- 22:08 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye
- 21:18 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 21:18 denisse: disabling Puppet on grafana2001 - T379043
- 21:17 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 21:12 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bullseye
- 21:08 mutante: cumint2002 [cumin2002:~] $ sudo systemctl reset-failed
- 21:05 mutante: cumin2002 - sudo systemctl status httpbb_kubernetes_mw-api-int_hourly
- 20:28 aude@deploy2002: Finished scap sync-world: Backport for Reviving "Update interwiki map" (duration: 10m 19s)
- 20:24 aude@deploy2002: seddon, aude: Continuing with sync
- 20:21 aude@deploy2002: seddon, aude: Backport for Reviving "Update interwiki map" synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 20:20 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye
- 20:18 aude@deploy2002: Started scap sync-world: Backport for Reviving "Update interwiki map"
- 20:15 aude@deploy2002: Finished scap sync-world: Backport for Enable Tabular data for test commons (T378127) (duration: 10m 55s)
- 20:10 aude@deploy2002: aude: Continuing with sync
- 20:06 aude@deploy2002: aude: Backport for Enable Tabular data for test commons (T378127) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 20:04 aude@deploy2002: Started scap sync-world: Backport for Enable Tabular data for test commons (T378127)
- 20:02 aude@deploy2002: Finished scap sync-world: Backport for Reopen testcommonswiki for testing Chart extension (duration: 14m 33s)
- 19:59 jhathaway@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ms-be2082.codfw.wmnet with reason: T371400
- 19:59 jhathaway@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on ms-be2082.codfw.wmnet with reason: T371400
- 19:57 aude@deploy2002: aude: Continuing with sync
- 19:50 aude@deploy2002: aude: Backport for Reopen testcommonswiki for testing Chart extension synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 19:47 aude@deploy2002: Started scap sync-world: Backport for Reopen testcommonswiki for testing Chart extension
- 18:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2168.codfw.wmnet with OS bookworm
- 18:40 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 18:40 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2163.codfw.wmnet with OS bookworm
- 18:39 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 18:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2167.codfw.wmnet with OS bookworm
- 18:38 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 18:37 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 18:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2170.codfw.wmnet with OS bookworm
- 18:33 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 18:32 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 18:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2169.codfw.wmnet with OS bookworm
- 18:31 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 18:29 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 18:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2166.codfw.wmnet with OS bookworm
- 18:27 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 18:27 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 18:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2165.codfw.wmnet with OS bookworm
- 18:26 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 18:23 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 18:21 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 18:21 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002"
- 18:21 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002"
- 18:21 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2164.codfw.wmnet with OS bookworm
- 18:21 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 18:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2168.codfw.wmnet with reason: host reimage
- 18:19 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 18:17 cmooney@cumin1002: START - Cookbook sre.dns.netbox
- 18:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2167.codfw.wmnet with reason: host reimage
- 18:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2170.codfw.wmnet with reason: host reimage
- 18:10 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2169.codfw.wmnet with reason: host reimage
- 18:10 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2170.codfw.wmnet with reason: host reimage
- 18:07 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2166.codfw.wmnet with reason: host reimage
- 18:06 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2169.codfw.wmnet with reason: host reimage
- 18:04 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2165.codfw.wmnet with reason: host reimage
- 18:03 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2168.codfw.wmnet with reason: host reimage
- 18:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2167.codfw.wmnet with reason: host reimage
- 18:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2164.codfw.wmnet with reason: host reimage
- 17:59 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2145.codfw.wmnet with OS bookworm
- 17:59 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 17:59 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 17:59 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2166.codfw.wmnet with reason: host reimage
- 17:57 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2165.codfw.wmnet with reason: host reimage
- 17:57 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 17:57 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002"
- 17:56 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002"
- 17:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2144.codfw.wmnet with OS bookworm
- 17:56 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 17:56 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2163.codfw.wmnet with OS bookworm
- 17:56 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye
- 17:56 herron@cumin1002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host aux-k8s-worker1005.eqiad.wmnet
- 17:56 herron@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aux-k8s-worker1005.eqiad.wmnet with OS bookworm
- 17:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2164.codfw.wmnet with reason: host reimage
- 17:54 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 17:52 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2163.codfw.wmnet with OS bookworm
- 17:50 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2170.codfw.wmnet with OS bookworm
- 17:50 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2157.codfw.wmnet with OS bookworm
- 17:50 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 17:49 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 17:49 cmooney@cumin1002: START - Cookbook sre.dns.netbox
- 17:47 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2169.codfw.wmnet with OS bookworm
- 17:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2160.codfw.wmnet with OS bookworm
- 17:46 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 17:45 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 17:44 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2168.codfw.wmnet with OS bookworm
- 17:44 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2158.codfw.wmnet with OS bookworm
- 17:44 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 17:43 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2167.codfw.wmnet with OS bookworm
- 17:42 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 17:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2162.codfw.wmnet with OS bookworm
- 17:42 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 17:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2166.codfw.wmnet with OS bookworm
- 17:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2145.codfw.wmnet with reason: host reimage
- 17:40 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 17:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2156.codfw.wmnet with OS bookworm
- 17:39 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 17:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2165.codfw.wmnet with OS bookworm
- 17:38 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 17:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2161.codfw.wmnet with OS bookworm
- 17:38 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 17:37 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on wikikube-worker2144.codfw.wmnet with reason: host reimage
- 17:37 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2164.codfw.wmnet with OS bookworm
- 17:37 herron@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aux-k8s-worker1005.eqiad.wmnet with reason: host reimage
- 17:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2159.codfw.wmnet with OS bookworm
- 17:36 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 17:35 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 17:34 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage
- 17:32 herron@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on aux-k8s-worker1005.eqiad.wmnet with reason: host reimage
- 17:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2157.codfw.wmnet with reason: host reimage
- 17:30 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 17:29 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage
- 17:27 jynus: rebuild frwiki.geo_tags @ an-redacteddb1001
- 17:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2160.codfw.wmnet with reason: host reimage
- 17:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2158.codfw.wmnet with reason: host reimage
- 17:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2162.codfw.wmnet with reason: host reimage
- 17:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2156.codfw.wmnet with reason: host reimage
- 17:17 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye
- 17:17 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2082.codfw.wmnet with OS bullseye
- 17:15 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1005.eqiad.wmnet with OS bookworm
- 17:14 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002"
- 17:14 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002"
- 17:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2161.codfw.wmnet with reason: host reimage
- 17:14 herron@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) aux-k8s-worker1005.eqiad.wmnet on all recursors
- 17:13 herron@cumin1002: START - Cookbook sre.dns.wipe-cache aux-k8s-worker1005.eqiad.wmnet on all recursors
- 17:13 herron@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 17:13 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002"
- 17:13 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002"
- 17:11 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2159.codfw.wmnet with reason: host reimage
- 17:10 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye
- 17:09 herron@cumin1002: START - Cookbook sre.dns.netbox
- 17:09 herron@cumin1002: START - Cookbook sre.ganeti.makevm for new host aux-k8s-worker1005.eqiad.wmnet
- 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2158.codfw.wmnet with reason: host reimage
- 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2144.codfw.wmnet with reason: host reimage
- 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2145.codfw.wmnet with reason: host reimage
- 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2157.codfw.wmnet with reason: host reimage
- 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2161.codfw.wmnet with reason: host reimage
- 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2160.codfw.wmnet with reason: host reimage
- 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2162.codfw.wmnet with reason: host reimage
- 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2156.codfw.wmnet with reason: host reimage
- 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2159.codfw.wmnet with reason: host reimage
- 17:07 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2163.codfw.wmnet with OS bookworm
- 17:05 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bookworm
- 17:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2136.codfw.wmnet with OS bookworm
- 17:05 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 16:58 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2001.codfw.wmnet with OS bookworm
- 16:58 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 16:55 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bookworm
- 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2162.codfw.wmnet with OS bookworm
- 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2161.codfw.wmnet with OS bookworm
- 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2160.codfw.wmnet with OS bookworm
- 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2159.codfw.wmnet with OS bookworm
- 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2158.codfw.wmnet with OS bookworm
- 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2157.codfw.wmnet with OS bookworm
- 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2156.codfw.wmnet with OS bookworm
- 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2145.codfw.wmnet with OS bookworm
- 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2144.codfw.wmnet with OS bookworm
- 16:43 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage
- 16:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage
- 16:35 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage
- 16:35 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage
- 16:25 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm
- 16:22 herron@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm
- 16:16 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2136.codfw.wmnet with OS bookworm
- 16:10 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 16:05 herron@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aux-k8s-worker1004.eqiad.wmnet with reason: host reimage
- 16:02 herron@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on aux-k8s-worker1004.eqiad.wmnet with reason: host reimage
- 16:02 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2139.codfw.wmnet with OS bookworm
- 15:55 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2001.codfw.wmnet with OS bookworm
- 15:55 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm
- 15:48 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm
- 15:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2142.codfw.wmnet with OS bookworm
- 15:46 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 15:45 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 15:45 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2143.codfw.wmnet with OS bookworm
- 15:45 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 15:43 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 15:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2141.codfw.wmnet with OS bookworm
- 15:40 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 15:39 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 15:32 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2129.codfw.wmnet with OS bookworm
- 15:32 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 15:31 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2140.codfw.wmnet with OS bookworm
- 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2138.codfw.wmnet with OS bookworm
- 15:28 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 15:27 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 15:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2137.codfw.wmnet with OS bookworm
- 15:27 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 15:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2142.codfw.wmnet with reason: host reimage
- 15:25 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2136.codfw.wmnet with OS bookworm
- 15:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2143.codfw.wmnet with reason: host reimage
- 15:22 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 15:21 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2128.codfw.wmnet with OS bookworm
- 15:21 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 15:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2141.codfw.wmnet with reason: host reimage
- 15:19 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2001.codfw.wmnet with OS bookworm
- 15:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 15:16 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2087.codfw.wmnet with OS bullseye
- 15:16 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002"
- 15:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage
- 15:15 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002"
- 15:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2129.codfw.wmnet with reason: host reimage
- 15:09 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2140.codfw.wmnet with reason: host reimage
- 15:08 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm
- 15:06 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2138.codfw.wmnet with reason: host reimage
- 15:05 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 15:03 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2137.codfw.wmnet with reason: host reimage
- 15:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2142.codfw.wmnet with reason: host reimage
- 15:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2143.codfw.wmnet with reason: host reimage
- 15:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2141.codfw.wmnet with reason: host reimage
- 15:00 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2140.codfw.wmnet with reason: host reimage
- 15:00 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2128.codfw.wmnet with reason: host reimage
- 14:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2138.codfw.wmnet with reason: host reimage
- 14:57 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage
- 14:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2137.codfw.wmnet with reason: host reimage
- 14:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2129.codfw.wmnet with reason: host reimage
- 14:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2128.codfw.wmnet with reason: host reimage
- 14:56 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2087.codfw.wmnet with reason: host reimage
- 14:55 elukey@cumin1002: START - Cookbook sre.hosts.provision for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 14:52 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2087.codfw.wmnet with reason: host reimage
- 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2143.codfw.wmnet with OS bookworm
- 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2142.codfw.wmnet with OS bookworm
- 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2141.codfw.wmnet with OS bookworm
- 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2140.codfw.wmnet with OS bookworm
- 14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2139.codfw.wmnet with OS bookworm
- 14:41 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2087.codfw.wmnet with OS bullseye
- 14:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2138.codfw.wmnet with OS bookworm
- 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2137.codfw.wmnet with OS bookworm
- 14:38 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2136.codfw.wmnet with OS bookworm
- 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2129.codfw.wmnet with OS bookworm
- 14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2128.codfw.wmnet with OS bookworm
- 14:37 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 14:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2128']
- 14:34 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2128']
- 14:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2158']
- 14:34 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2158']
- 14:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2157']
- 14:34 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2157']
- 14:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2156']
- 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2156']
- 14:33 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['wikikube-worker2156']
- 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2156']
- 14:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2145']
- 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2145']
- 14:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2144']
- 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2144']
- 14:33 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['wikikube-worker2144']
- 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2144']
- 14:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2143']
- 14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2143']
- 14:32 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2142']
- 14:31 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2142']
- 14:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2141']
- 14:30 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2141']
- 14:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2140']
- 14:30 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2140']
- 14:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2139']
- 14:29 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2139']
- 14:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2138']
- 14:29 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2138']
- 14:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2137']
- 14:29 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2137']
- 14:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2136']
- 14:28 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2136']
- 14:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2129']
- 14:28 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2129']
- 14:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2128']
- 14:27 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2128']
- 14:18 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2086.codfw.wmnet with OS bullseye
- 14:18 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002"
- 13:31 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
- 13:30 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 13:29 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 12:32 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
- 12:30 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
- 12:30 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
- 12:30 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
- 12:29 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
- 12:28 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
- 12:07 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002"
- 12:04 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2087.codfw.wmnet with OS bullseye
- 11:59 apergos: testing of account creation backfill script on mwmaint2001 complete for the moment
- 11:53 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2087.codfw.wmnet with OS bullseye
- 11:51 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2086.codfw.wmnet with reason: host reimage
- 11:48 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2086.codfw.wmnet with reason: host reimage
- 11:37 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2087.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 11:37 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye
- 11:27 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2087.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 11:25 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2016.codfw.wmnet
- 11:25 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 11:25 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2016.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
- 11:24 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2016.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
- 11:17 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 11:16 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 11:13 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 11:13 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 11:13 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye
- 11:07 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 11:05 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 11:04 jmm@cumin2002: START - Cookbook sre.dns.netbox
- 11:00 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye
- 10:58 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye
- 10:56 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2016.codfw.wmnet
- 10:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2015.codfw.wmnet
- 10:56 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 10:56 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2015.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
- 10:55 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2015.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
- 10:51 jmm@cumin2002: START - Cookbook sre.dns.netbox
- 10:45 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2015.codfw.wmnet
- 10:45 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye
- 10:39 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 10:34 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye
- 10:29 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye
- 10:19 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1011.eqiad.wmnet
- 10:18 elukey@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye
- 10:16 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye
- 10:16 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1011.eqiad.wmnet
- 10:02 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply
- 10:01 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply
- 09:57 apergos: testing account creation backfill script on mwmaint2001 in screen session as ariel
- 09:49 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 09:41 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2085.codfw.wmnet with OS bullseye
- 09:41 elukey@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002"
- 09:39 elukey@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002"
- 09:38 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 09:29 stevemunene@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10 days, 0:00:00 on an-presto1018.eqiad.wmnet with reason: Downtimed for further troubleshooting possible Hardware failure
- 09:29 stevemunene@cumin1002: START - Cookbook sre.hosts.downtime for 10 days, 0:00:00 on an-presto1018.eqiad.wmnet with reason: Downtimed for further troubleshooting possible Hardware failure
- 09:24 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2085.codfw.wmnet with reason: host reimage
- 09:20 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2085.codfw.wmnet with reason: host reimage
- 09:09 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2085.codfw.wmnet with OS bullseye
- 09:09 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2085.codfw.wmnet with OS bullseye
- 09:03 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-a8-codfw
- 09:03 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-a8-codfw
- 09:03 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-a1-codfw
- 09:03 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-a1-codfw
- 09:01 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b8-codfw
- 09:01 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b8-codfw
- 09:01 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b7-codfw
- 09:01 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b7-codfw
- 08:56 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2085.codfw.wmnet with OS bullseye
- 08:54 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b6-codfw
- 08:54 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b6-codfw
- 08:53 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b5-codfw
- 08:53 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b5-codfw
- 08:53 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b4-codfw
- 08:52 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b4-codfw
- 08:52 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b3-codfw
- 08:52 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b3-codfw
- 08:52 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b2-codfw
- 08:52 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b2-codfw
- 08:44 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a8-codfw
- 08:43 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a8-codfw
- 08:43 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a7-codfw
- 08:43 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a7-codfw
- 08:43 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1048.eqiad.wmnet to cluster eqiad and group C
- 08:43 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a6-codfw
- 08:43 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a6-codfw
- 08:42 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a5-codfw
- 08:42 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a5-codfw
- 08:42 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1048.eqiad.wmnet to cluster eqiad and group C
- 08:42 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a4-codfw
- 08:41 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a4-codfw
- 08:41 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a3-codfw
- 08:41 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a3-codfw
- 08:41 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2085.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 08:41 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a2-codfw
- 08:40 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a2-codfw
- 08:39 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-f1-eqiad
- 08:39 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-f1-eqiad
- 08:35 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-e1-eqiad
- 08:35 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-e1-eqiad
- 08:34 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device cloudsw2-d5-eqiad
- 08:34 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 08:34 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device cloudsw2-d5-eqiad
- 08:33 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 08:31 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2085.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 08:30 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device cr2-eqsin
- 08:30 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device cr2-eqsin
- 08:27 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 08:27 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 08:26 moritzm: upgraded ircstream on irc.wikimedia.org to 1.0.1
- 08:08 XioNoX: update gnmic to 0.39 on all netflow hosts
- 08:05 XioNoX: add gnmic 0.39 from official git repo to bookworm reprepro - T347461
- 07:48 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C
- 07:48 XioNoX: manually install/test gnmic 0.39 on netflow6001
- 07:46 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C
- 07:45 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1048.eqiad.wmnet
- 07:39 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1048.eqiad.wmnet
- 07:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1047.eqiad.wmnet
- 07:33 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1047.eqiad.wmnet
- 07:33 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C
- 07:33 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C
2024-11-07
- 23:00 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bookworm
- 22:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2170.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 22:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2169.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 22:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2168.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 22:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2167.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 22:45 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2166.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 22:44 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2165.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 22:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2164.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2163.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 22:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2162.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 22:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2161.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 22:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2160.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 22:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2141.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 22:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2159.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2158.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 22:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2157.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 22:37 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2170.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 22:37 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage
- 22:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2156.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 22:37 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2169.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 22:36 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2168.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2145.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2167.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 22:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2144.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 22:34 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2166.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 22:34 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage
- 22:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2143.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2142.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 22:33 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2165.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 22:32 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2164.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2163.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 22:30 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2162.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2140.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2139.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 22:30 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2161.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 22:29 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2160.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 22:28 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2159.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 22:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2138.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 22:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2137.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 22:27 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2158.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 22:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2136.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 22:27 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2157.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 22:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2129.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2156.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2145.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 22:24 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2128.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2144.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 22:23 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2143.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 22:22 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2142.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 22:22 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bookworm
- 22:21 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2141.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 22:20 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2140.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 22:19 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 22:19 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2139.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2138.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2137.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 22:16 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2136.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 22:15 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2129.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 22:14 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2128.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 22:12 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs2026.codfw.wmnet with OS bullseye
- 22:12 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 22:10 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 22:08 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 22:07 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs2027.codfw.wmnet with OS bullseye
- 22:07 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 22:06 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 21:58 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 21:58 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2170 to codfw - jhancock@cumin2002"
- 21:58 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2170 to codfw - jhancock@cumin2002"
- 21:53 jhancock@cumin2002: START - Cookbook sre.dns.netbox
- 21:53 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2026.codfw.wmnet with reason: host reimage
- 21:52 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 21:51 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2166 to codfw - jhancock@cumin2002"
- 21:50 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2166 to codfw - jhancock@cumin2002"
- 21:50 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2027.codfw.wmnet with reason: host reimage
- 21:47 jhancock@cumin2002: START - Cookbook sre.dns.netbox
- 21:46 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2026.codfw.wmnet with reason: host reimage
- 21:46 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2027.codfw.wmnet with reason: host reimage
- 21:41 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 21:34 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 21:34 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2158 to codfw - jhancock@cumin2002"
- 21:33 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2158 to codfw - jhancock@cumin2002"
- 21:30 jhancock@cumin2002: START - Cookbook sre.dns.netbox
- 21:27 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2143 to codfw - jhancock@cumin2002"
- 21:26 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2143 to codfw - jhancock@cumin2002"
- 21:22 jhancock@cumin2002: START - Cookbook sre.dns.netbox
- 21:21 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2082.codfw.wmnet with OS bookworm
- 21:18 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2027.codfw.wmnet with OS bullseye
- 21:18 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2026.codfw.wmnet with OS bullseye
- 21:18 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs2027']
- 21:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs2026']
- 21:17 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs2027']
- 21:17 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs2026']
- 21:11 herron@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm
- 21:11 jsn@deploy2002: Finished scap sync-world: Backport for Enable AutoModerator on viwiki (T378343) (duration: 08m 28s)
- 21:09 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm
- 21:06 jsn@deploy2002: suecarmol, jsn: Continuing with sync
- 21:06 jsn@deploy2002: suecarmol, jsn: Backport for Enable AutoModerator on viwiki (T378343) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 21:03 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 21:03 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2128 to codfw - jhancock@cumin2002"
- 21:03 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2128 to codfw - jhancock@cumin2002"
- 21:03 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage
- 21:02 jsn@deploy2002: Started scap sync-world: Backport for Enable AutoModerator on viwiki (T378343)
- 21:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wdqs2027.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 21:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wdqs2026.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 20:59 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage
- 20:59 jhancock@cumin2002: START - Cookbook sre.dns.netbox
- 20:50 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wdqs2027.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 20:50 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wdqs2026.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 20:49 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 20:49 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wdqs2026 to codfw - jhancock@cumin2002"
- 20:49 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wdqs2026 to codfw - jhancock@cumin2002"
- 20:46 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bookworm
- 20:43 jhancock@cumin2002: START - Cookbook sre.dns.netbox
- 20:35 cdanis@deploy2002: Finished scap sync-world: Backport for Enable Chart extension on testwiki and testcommonswiki (T378127) (duration: 13m 02s)
- 20:30 cdanis@deploy2002: cdanis, aude: Continuing with sync
- 20:25 cdanis@deploy2002: cdanis, aude: Backport for Enable Chart extension on testwiki and testcommonswiki (T378127) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 20:22 cdanis@deploy2002: Started scap sync-world: Backport for Enable Chart extension on testwiki and testcommonswiki (T378127)
- 20:21 cdanis@deploy2002: Finished scap sync-world: Backport for DB config for testcommonswiki deployment for Charts (T379199) (duration: 10m 45s)
- 20:15 cdanis@deploy2002: cdanis, bvibber: Continuing with sync
- 20:13 cdanis@deploy2002: cdanis, bvibber: Backport for DB config for testcommonswiki deployment for Charts (T379199) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 20:10 cdanis@deploy2002: Started scap sync-world: Backport for DB config for testcommonswiki deployment for Charts (T379199)
- 20:02 dduvall@deploy2002: Installing scap version "4.122.0" for 209 hosts
- 19:42 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 19:42 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dummy record for pfw1-eqiad.wikimedia.org - cmooney@cumin1002"
- 19:42 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dummy record for pfw1-eqiad.wikimedia.org - cmooney@cumin1002"
- 19:37 cmooney@cumin1002: START - Cookbook sre.dns.netbox
- 19:33 cmooney@cumin1002: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
- 19:33 cmooney@cumin1002: START - Cookbook sre.dns.netbox
- 19:23 cdanis: T379199 💙cdanis@mwmaint2002.codfw.wmnet ~ 🕝☕ mwscript sql.php --wiki=testcommonswiki /srv/mediawiki/php-1.44.0-wmf.2/extensions/JsonConfig/sql/mysql/tables-generated.sql
- 19:19 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables
- 19:19 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables
- 19:18 aokoth@cumin1002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host vrts1003.eqiad.wmnet
- 19:11 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables
- 19:11 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables
- 19:10 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on vrts2002.codfw.wmnet with reason: nftables
- 19:10 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on vrts2002.codfw.wmnet with reason: nftables
- 19:08 mutante: VRTS - switching firewall provider from iptables to nftables
- 19:06 aokoth@cumin1002: START - Cookbook sre.hosts.reboot-single for host vrts1003.eqiad.wmnet
- 19:03 herron@cumin1002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host aux-k8s-worker1004.eqiad.wmnet
- 19:03 herron@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm
- 19:00 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm
- 18:59 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002"
- 18:59 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002"
- 18:59 herron@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) aux-k8s-worker1004.eqiad.wmnet on all recursors
- 18:59 herron@cumin1002: START - Cookbook sre.dns.wipe-cache aux-k8s-worker1004.eqiad.wmnet on all recursors
- 18:59 herron@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 18:58 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002"
- 18:58 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002"
- 18:50 herron@cumin1002: START - Cookbook sre.dns.netbox
- 18:50 herron@cumin1002: START - Cookbook sre.ganeti.makevm for new host aux-k8s-worker1004.eqiad.wmnet
- 18:43 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 18:43 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2138 to codfw - jhancock@cumin2002"
- 18:43 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2138 to codfw - jhancock@cumin2002"
- 18:14 swfrench-wmf: updated changeprop-jobqueue to 2024-11-05-170900-production - T356241
- 18:13 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
- 18:11 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
- 18:01 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
- 17:59 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
- 17:58 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
- 17:57 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
- 17:55 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cloudvirt1063.eqiad.wmnet
- 17:55 fnegri@cumin1002: START - Cookbook sre.hosts.remove-downtime for cloudvirt1063.eqiad.wmnet
- 17:48 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop: apply
- 17:48 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop: apply
- 17:44 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply
- 17:43 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop: apply
- 17:42 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop: apply
- 17:41 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/changeprop: apply
- 17:29 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1063.eqiad.wmnet with OS bookworm
- 17:29 fnegri@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fnegri@cumin1002"
- 17:27 fnegri@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fnegri@cumin1002"
- 17:18 cmooney@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device fasw2-c1a-eqiad
- 17:16 cmooney@cumin1002: START - Cookbook sre.network.tls for network device fasw2-c1a-eqiad
- 17:12 rzl: manually run mediawiki_job_wikimediaevents-UpdatePeriodicMetrics-global # T375508
- 17:09 arlolra@deploy2002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
- 17:08 arlolra@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
- 17:06 rzl: manually run mediawiki_job_wikimediaevents-UpdatePeriodicMetrics-per-wiki # T375508
- 17:03 arlolra@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
- 17:02 arlolra@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
- 17:01 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1063.eqiad.wmnet with reason: host reimage
- 16:57 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye
- 16:57 elukey@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002"
- 16:57 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2084.codfw.wmnet with OS bullseye
- 16:57 arlolra@deploy2002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
- 16:56 arlolra@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
- 16:56 arlolra@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
- 16:56 fnegri@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1063.eqiad.wmnet with reason: host reimage
- 16:54 arlolra@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
- 16:54 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2083.codfw.wmnet with OS bullseye
- 16:48 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 16:48 elukey@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 16:46 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2084.codfw.wmnet with OS bullseye
- 16:45 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2084.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 16:41 fnegri@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1063.eqiad.wmnet with OS bookworm
- 16:34 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2084.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 16:32 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage
- 16:28 elukey@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002"
- 16:28 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage
- 16:24 arlolra@deploy2002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
- 16:23 arlolra@deploy2002: helmfile [staging] START helmfile.d/services/mobileapps: apply
- 16:15 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye
- 16:07 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage
- 16:04 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage
- 15:57 herron@cumin1002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-logging-eqiad
- 15:54 moritzm: remove ganeti1010 from active ganeti nodes T378921
- 15:53 joelyrookewmde: Finished populateSitesTable for tcywiktionary (T378466) and tcywikisource (T378474)
- 15:53 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye
- 15:52 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1010.eqiad.wmnet
- 15:39 jgiannelos@deploy2002: Finished deploy [restbase/deploy@6d0b97e]: Add new wikis to RESTBase (duration: 21m 33s)
- 15:33 herron@cumin1002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-logging-eqiad
- 15:31 taavi: taavi@deploy2002 ~ $ mwscript-k8s migrateUserGroup.php -- --wiki=labswiki contentadmin sysop # T375950
- 15:31 joelyrookewmde: joelyrookewmde@mwmaint2002:~$ foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https
- 15:29 herron@cumin1002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-logging-codfw
- 15:18 jgiannelos@deploy2002: Started deploy [restbase/deploy@6d0b97e]: Add new wikis to RESTBase
- 15:16 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bullseye
- 15:15 jnuche@deploy2002: Finished deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) (duration: 01m 13s)
- 15:14 jnuche@deploy2002: Started deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided)
- 15:11 jnuche@deploy2002: Finished deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) (duration: 00m 52s)
- 15:10 jnuche@deploy2002: Started deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided)
- 15:07 herron@cumin1002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-logging-codfw
- 14:55 hashar: Restarted CI Jenkins for plugins update
- 14:41 moritzm: installing python-git security updates
- 14:29 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye
- 14:25 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for Deploy EditCheck (references) to hiwiki, bnwiki, idwiki (T366381) (duration: 09m 37s)
- 14:20 lucaswerkmeister-wmde@deploy2002: esanders, lucaswerkmeister-wmde: Continuing with sync
- 14:18 lucaswerkmeister-wmde@deploy2002: esanders, lucaswerkmeister-wmde: Backport for Deploy EditCheck (references) to hiwiki, bnwiki, idwiki (T366381) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 14:15 jhancock@cumin2002: START - Cookbook sre.dns.netbox
- 14:15 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for Deploy EditCheck (references) to hiwiki, bnwiki, idwiki (T366381)
- 14:13 kartik@deploy2002: Finished scap sync-world: Backport for Enable Section Translation in ann, iba, nr and, tdd Wikipedias (T371420) (duration: 10m 08s)
- 14:09 kartik@deploy2002: kartik: Continuing with sync
- 14:06 kartik@deploy2002: kartik: Backport for Enable Section Translation in ann, iba, nr and, tdd Wikipedias (T371420) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 14:04 joal@deploy2002: Finished deploy [airflow-dags/analytics@23bc4ad]: Regular analytics weekly train [airflow-dags/analytics@23bc4ad3] (duration: 01m 44s)
- 14:03 kartik@deploy2002: Started scap sync-world: Backport for Enable Section Translation in ann, iba, nr and, tdd Wikipedias (T371420)
- 14:03 joal@deploy2002: Started deploy [airflow-dags/analytics@23bc4ad]: Regular analytics weekly train [airflow-dags/analytics@23bc4ad3]
- 13:52 cwhite: running thanos bucket cleanup on titan1001 - T351927
- 13:37 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1048
- 13:36 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1048
- 13:35 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1047
- 13:34 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1047
- 13:23 joal@deploy2002: Finished deploy [analytics/refinery@4bec064] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@4bec0640] (duration: 03m 44s)
- 13:20 joal@deploy2002: Started deploy [analytics/refinery@4bec064] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@4bec0640]
- 13:13 joal@deploy2002: Finished deploy [analytics/refinery@4bec064] (thin): Regular analytics weekly train THIN [analytics/refinery@4bec0640] (duration: 05m 03s)
- 13:08 joal@deploy2002: Started deploy [analytics/refinery@4bec064] (thin): Regular analytics weekly train THIN [analytics/refinery@4bec0640]
- 12:53 joal@deploy2002: Finished deploy [analytics/refinery@4bec064]: Regular analytics weekly train [analytics/refinery@4bec0640] (duration: 16m 47s)
- 12:40 jmm@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host ganeti1047
- 12:40 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1047
- 12:39 jmm@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host ganeti1047
- 12:37 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1047
- 12:36 joal@deploy2002: Started deploy [analytics/refinery@4bec064]: Regular analytics weekly train [analytics/refinery@4bec0640]
- 12:16 vgutierrez: repool liberica on lvs1013
- 11:44 sfaci@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic: apply
- 11:44 sfaci@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic: apply
- 11:27 jgiannelos@deploy2002: helmfile [eqiad] DONE helmfile.d/services/proton: sync
- 11:26 jgiannelos@deploy2002: helmfile [eqiad] START helmfile.d/services/proton: sync
- 11:26 jgiannelos@deploy2002: helmfile [codfw] DONE helmfile.d/services/proton: sync
- 11:25 jgiannelos@deploy2002: helmfile [codfw] START helmfile.d/services/proton: sync
- 11:24 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/proton: sync
- 11:24 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/proton: sync
- 11:19 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
- 11:19 sfaci@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply
- 11:19 sfaci@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply
- 11:18 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
- 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
- 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
- 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
- 11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
- 11:16 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
- 11:11 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
- 11:10 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
- 11:09 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1010.eqiad.wmnet
- 11:09 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
- 11:04 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1010.eqiad.wmnet
- 11:03 vgutierrez: depool liberica on lvs1013
- 11:01 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1010.eqiad.wmnet
- 10:58 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 10:55 jmm@cumin2002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-test-eqiad
- 10:48 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 10:41 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2081.codfw.wmnet with OS bullseye
- 10:41 elukey@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002"
- 10:40 elukey@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002"
- 10:40 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply
- 10:40 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply
- 10:33 jmm@cumin2002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-test-eqiad
- 10:21 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2081.codfw.wmnet with reason: host reimage
- 10:20 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply
- 10:20 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply
- 10:18 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2081.codfw.wmnet with reason: host reimage
- 10:07 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2081.codfw.wmnet with OS bullseye
- 10:02 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1009.eqiad.wmnet
- 09:58 oblivian@cumin2002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Add rw interface (still disabled), search - oblivian@cumin2002"
- 09:58 oblivian@cumin2002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Add rw interface (still disabled), search - oblivian@cumin2002
- 09:57 oblivian@cumin2002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Add rw interface (still disabled), search - oblivian@cumin2002
- 09:57 oblivian@cumin2002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Add rw interface (still disabled), search - oblivian@cumin2002"
- 09:52 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T367781)', diff saved to https://phabricator.wikimedia.org/P70981 and previous config saved to /var/cache/conftool/dbconfig/20241107-095205-arnaudb.json
- 09:51 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1009.eqiad.wmnet
- 09:41 elukey@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2081.codfw.wmnet with OS bullseye
- 09:36 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P70980 and previous config saved to /var/cache/conftool/dbconfig/20241107-093657-arnaudb.json
- 09:29 vgutierrez: upload liberica 0.4 to apt.wm.o (bookworm-wikimedia)
- 09:21 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P70979 and previous config saved to /var/cache/conftool/dbconfig/20241107-092150-arnaudb.json
- 09:21 moritzm: installing openjdk-8 security updates
- 09:21 moritzm: uploaded openjdk-8 8u412-ga-1~deb11u1 to apt.wikimedia.org for bookworm-wikimedia
- 09:14 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group2 to 1.44.0-wmf.2 refs T375661
- 09:06 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T367781)', diff saved to https://phabricator.wikimedia.org/P70978 and previous config saved to /var/cache/conftool/dbconfig/20241107-090643-arnaudb.json
- 08:41 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2081.codfw.wmnet with OS bullseye
- 08:40 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2081.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 08:27 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2081.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 08:26 kartik@deploy2002: Finished scap sync-world: Backport for Translate: Enable message bundle Scribunto module on testwiki (T359918) (duration: 18m 39s)
- 08:25 _joe_: runing scap pull on mwdebug2001/2002
- 08:19 kartik@deploy2002: kartik, abi: Continuing with sync
- 08:13 kartik@deploy2002: kartik, abi: Backport for Translate: Enable message bundle Scribunto module on testwiki (T359918) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 08:07 kartik@deploy2002: Started scap sync-world: Backport for Translate: Enable message bundle Scribunto module on testwiki (T359918)
- 08:06 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2155 (T367781)', diff saved to https://phabricator.wikimedia.org/P70977 and previous config saved to /var/cache/conftool/dbconfig/20241107-080618-arnaudb.json
- 08:06 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2187.codfw.wmnet with reason: Maintenance
- 08:05 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2187.codfw.wmnet with reason: Maintenance
- 08:05 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2155.codfw.wmnet with reason: Maintenance
- 08:05 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2155.codfw.wmnet with reason: Maintenance
- 07:50 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on pc1017.eqiad.wmnet with reason: T378068, host is not pooled
- 07:50 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 5 days, 0:00:00 on pc1017.eqiad.wmnet with reason: T378068, host is not pooled
- 07:50 arnaudb@cumin1002: END (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: T378068, host is not pooled
- 07:50 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: T378068, host is not pooled
- 07:28 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1046.eqiad.wmnet to cluster eqiad and group C
- 07:27 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1046.eqiad.wmnet to cluster eqiad and group C
- 07:27 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1045.eqiad.wmnet to cluster eqiad and group C
- 07:25 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1045.eqiad.wmnet to cluster eqiad and group C
- 07:25 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1045.eqiad.wmnet to cluster eqiad and group B
- 07:25 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1045.eqiad.wmnet to cluster eqiad and group B
- 07:18 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply
- 07:03 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply
- 06:55 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply
- 06:47 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply
- 06:44 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
- 06:39 kartik@deploy2002: helmfile [staging] START helmfile.d/services/machinetranslation: apply
2024-11-06
- 23:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2152.codfw.wmnet with OS bookworm
- 23:46 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 23:45 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 23:41 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp1006.eqiad.wmnet with OS bookworm
- 23:41 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 23:41 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 23:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2151.codfw.wmnet with OS bookworm
- 23:39 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 23:37 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 23:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2154.codfw.wmnet with OS bookworm
- 23:36 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 23:34 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 23:31 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp1005.eqiad.wmnet with OS bookworm
- 23:31 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 23:30 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 23:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2153.codfw.wmnet with OS bookworm
- 23:28 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 23:28 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 23:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2152.codfw.wmnet with reason: host reimage
- 23:23 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp1004.eqiad.wmnet with OS bookworm
- 23:23 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 23:23 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 23:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2155.codfw.wmnet with OS bookworm
- 23:23 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 23:22 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp1006.eqiad.wmnet with reason: host reimage
- 23:19 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2151.codfw.wmnet with reason: host reimage
- 23:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 23:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2154.codfw.wmnet with reason: host reimage
- 23:12 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp1005.eqiad.wmnet with reason: host reimage
- 23:08 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2153.codfw.wmnet with reason: host reimage
- 23:05 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp1004.eqiad.wmnet with reason: host reimage
- 23:02 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp1005.eqiad.wmnet with reason: host reimage
- 23:02 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2155.codfw.wmnet with reason: host reimage
- 23:00 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp1004.eqiad.wmnet with reason: host reimage
- 23:00 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp1006.eqiad.wmnet with reason: host reimage
- 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2153.codfw.wmnet with reason: host reimage
- 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2152.codfw.wmnet with reason: host reimage
- 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2151.codfw.wmnet with reason: host reimage
- 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2154.codfw.wmnet with reason: host reimage
- 22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2155.codfw.wmnet with reason: host reimage
- 22:44 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host mc-gp1004.eqiad.wmnet with OS bookworm
- 22:44 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host mc-gp1005.eqiad.wmnet with OS bookworm
- 22:43 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host mc-gp1006.eqiad.wmnet with OS bookworm
- 22:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2155.codfw.wmnet with OS bookworm
- 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2154.codfw.wmnet with OS bookworm
- 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2153.codfw.wmnet with OS bookworm
- 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2152.codfw.wmnet with OS bookworm
- 22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2151.codfw.wmnet with OS bookworm
- 22:38 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp1004.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 22:38 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2155']
- 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2154']
- 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2153']
- 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2152']
- 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2151']
- 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2151']
- 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2152']
- 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2153']
- 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2154']
- 22:37 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2155']
- 22:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 22:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2152.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2151.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2154.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 22:24 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 22:24 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2154.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 22:23 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2152.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 22:23 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2151.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 22:22 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 22:22 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2151-55 to codfw - jhancock@cumin2002"
- 22:22 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2151-55 to codfw - jhancock@cumin2002"
- 22:18 jhancock@cumin2002: START - Cookbook sre.dns.netbox
- 22:16 jclark@cumin1002: START - Cookbook sre.hosts.provision for host mc-gp1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 22:16 jclark@cumin1002: START - Cookbook sre.hosts.provision for host mc-gp1004.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 22:16 jclark@cumin1002: START - Cookbook sre.hosts.provision for host mc-gp1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 22:14 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 22:14 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for mc-gp1004 - jclark@cumin1002"
- 22:14 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for mc-gp1004 - jclark@cumin1002"
- 22:10 jclark@cumin1002: START - Cookbook sre.dns.netbox
- 21:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2150.codfw.wmnet with OS bookworm
- 21:42 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 21:35 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 21:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2148.codfw.wmnet with OS bookworm
- 21:31 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 21:31 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 21:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2147.codfw.wmnet with OS bookworm
- 21:27 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 21:27 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2146.codfw.wmnet with OS bookworm
- 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2149.codfw.wmnet with OS bookworm
- 21:26 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 21:25 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 21:20 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 21:20 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 21:18 jclark@cumin1002: START - Cookbook sre.dns.netbox
- 21:16 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2150.codfw.wmnet with reason: host reimage
- 21:12 sukhe@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp2031.codfw.wmnet [reason: PSU replaced]
- 21:12 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2148.codfw.wmnet with reason: host reimage
- 21:08 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2147.codfw.wmnet with reason: host reimage
- 21:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2146.codfw.wmnet with reason: host reimage
- 21:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2149.codfw.wmnet with reason: host reimage
- 20:59 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2150.codfw.wmnet with reason: host reimage
- 20:59 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2148.codfw.wmnet with reason: host reimage
- 20:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2147.codfw.wmnet with reason: host reimage
- 20:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2146.codfw.wmnet with reason: host reimage
- 20:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2149.codfw.wmnet with reason: host reimage
- 20:41 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2148.codfw.wmnet with OS bookworm
- 20:41 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2150.codfw.wmnet with OS bookworm
- 20:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2149.codfw.wmnet with OS bookworm
- 20:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2147.codfw.wmnet with OS bookworm
- 20:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2146.codfw.wmnet with OS bookworm
- 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2150']
- 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2149']
- 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2148']
- 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2147']
- 20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2146']
- 20:39 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2150']
- 20:39 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2149']
- 20:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2148']
- 20:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2147']
- 20:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2146']
- 20:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 20:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2146.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 20:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2150.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 20:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2148.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 20:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2147.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 20:27 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 20:26 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 20:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2150.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 20:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 20:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2148.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 20:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2147.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 20:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2146.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 20:25 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 20:25 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2146-50 to codfw - jhancock@cumin2002"
- 20:24 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2146-50 to codfw - jhancock@cumin2002"
- 20:19 jhancock@cumin2002: START - Cookbook sre.dns.netbox
- 19:55 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp2006.codfw.wmnet with OS bookworm
- 19:55 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 18:41 brett: Remove RSA cert support from P:idp clients (icinga, karma, klaxon, librenms, orchestrator) (T375569)
- 18:10 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2083.codfw.wmnet with OS bullseye
- 18:10 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002"
- 18:06 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 18:03 sukhe: dummy authdns-update to test CR 10857508
- 17:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp2006.codfw.wmnet with reason: host reimage
- 17:45 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp2006.codfw.wmnet with reason: host reimage
- 17:35 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002"
- 17:27 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2006.codfw.wmnet with OS bookworm
- 17:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 17:17 hnowlan: importing debs for mercurius-1.0.1
- 17:15 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 17:14 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage
- 17:11 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage
- 17:11 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 17:11 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransw1001 - vriley@cumin1002"
- 17:11 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransw1001 - vriley@cumin1002"
- 17:05 vriley@cumin1002: START - Cookbook sre.dns.netbox
- 16:58 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye
- 16:37 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 16:36 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 16:35 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 16:32 moritzm: remove ganeti1014 from active ganeti nodes T378921
- 16:31 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet
- 16:26 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 16:26 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 16:25 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye
- 16:24 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 16:23 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 16:21 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 16:21 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for fransc1001 - jclark@cumin1002"
- 16:20 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for fransc1001 - jclark@cumin1002"
- 16:17 jclark@cumin1002: START - Cookbook sre.dns.netbox
- 16:10 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db2136 gradually with 4 steps - cloned on db2236
- 16:10 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 16:08 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 16:08 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 16:01 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs4010.ulsfo.wmnet
- 15:59 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 15:58 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 15:57 mfossati@deploy2002: Finished deploy [airflow-dags/platform_eng@294093b]: remove section alignment image suggestions, now in section topics v1.0.0 (duration: 01m 23s)
- 15:57 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 15:57 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransc1001 - vriley@cumin1002"
- 15:57 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransc1001 - vriley@cumin1002"
- 15:57 mfossati@deploy2002: Started deploy [airflow-dags/platform_eng@294093b]: remove section alignment image suggestions, now in section topics v1.0.0
- 15:55 topranks: rebooting lvs4010 to verify new IPv6 sysctl's for RA processing work T358260
- 15:55 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:25:00 on cr[3-4]-ulsfo with reason: prevent bgp alerts firing while lvs4010 is rebooted
- 15:55 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:25:00 on cr[3-4]-ulsfo with reason: prevent bgp alerts firing while lvs4010 is rebooted
- 15:55 cmooney@cumin1002: START - Cookbook sre.hosts.reboot-single for host lvs4010.ulsfo.wmnet
- 15:53 vriley@cumin1002: START - Cookbook sre.dns.netbox
- 15:51 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 15:50 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 15:48 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 15:48 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 15:43 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 15:42 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 15:31 moritzm: installing Linux 5.10.226 on bullseye hosts
- 15:24 arnaudb@cumin1002: START - Cookbook sre.mysql.pool db2136 gradually with 4 steps - cloned on db2236
- 15:18 mutante: gitlab1004 - systemctl start wmf_auto_restart_ssh-gitlab (because it had failed with "Service ssh-gitlab not present or not running") but now it's just fine and exits with "No restart necessary" T379166
- 15:13 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye
- 15:12 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for Document available wbformatvalue options (T323778) (duration: 38m 45s)
- 15:07 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db2136.codfw.wmnet onto db2236.codfw.wmnet
- 15:00 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Continuing with sync
- 14:59 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Backport for Document available wbformatvalue options (T323778) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 14:51 moritzm: installing php7.4 security updates
- 14:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1046.eqiad.wmnet
- 14:48 moritzm: installing usb.ids updates from Bookworm point release
- 14:43 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1046.eqiad.wmnet
- 14:42 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1046
- 14:36 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1046
- 14:33 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for Document available wbformatvalue options (T323778)
- 14:31 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for Cleanup for logo related file (duration: 15m 01s)
- 14:31 vgutierrez@cumin1002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool site eqiad for service: ncredir-addrs [reason: no reason specified, T378453]
- 14:31 vgutierrez@cumin1002: START - Cookbook sre.dns.admin DNS admin: pool site eqiad for service: ncredir-addrs [reason: no reason specified, T378453]
- 14:27 lucaswerkmeister-wmde@deploy2002: hamishz, lucaswerkmeister-wmde: Continuing with sync
- 14:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1045.eqiad.wmnet
- 14:20 sukhe@puppetserver1001: conftool action : set/pooled=no; selector: name=cp2031.codfw.wmnet
- 14:19 sukhe: depool cp2031
- 14:19 lucaswerkmeister-wmde@deploy2002: hamishz, lucaswerkmeister-wmde: Backport for Cleanup for logo related file synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 14:19 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1045.eqiad.wmnet
- 14:16 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for Cleanup for logo related file
- 14:16 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1045
- 14:14 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1045
- 14:02 vgutierrez@cumin1002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: depool site eqiad for service: ncredir-addrs [reason: no reason specified, T378453]
- 14:02 vgutierrez@cumin1002: START - Cookbook sre.dns.admin DNS admin: depool site eqiad for service: ncredir-addrs [reason: no reason specified, T378453]
- 13:52 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet
- 13:52 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B
- 13:47 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1044.eqia