Server Admin Log/Archive 87

2024-11-30

11:59 joal@deploy2002: Finished deploy [airflow-dags/analytics@fe37cfe]: Hotfix airflow analytics deploy [airflow-dags/analytics@fe37cfec] (duration: 01m 21s)
11:58 joal@deploy2002: Started deploy [airflow-dags/analytics@fe37cfe]: Hotfix airflow analytics deploy [airflow-dags/analytics@fe37cfec]

2024-11-29

16:55 jayme: puppet ca destroy mwmaint.discovery.wmnet - T341859
16:22 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp2006.codfw.wmnet
16:21 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp1006.eqiad.wmnet
16:15 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc-gp2006.codfw.wmnet
16:15 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc-gp1006.eqiad.wmnet
15:41 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp2005.codfw.wmnet
15:40 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp1005.eqiad.wmnet
15:34 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc-gp1005.eqiad.wmnet
15:34 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc-gp2005.codfw.wmnet
15:16 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp2004.codfw.wmnet
15:16 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp1004.eqiad.wmnet
15:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1231 (T376905)', diff saved to https://phabricator.wikimedia.org/P71448 and previous config saved to /var/cache/conftool/dbconfig/20241129-151101-ladsgroup.json
15:10 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc-gp2004.codfw.wmnet
15:09 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc-gp1004.eqiad.wmnet
14:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1231', diff saved to https://phabricator.wikimedia.org/P71447 and previous config saved to /var/cache/conftool/dbconfig/20241129-145554-ladsgroup.json
14:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1231', diff saved to https://phabricator.wikimedia.org/P71446 and previous config saved to /var/cache/conftool/dbconfig/20241129-144047-ladsgroup.json
14:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1231 (T376905)', diff saved to https://phabricator.wikimedia.org/P71445 and previous config saved to /var/cache/conftool/dbconfig/20241129-142540-ladsgroup.json
14:19 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1231 (T376905)', diff saved to https://phabricator.wikimedia.org/P71444 and previous config saved to /var/cache/conftool/dbconfig/20241129-141931-ladsgroup.json
14:19 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1231.eqiad.wmnet with reason: Maintenance
14:19 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1231.eqiad.wmnet with reason: Maintenance
14:14 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1225.eqiad.wmnet with reason: Maintenance
14:14 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1225.eqiad.wmnet with reason: Maintenance
14:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1187 (T376905)', diff saved to https://phabricator.wikimedia.org/P71443 and previous config saved to /var/cache/conftool/dbconfig/20241129-141409-ladsgroup.json
13:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P71442 and previous config saved to /var/cache/conftool/dbconfig/20241129-135902-ladsgroup.json
13:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P71441 and previous config saved to /var/cache/conftool/dbconfig/20241129-134355-ladsgroup.json
13:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1187 (T376905)', diff saved to https://phabricator.wikimedia.org/P71440 and previous config saved to /var/cache/conftool/dbconfig/20241129-132848-ladsgroup.json
13:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti1021.eqiad.wmnet
13:23 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
13:22 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti1021.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
13:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1187 (T376905)', diff saved to https://phabricator.wikimedia.org/P71439 and previous config saved to /var/cache/conftool/dbconfig/20241129-132136-ladsgroup.json
13:21 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1187.eqiad.wmnet with reason: Maintenance
13:21 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1187.eqiad.wmnet with reason: Maintenance
13:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T376905)', diff saved to https://phabricator.wikimedia.org/P71438 and previous config saved to /var/cache/conftool/dbconfig/20241129-132111-ladsgroup.json
13:17 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti1021.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
13:13 jmm@cumin2002: START - Cookbook sre.dns.netbox
13:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P71437 and previous config saved to /var/cache/conftool/dbconfig/20241129-130604-ladsgroup.json
13:06 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti1021.eqiad.wmnet
12:57 klausman@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
12:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P71436 and previous config saved to /var/cache/conftool/dbconfig/20241129-125057-ladsgroup.json
12:42 klausman@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
12:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T376905)', diff saved to https://phabricator.wikimedia.org/P71434 and previous config saved to /var/cache/conftool/dbconfig/20241129-123549-ladsgroup.json
12:32 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti1015.eqiad.wmnet
12:32 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
12:32 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti1015.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
12:31 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti1015.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
12:27 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1180 (T376905)', diff saved to https://phabricator.wikimedia.org/P71433 and previous config saved to /var/cache/conftool/dbconfig/20241129-122735-ladsgroup.json
12:27 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1180.eqiad.wmnet with reason: Maintenance
12:27 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1180.eqiad.wmnet with reason: Maintenance
12:27 jmm@cumin2002: START - Cookbook sre.dns.netbox
12:21 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti1015.eqiad.wmnet
12:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1173 (T376905)', diff saved to https://phabricator.wikimedia.org/P71432 and previous config saved to /var/cache/conftool/dbconfig/20241129-121010-ladsgroup.json
12:06 jelto@cumin1002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab2002.wikimedia.org with reason: Upgrade GitLab to new version
12:04 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2085.codfw.wmnet with OS bullseye
11:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1173', diff saved to https://phabricator.wikimedia.org/P71431 and previous config saved to /var/cache/conftool/dbconfig/20241129-115501-ladsgroup.json
11:44 moritzm: imported mapnik_4.0.3+ds2~wmf12u1 to component/maps T216826
11:43 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2085.codfw.wmnet with reason: host reimage
11:40 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2085.codfw.wmnet with reason: host reimage
11:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1173', diff saved to https://phabricator.wikimedia.org/P71430 and previous config saved to /var/cache/conftool/dbconfig/20241129-113954-ladsgroup.json
11:31 Dreamy_Jazz: Started MediaModeration scanning scripts to scan all wikis
11:29 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2085.codfw.wmnet with OS bullseye
11:27 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2084.codfw.wmnet with OS bullseye
11:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1173 (T376905)', diff saved to https://phabricator.wikimedia.org/P71429 and previous config saved to /var/cache/conftool/dbconfig/20241129-112447-ladsgroup.json
11:19 stevemunene@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
11:18 stevemunene@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
11:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1173 (T376905)', diff saved to https://phabricator.wikimedia.org/P71428 and previous config saved to /var/cache/conftool/dbconfig/20241129-111554-ladsgroup.json
11:15 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1173.eqiad.wmnet with reason: Maintenance
11:15 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1173.eqiad.wmnet with reason: Maintenance
11:01 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2084.codfw.wmnet with reason: host reimage
11:00 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2214.codfw.wmnet with reason: Maintenance
11:00 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2214.codfw.wmnet with reason: Maintenance
10:57 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2084.codfw.wmnet with reason: host reimage
10:45 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2084.codfw.wmnet with OS bullseye
10:36 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2083.codfw.wmnet with OS bullseye
10:13 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage
10:10 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage
09:57 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye
09:43 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye
09:21 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage
09:18 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage
09:05 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye
09:02 jelto@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: Upgrade GitLab to new version
08:54 moritzm: imported mapbox-polylabel 2.0.1-1~wmf12u1 to component/maps T216826
08:51 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
08:51 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
08:16 moritzm: imported mapbox-geometry_2.0.3-1~wmf12u1 to component/maps T216826
07:19 marostegui@cumin2002: dbctl commit (dc=all): 'db1223 (re)pooling @ 100%: Repooling after corruption', diff saved to https://phabricator.wikimedia.org/P71427 and previous config saved to /var/cache/conftool/dbconfig/20241129-071905-root.json
07:10 aqu@deploy2002: Finished deploy [airflow-dags/analytics@656d6df]: Generate canary events faster in Airflow (duration: 03m 15s)
07:06 aqu@deploy2002: Started deploy [airflow-dags/analytics@656d6df]: Generate canary events faster in Airflow
07:03 marostegui@cumin2002: dbctl commit (dc=all): 'db1223 (re)pooling @ 75%: Repooling after corruption', diff saved to https://phabricator.wikimedia.org/P71426 and previous config saved to /var/cache/conftool/dbconfig/20241129-070333-root.json
06:48 marostegui@cumin2002: dbctl commit (dc=all): 'db1223 (re)pooling @ 50%: Repooling after corruption', diff saved to https://phabricator.wikimedia.org/P71425 and previous config saved to /var/cache/conftool/dbconfig/20241129-064801-root.json
06:28 marostegui@cumin2002: dbctl commit (dc=all): 'Repool', diff saved to https://phabricator.wikimedia.org/P71424 and previous config saved to /var/cache/conftool/dbconfig/20241129-062833-marostegui.json
06:27 marostegui@cumin2002: END (FAIL) - Cookbook sre.mysql.pool (exit_code=99) db1223 quickly with 2 steps - Fixed corruption
06:26 marostegui@cumin2002: START - Cookbook sre.mysql.pool db1223 quickly with 2 steps - Fixed corruption
05:52 taavi@cumin1002: dbctl commit (dc=all): 'depool db1223, replication broken', diff saved to https://phabricator.wikimedia.org/P71423 and previous config saved to /var/cache/conftool/dbconfig/20241129-055245-taavi.json
04:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2229 (T328817)', diff saved to https://phabricator.wikimedia.org/P71422 and previous config saved to /var/cache/conftool/dbconfig/20241129-045409-ladsgroup.json
04:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2229', diff saved to https://phabricator.wikimedia.org/P71421 and previous config saved to /var/cache/conftool/dbconfig/20241129-043902-ladsgroup.json
04:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2229', diff saved to https://phabricator.wikimedia.org/P71420 and previous config saved to /var/cache/conftool/dbconfig/20241129-042355-ladsgroup.json
04:20 tstarling@deploy2002: Finished scap sync-world: Backport for addWiki.php tweaks, Run dumpInterwiki.php locally with no changes, Prepare id.wikivoyage.org for installation (T380726 T352113), dumpInterwiki: read from preinstall.dblist (T352113), addWiki: Move DB_ADMIN to core, [[gerrit:1099064|addWiki: Add UpdateSearchIndexCon
04:12 tstarling@deploy2002: tstarling: Continuing with sync
04:12 tstarling@deploy2002: tstarling: Backport for addWiki.php tweaks, Run dumpInterwiki.php locally with no changes, Prepare id.wikivoyage.org for installation (T380726 T352113), dumpInterwiki: read from preinstall.dblist (T352113), addWiki: Move DB_ADMIN to core, addWiki: Add UpdateSearchIndexConfig, [[gerrit
04:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2229 (T328817)', diff saved to https://phabricator.wikimedia.org/P71419 and previous config saved to /var/cache/conftool/dbconfig/20241129-040846-ladsgroup.json
04:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2229 (T328817)', diff saved to https://phabricator.wikimedia.org/P71418 and previous config saved to /var/cache/conftool/dbconfig/20241129-040547-ladsgroup.json
04:05 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2229.codfw.wmnet with reason: Maintenance
04:05 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2229.codfw.wmnet with reason: Maintenance
04:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2224 (T328817)', diff saved to https://phabricator.wikimedia.org/P71417 and previous config saved to /var/cache/conftool/dbconfig/20241129-040523-ladsgroup.json
04:01 tstarling@deploy2002: Started scap sync-world: Backport for addWiki.php tweaks, Run dumpInterwiki.php locally with no changes, Prepare id.wikivoyage.org for installation (T380726 T352113), dumpInterwiki: read from preinstall.dblist (T352113), addWiki: Move DB_ADMIN to core, [[gerrit:1099064|addWiki: Add UpdateSearchIndexConf
03:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2224', diff saved to https://phabricator.wikimedia.org/P71416 and previous config saved to /var/cache/conftool/dbconfig/20241129-035016-ladsgroup.json
03:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2224', diff saved to https://phabricator.wikimedia.org/P71415 and previous config saved to /var/cache/conftool/dbconfig/20241129-033509-ladsgroup.json
03:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2224 (T328817)', diff saved to https://phabricator.wikimedia.org/P71414 and previous config saved to /var/cache/conftool/dbconfig/20241129-032002-ladsgroup.json
03:17 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2224 (T328817)', diff saved to https://phabricator.wikimedia.org/P71413 and previous config saved to /var/cache/conftool/dbconfig/20241129-031705-ladsgroup.json
03:16 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2224.codfw.wmnet with reason: Maintenance
03:16 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2224.codfw.wmnet with reason: Maintenance
03:16 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2217 (T328817)', diff saved to https://phabricator.wikimedia.org/P71412 and previous config saved to /var/cache/conftool/dbconfig/20241129-031642-ladsgroup.json
03:04 tstarling@deploy2002: scap failed: <KeyError> '1 dbs from /srv/mediawiki-staging/wikiversions.json are missing from /srv/mediawiki-staging/dblists/all.dblist: idwikivoyage' (scap version: 4.129.0) (duration: 00m 00s)
03:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2217', diff saved to https://phabricator.wikimedia.org/P71411 and previous config saved to /var/cache/conftool/dbconfig/20241129-030133-ladsgroup.json
02:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2217', diff saved to https://phabricator.wikimedia.org/P71410 and previous config saved to /var/cache/conftool/dbconfig/20241129-024625-ladsgroup.json
02:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2217 (T328817)', diff saved to https://phabricator.wikimedia.org/P71409 and previous config saved to /var/cache/conftool/dbconfig/20241129-023118-ladsgroup.json
02:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2217 (T328817)', diff saved to https://phabricator.wikimedia.org/P71408 and previous config saved to /var/cache/conftool/dbconfig/20241129-022822-ladsgroup.json
02:28 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2217.codfw.wmnet with reason: Maintenance
02:28 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2217.codfw.wmnet with reason: Maintenance
02:27 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2197.codfw.wmnet with reason: Maintenance
02:26 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2197.codfw.wmnet with reason: Maintenance
02:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193 (T328817)', diff saved to https://phabricator.wikimedia.org/P71407 and previous config saved to /var/cache/conftool/dbconfig/20241129-022645-ladsgroup.json
02:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193', diff saved to https://phabricator.wikimedia.org/P71406 and previous config saved to /var/cache/conftool/dbconfig/20241129-021138-ladsgroup.json
01:56 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193', diff saved to https://phabricator.wikimedia.org/P71405 and previous config saved to /var/cache/conftool/dbconfig/20241129-015631-ladsgroup.json
01:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193 (T328817)', diff saved to https://phabricator.wikimedia.org/P71404 and previous config saved to /var/cache/conftool/dbconfig/20241129-014124-ladsgroup.json
01:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2193 (T328817)', diff saved to https://phabricator.wikimedia.org/P71403 and previous config saved to /var/cache/conftool/dbconfig/20241129-013912-ladsgroup.json
01:39 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2193.codfw.wmnet with reason: Maintenance
01:38 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2193.codfw.wmnet with reason: Maintenance
01:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180 (T328817)', diff saved to https://phabricator.wikimedia.org/P71402 and previous config saved to /var/cache/conftool/dbconfig/20241129-013850-ladsgroup.json
01:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P71401 and previous config saved to /var/cache/conftool/dbconfig/20241129-012343-ladsgroup.json
01:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P71400 and previous config saved to /var/cache/conftool/dbconfig/20241129-010835-ladsgroup.json
00:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180 (T328817)', diff saved to https://phabricator.wikimedia.org/P71399 and previous config saved to /var/cache/conftool/dbconfig/20241129-005328-ladsgroup.json
00:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2180 (T328817)', diff saved to https://phabricator.wikimedia.org/P71398 and previous config saved to /var/cache/conftool/dbconfig/20241129-005117-ladsgroup.json
00:51 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2180.codfw.wmnet with reason: Maintenance
00:50 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2180.codfw.wmnet with reason: Maintenance
00:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2169 (T328817)', diff saved to https://phabricator.wikimedia.org/P71397 and previous config saved to /var/cache/conftool/dbconfig/20241129-005054-ladsgroup.json
00:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2169', diff saved to https://phabricator.wikimedia.org/P71396 and previous config saved to /var/cache/conftool/dbconfig/20241129-003547-ladsgroup.json
00:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2169', diff saved to https://phabricator.wikimedia.org/P71395 and previous config saved to /var/cache/conftool/dbconfig/20241129-002040-ladsgroup.json
00:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2169 (T328817)', diff saved to https://phabricator.wikimedia.org/P71394 and previous config saved to /var/cache/conftool/dbconfig/20241129-000533-ladsgroup.json
00:02 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2169 (T328817)', diff saved to https://phabricator.wikimedia.org/P71393 and previous config saved to /var/cache/conftool/dbconfig/20241129-000234-ladsgroup.json
00:02 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2169.codfw.wmnet with reason: Maintenance
00:02 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2169.codfw.wmnet with reason: Maintenance
00:02 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2158 (T328817)', diff saved to https://phabricator.wikimedia.org/P71392 and previous config saved to /var/cache/conftool/dbconfig/20241129-000211-ladsgroup.json

2024-11-28

23:47 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P71391 and previous config saved to /var/cache/conftool/dbconfig/20241128-234704-ladsgroup.json
23:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T376905)', diff saved to https://phabricator.wikimedia.org/P71390 and previous config saved to /var/cache/conftool/dbconfig/20241128-233426-ladsgroup.json
23:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P71389 and previous config saved to /var/cache/conftool/dbconfig/20241128-233157-ladsgroup.json
23:19 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P71388 and previous config saved to /var/cache/conftool/dbconfig/20241128-231919-ladsgroup.json
23:16 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2158 (T328817)', diff saved to https://phabricator.wikimedia.org/P71387 and previous config saved to /var/cache/conftool/dbconfig/20241128-231650-ladsgroup.json
23:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2158 (T328817)', diff saved to https://phabricator.wikimedia.org/P71386 and previous config saved to /var/cache/conftool/dbconfig/20241128-231350-ladsgroup.json
23:13 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
23:13 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
23:13 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2158.codfw.wmnet with reason: Maintenance
23:13 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2158.codfw.wmnet with reason: Maintenance
23:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151 (T328817)', diff saved to https://phabricator.wikimedia.org/P71385 and previous config saved to /var/cache/conftool/dbconfig/20241128-231312-ladsgroup.json
23:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P71384 and previous config saved to /var/cache/conftool/dbconfig/20241128-230412-ladsgroup.json
22:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151', diff saved to https://phabricator.wikimedia.org/P71383 and previous config saved to /var/cache/conftool/dbconfig/20241128-225805-ladsgroup.json
22:50 ladsgroup@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db1178 gradually with 4 steps - Maint over (T361627)
22:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T376905)', diff saved to https://phabricator.wikimedia.org/P71381 and previous config saved to /var/cache/conftool/dbconfig/20241128-224905-ladsgroup.json
22:42 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151', diff saved to https://phabricator.wikimedia.org/P71380 and previous config saved to /var/cache/conftool/dbconfig/20241128-224258-ladsgroup.json
22:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1165 (T376905)', diff saved to https://phabricator.wikimedia.org/P71379 and previous config saved to /var/cache/conftool/dbconfig/20241128-223959-ladsgroup.json
22:39 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1015,1019].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
22:39 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1015,1019].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
22:39 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1165.eqiad.wmnet with reason: Maintenance
22:39 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1165.eqiad.wmnet with reason: Maintenance
22:27 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151 (T328817)', diff saved to https://phabricator.wikimedia.org/P71377 and previous config saved to /var/cache/conftool/dbconfig/20241128-222751-ladsgroup.json
22:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2151 (T328817)', diff saved to https://phabricator.wikimedia.org/P71376 and previous config saved to /var/cache/conftool/dbconfig/20241128-222250-ladsgroup.json
22:22 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2151.codfw.wmnet with reason: Maintenance
22:22 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2151.codfw.wmnet with reason: Maintenance
away: UTC late deploys done
22:17 tgr@deploy2002: Finished scap sync-world: Backport for Localisation updates (November 26) (T372175), extend account creation lookup service to cover forced creations by others (T378401), extend account creation backfill script to forced account creations by others (T378401), [[gerrit:1098929|ReportIncident: Setup $wgReportIncidentLocalLinks for ptwiki pilot depl
22:07 tgr@deploy2002: tgr, ariel, matmarex, mszabo: Continuing with sync
22:05 ladsgroup@cumin1002: START - Cookbook sre.mysql.pool db1178 gradually with 4 steps - Maint over (T361627)
21:53 tgr@deploy2002: tgr, ariel, matmarex, mszabo: Backport for Localisation updates (November 26) (T372175), extend account creation lookup service to cover forced creations by others (T378401), extend account creation backfill script to forced account creations by others (T378401), [[gerrit:1098929|ReportIncident: Setup $wgReportIncidentLocalLinks for ptwiki pilot
21:51 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1178.eqiad.wmnet with reason: Schema change (T361627)
21:51 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1178.eqiad.wmnet with reason: Schema change (T361627)
21:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1178 depool (T361627)', diff saved to https://phabricator.wikimedia.org/P71373 and previous config saved to /var/cache/conftool/dbconfig/20241128-215026-ladsgroup.json
21:39 tgr@deploy2002: Started scap sync-world: Backport for Localisation updates (November 26) (T372175), extend account creation lookup service to cover forced creations by others (T378401), extend account creation backfill script to forced account creations by others (T378401), [[gerrit:1098929|ReportIncident: Setup $wgReportIncidentLocalLinks for ptwiki pilot deplo
21:25 tgr@deploy2002: Finished scap sync-world: Backport for Reader Survey: Undeploy on enwiki (T378660), Reader Survey: Deploy on multiple wikis (T378660) (duration: 14m 43s)
21:18 tgr@deploy2002: tgr, dani: Continuing with sync
21:17 aqu@deploy2002: Finished deploy [airflow-dags/analytics@6d38940]: Generate canary events faster in Airflow (duration: 01m 39s)
21:16 tgr@deploy2002: tgr, dani: Backport for Reader Survey: Undeploy on enwiki (T378660), Reader Survey: Deploy on multiple wikis (T378660) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
21:15 aqu@deploy2002: Started deploy [airflow-dags/analytics@6d38940]: Generate canary events faster in Airflow
21:10 tgr@deploy2002: Started scap sync-world: Backport for Reader Survey: Undeploy on enwiki (T378660), Reader Survey: Deploy on multiple wikis (T378660)
20:30 kharlan@deploy2002: Finished scap sync-world: Backport for ReportIncident: Setup $wgReportIncidentLocalLinks for ptwiki pilot deploy (T380277) (duration: 13m 08s)
20:23 kharlan@deploy2002: kharlan, mszabo: Continuing with sync
20:23 kharlan@deploy2002: kharlan, mszabo: Backport for ReportIncident: Setup $wgReportIncidentLocalLinks for ptwiki pilot deploy (T380277) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:16 kharlan@deploy2002: Started scap sync-world: Backport for ReportIncident: Setup $wgReportIncidentLocalLinks for ptwiki pilot deploy (T380277)
19:50 kamila@cumin1002: END (PASS) - Cookbook sre.k8s.roll-reimage-nodes (exit_code=0) rolling reimage on P{wikikube-worker[1276-1277].eqiad.wmnet} and (A:wikikube-staging-master-codfw or A:wikikube-staging-worker-codfw or A:wikikube-staging-master-eqiad or A:wikikube-staging-worker-eqiad or A:wikikube-master-codfw or A:wikikube-worker-codfw or A:wikikube-master-eqiad or A:wikikube-worker-eqiad or A:ml-serve-master-eqiad or
19:50 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1277.eqiad.wmnet with OS bookworm
19:31 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1277.eqiad.wmnet with reason: host reimage
19:27 kamila@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1277.eqiad.wmnet with reason: host reimage
19:08 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1277.eqiad.wmnet with OS bookworm
18:28 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1276.eqiad.wmnet with OS bookworm
18:09 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1276.eqiad.wmnet with reason: host reimage
18:06 kamila@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1276.eqiad.wmnet with reason: host reimage
17:47 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1276.eqiad.wmnet with OS bookworm
17:45 kamila@cumin1002: START - Cookbook sre.k8s.roll-reimage-nodes rolling reimage on P{wikikube-worker[1276-1277].eqiad.wmnet} and (A:wikikube-staging-master-codfw or A:wikikube-staging-worker-codfw or A:wikikube-staging-master-eqiad or A:wikikube-staging-worker-eqiad or A:wikikube-master-codfw or A:wikikube-worker-codfw or A:wikikube-master-eqiad or A:wikikube-worker-eqiad or A:ml-serve-master-eqiad or A:ml-serve-worker-
17:06 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2081.codfw.wmnet with OS bullseye
16:51 Emperor: depool/restart swift/repool ms-fe2014
16:51 Emperor: depool/restart swift/repool ms-fe2009
16:44 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2081.codfw.wmnet with reason: host reimage
16:41 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2081.codfw.wmnet with reason: host reimage
16:39 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2209.codfw.wmnet with reason: Maintenance
16:39 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2209.codfw.wmnet with reason: Maintenance
16:37 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1189.eqiad.wmnet with reason: Maintenance
16:37 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1189.eqiad.wmnet with reason: Maintenance
16:28 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2081.codfw.wmnet with OS bullseye
16:27 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2227.codfw.wmnet with reason: Maintenance
16:27 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2227.codfw.wmnet with reason: Maintenance
16:24 gmodena@deploy2002: Finished deploy [airflow-dags/analytics@d7c0f58]: webrequest_frontend post deployment fixes (duration: 02m 22s)
16:24 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2088.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
16:24 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2088.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
16:23 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2087.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
16:23 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2087.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
16:23 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2086.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
16:22 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2086.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
16:22 gmodena@deploy2002: Started deploy [airflow-dags/analytics@d7c0f58]: webrequest_frontend post deployment fixes
16:22 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2085.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
16:22 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2085.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
16:21 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2084.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
16:21 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2084.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
16:21 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2083.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
16:20 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2083.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
16:20 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
16:19 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
16:19 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2081.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
16:19 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2081.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
16:17 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2205.codfw.wmnet with reason: Maintenance
16:17 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2205.codfw.wmnet with reason: Maintenance
16:08 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2194.codfw.wmnet with reason: Maintenance
16:07 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2194.codfw.wmnet with reason: Maintenance
16:01 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply
16:01 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply
15:58 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2190.codfw.wmnet with reason: Maintenance
15:58 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2190.codfw.wmnet with reason: Maintenance
15:48 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2177.codfw.wmnet with reason: Maintenance
15:48 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2177.codfw.wmnet with reason: Maintenance
15:46 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host idp-test2004.wikimedia.org
15:39 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2186.codfw.wmnet with reason: Maintenance
15:39 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2186.codfw.wmnet with reason: Maintenance
15:39 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2156.codfw.wmnet with reason: Maintenance
15:39 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2156.codfw.wmnet with reason: Maintenance
15:37 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host idp-test2004.wikimedia.org
15:36 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host idp-test2005.wikimedia.org
15:32 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host idp-test2005.wikimedia.org
15:32 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es2032 (T376905)', diff saved to https://phabricator.wikimedia.org/P71371 and previous config saved to /var/cache/conftool/dbconfig/20241128-153202-ladsgroup.json
15:30 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2149.codfw.wmnet with reason: Maintenance
15:29 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2149.codfw.wmnet with reason: Maintenance
15:27 gmodena@deploy2002: Finished deploy [analytics/refinery@ac87303] (hadoop-test): Gobblin config changes [analytics/refinery@ac873037] (duration: 00m 26s)
15:26 gmodena@deploy2002: Started deploy [analytics/refinery@ac87303] (hadoop-test): Gobblin config changes [analytics/refinery@ac873037]
15:25 gmodena@deploy2002: Finished deploy [analytics/refinery@ac87303] (thin): Gobblin config changes THIN [analytics/refinery@ac873037] (duration: 00m 30s)
15:25 gmodena@deploy2002: Started deploy [analytics/refinery@ac87303] (thin): Gobblin config changes THIN [analytics/refinery@ac873037]
15:21 moritzm: removing ganeti1018 from active Ganeti nodes T378921
15:20 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2139.codfw.wmnet with reason: Maintenance
15:20 elukey@deploy2002: helmfile [staging] DONE helmfile.d/services/tegola-vector-tiles: sync
15:20 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2139.codfw.wmnet with reason: Maintenance
15:19 gmodena@deploy2002: Finished deploy [analytics/refinery@ac87303]: Gobblin config changes [analytics/refinery@ac873037] (duration: 03m 05s)
15:19 elukey@deploy2002: helmfile [staging] START helmfile.d/services/tegola-vector-tiles: sync
15:18 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
15:18 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
15:16 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es2032', diff saved to https://phabricator.wikimedia.org/P71370 and previous config saved to /var/cache/conftool/dbconfig/20241128-151655-ladsgroup.json
15:16 gmodena@deploy2002: Started deploy [analytics/refinery@ac87303]: Gobblin config changes [analytics/refinery@ac873037]
15:15 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1018.eqiad.wmnet
15:15 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1240.eqiad.wmnet with reason: Maintenance
15:15 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1240.eqiad.wmnet with reason: Maintenance
15:13 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1223.eqiad.wmnet with reason: Maintenance
15:12 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1223.eqiad.wmnet with reason: Maintenance
15:10 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1013,1017].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
15:10 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1013,1017].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
15:10 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1212.eqiad.wmnet with reason: Maintenance
15:10 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1212.eqiad.wmnet with reason: Maintenance
15:08 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1198.eqiad.wmnet with reason: Maintenance
15:08 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1198.eqiad.wmnet with reason: Maintenance
15:06 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1175.eqiad.wmnet with reason: Maintenance
15:06 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1175.eqiad.wmnet with reason: Maintenance
15:04 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1166.eqiad.wmnet with reason: Maintenance
15:04 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1166.eqiad.wmnet with reason: Maintenance
15:02 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1157.eqiad.wmnet with reason: Maintenance
15:02 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1157.eqiad.wmnet with reason: Maintenance
15:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es2032', diff saved to https://phabricator.wikimedia.org/P71369 and previous config saved to /var/cache/conftool/dbconfig/20241128-150148-ladsgroup.json
15:00 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1150.eqiad.wmnet with reason: Maintenance
15:00 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1150.eqiad.wmnet with reason: Maintenance
15:00 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2214.codfw.wmnet with reason: Maintenance
14:59 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2214.codfw.wmnet with reason: Maintenance
14:59 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1201.eqiad.wmnet with reason: Maintenance
14:59 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1201.eqiad.wmnet with reason: Maintenance
14:59 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2229.codfw.wmnet with reason: Maintenance
14:58 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2229.codfw.wmnet with reason: Maintenance
14:58 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2224.codfw.wmnet with reason: Maintenance
14:58 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2224.codfw.wmnet with reason: Maintenance
14:58 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2217.codfw.wmnet with reason: Maintenance
14:57 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2217.codfw.wmnet with reason: Maintenance
14:57 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2197.codfw.wmnet with reason: Maintenance
14:57 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2197.codfw.wmnet with reason: Maintenance
14:57 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2193.codfw.wmnet with reason: Maintenance
14:57 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2193.codfw.wmnet with reason: Maintenance
14:56 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2180.codfw.wmnet with reason: Maintenance
14:56 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2180.codfw.wmnet with reason: Maintenance
14:56 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2169.codfw.wmnet with reason: Maintenance
14:56 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2169.codfw.wmnet with reason: Maintenance
14:55 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2187.codfw.wmnet with reason: Maintenance
14:55 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2187.codfw.wmnet with reason: Maintenance
14:55 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2158.codfw.wmnet with reason: Maintenance
14:55 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2158.codfw.wmnet with reason: Maintenance
14:55 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2151.codfw.wmnet with reason: Maintenance
14:54 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2151.codfw.wmnet with reason: Maintenance
14:54 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance
14:54 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance
14:54 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1231.eqiad.wmnet with reason: Maintenance
14:54 urbanecm@deploy2002: Finished scap sync-world: Backport for Use `useformat` query param for device detection or mobile domain (m.) (T380646 T375788), ReportIncident: Enable instrumentation on labs (T372823), Enable message group subscription feature for some wikis (T372386), [[gerrit:1098622|Use `useformat` query param for device detection or mobile domain (m.)
14:54 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1231.eqiad.wmnet with reason: Maintenance
14:53 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1225.eqiad.wmnet with reason: Maintenance
14:53 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1225.eqiad.wmnet with reason: Maintenance
14:53 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1187.eqiad.wmnet with reason: Maintenance
14:53 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1187.eqiad.wmnet with reason: Maintenance
14:53 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1180.eqiad.wmnet with reason: Maintenance
14:52 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1180.eqiad.wmnet with reason: Maintenance
14:52 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1173.eqiad.wmnet with reason: Maintenance
14:52 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1173.eqiad.wmnet with reason: Maintenance
14:52 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1168.eqiad.wmnet with reason: Maintenance
14:52 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1168.eqiad.wmnet with reason: Maintenance
14:51 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1015,1019].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
14:51 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1015,1019].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
14:51 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1165.eqiad.wmnet with reason: Maintenance
14:51 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1165.eqiad.wmnet with reason: Maintenance
14:47 urbanecm@deploy2002: urbanecm, tgr, abi, mszabo: Continuing with sync
14:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es2032 (T376905)', diff saved to https://phabricator.wikimedia.org/P71352 and previous config saved to /var/cache/conftool/dbconfig/20241128-144641-ladsgroup.json
14:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es2032 (T376905)', diff saved to https://phabricator.wikimedia.org/P71351 and previous config saved to /var/cache/conftool/dbconfig/20241128-144039-ladsgroup.json
14:40 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es2032.codfw.wmnet with reason: Maintenance
14:40 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es2032.codfw.wmnet with reason: Maintenance
14:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es2028 (T376905)', diff saved to https://phabricator.wikimedia.org/P71350 and previous config saved to /var/cache/conftool/dbconfig/20241128-144012-ladsgroup.json
14:39 urbanecm: [urbanecm@deploy2002 ~]$ while read wiki; do echo "== $wiki"; mwscript-k8s extensions/Flow/maintenance/FlowMoveBoardsToSubpages.php -- --wiki=$wiki; done < wikis.txt # wikis.txt is at P71349 # T378827
14:36 urbanecm: [urbanecm@deploy2002 ~]$ mwscript-k8s -f extensions/Flow/maintenance/FlowMoveBoardsToSubpages.php -- --wiki=bswiki # T378827
14:33 moritzm: installing node-es-module-lexer updates from Bookworm point release
14:28 urbanecm@deploy2002: urbanecm, tgr, abi, mszabo: Backport for Use `useformat` query param for device detection or mobile domain (m.) (T380646 T375788), ReportIncident: Enable instrumentation on labs (T372823), Enable message group subscription feature for some wikis (T372386), [[gerrit:1098622|Use `useformat` query param for device detection or mobile domain (m.
14:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es2028', diff saved to https://phabricator.wikimedia.org/P71347 and previous config saved to /var/cache/conftool/dbconfig/20241128-142505-ladsgroup.json
14:25 Dreamy_Jazz: Started MediaModeration scanning scripts to run again over all wikis
14:23 urbanecm@deploy2002: Started scap sync-world: Backport for Use `useformat` query param for device detection or mobile domain (m.) (T380646 T375788), ReportIncident: Enable instrumentation on labs (T372823), Enable message group subscription feature for some wikis (T372386), [[gerrit:1098622|Use `useformat` query param for device detection or mobile domain (m.) (
14:22 urbanecm@deploy2002: Finished scap sync-world: Backport for Allow IRS to record server-side interaction events (T380599), Revert^2 "Add contact form for U4C" (duration: 14m 07s)
14:22 Dreamy_Jazz: Restarted MediaModeration scanning script - https://wikitech.wikimedia.org/wiki/MediaModeration
14:15 urbanecm@deploy2002: nmw03, mszabo, urbanecm: Continuing with sync
14:14 moritzm: installing apr security updates
14:14 urbanecm@deploy2002: nmw03, mszabo, urbanecm: Backport for Allow IRS to record server-side interaction events (T380599), Revert^2 "Add contact form for U4C" synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
14:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es2028', diff saved to https://phabricator.wikimedia.org/P71346 and previous config saved to /var/cache/conftool/dbconfig/20241128-140958-ladsgroup.json
14:08 urbanecm@deploy2002: Started scap sync-world: Backport for Allow IRS to record server-side interaction events (T380599), Revert^2 "Add contact form for U4C"
14:06 klausman@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
13:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es2028 (T376905)', diff saved to https://phabricator.wikimedia.org/P71345 and previous config saved to /var/cache/conftool/dbconfig/20241128-135451-ladsgroup.json
13:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es2028 (T376905)', diff saved to https://phabricator.wikimedia.org/P71344 and previous config saved to /var/cache/conftool/dbconfig/20241128-134859-ladsgroup.json
13:48 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es2028.codfw.wmnet with reason: Maintenance
13:48 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es2028.codfw.wmnet with reason: Maintenance
12:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es2031 (T376905)', diff saved to https://phabricator.wikimedia.org/P71343 and previous config saved to /var/cache/conftool/dbconfig/20241128-124957-ladsgroup.json
12:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es2031', diff saved to https://phabricator.wikimedia.org/P71342 and previous config saved to /var/cache/conftool/dbconfig/20241128-123451-ladsgroup.json
12:23 klausman@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
12:19 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es2031', diff saved to https://phabricator.wikimedia.org/P71340 and previous config saved to /var/cache/conftool/dbconfig/20241128-121943-ladsgroup.json
12:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es2031 (T376905)', diff saved to https://phabricator.wikimedia.org/P71339 and previous config saved to /var/cache/conftool/dbconfig/20241128-120437-ladsgroup.json
12:04 ladsgroup@deploy2002: Finished scap sync-world: Backport for Bump ratio of new parsercache key spec to 2 (T373037) (duration: 12m 37s)
12:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1236 (T370903)', diff saved to https://phabricator.wikimedia.org/P71338 and previous config saved to /var/cache/conftool/dbconfig/20241128-120031-ladsgroup.json
11:57 ladsgroup@deploy2002: ladsgroup: Continuing with sync
11:57 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es2031 (T376905)', diff saved to https://phabricator.wikimedia.org/P71337 and previous config saved to /var/cache/conftool/dbconfig/20241128-115741-ladsgroup.json
11:57 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es2031.codfw.wmnet with reason: Maintenance
11:57 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es2031.codfw.wmnet with reason: Maintenance
11:57 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es2033 (T376905)', diff saved to https://phabricator.wikimedia.org/P71336 and previous config saved to /var/cache/conftool/dbconfig/20241128-115715-ladsgroup.json
11:57 ladsgroup@deploy2002: ladsgroup: Backport for Bump ratio of new parsercache key spec to 2 (T373037) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
11:51 ladsgroup@deploy2002: Started scap sync-world: Backport for Bump ratio of new parsercache key spec to 2 (T373037)
11:48 ladsgroup@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db2237 gradually with 4 steps - Maint over (T379813)
11:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1236', diff saved to https://phabricator.wikimedia.org/P71334 and previous config saved to /var/cache/conftool/dbconfig/20241128-114524-ladsgroup.json
11:42 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es2033', diff saved to https://phabricator.wikimedia.org/P71333 and previous config saved to /var/cache/conftool/dbconfig/20241128-114208-ladsgroup.json
11:31 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1018.eqiad.wmnet
11:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1236', diff saved to https://phabricator.wikimedia.org/P71330 and previous config saved to /var/cache/conftool/dbconfig/20241128-113017-ladsgroup.json
11:27 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es2033', diff saved to https://phabricator.wikimedia.org/P71329 and previous config saved to /var/cache/conftool/dbconfig/20241128-112701-ladsgroup.json
11:21 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1018.eqiad.wmnet
11:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1236 (T370903)', diff saved to https://phabricator.wikimedia.org/P71327 and previous config saved to /var/cache/conftool/dbconfig/20241128-111510-ladsgroup.json
11:14 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1018.eqiad.wmnet
11:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1236 (T370903)', diff saved to https://phabricator.wikimedia.org/P71326 and previous config saved to /var/cache/conftool/dbconfig/20241128-111300-ladsgroup.json
11:12 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1236.eqiad.wmnet with reason: Maintenance
11:12 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1236.eqiad.wmnet with reason: Maintenance
11:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es2033 (T376905)', diff saved to https://phabricator.wikimedia.org/P71325 and previous config saved to /var/cache/conftool/dbconfig/20241128-111154-ladsgroup.json
11:11 moritzm: removing ganeti1022 from active Ganeti nodes T378921
11:10 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1022.eqiad.wmnet
11:10 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1201.eqiad.wmnet with reason: Maintenance
11:10 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1201.eqiad.wmnet with reason: Maintenance
11:08 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2204.codfw.wmnet with reason: Maintenance
11:08 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2204.codfw.wmnet with reason: Maintenance
11:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es2033 (T376905)', diff saved to https://phabricator.wikimedia.org/P71324 and previous config saved to /var/cache/conftool/dbconfig/20241128-110457-ladsgroup.json
11:04 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es2033.codfw.wmnet with reason: Maintenance
11:04 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es2033.codfw.wmnet with reason: Maintenance
11:03 ladsgroup@cumin1002: START - Cookbook sre.mysql.pool db2237 gradually with 4 steps - Maint over (T379813)
10:51 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Fix commit bug - oblivian@cumin1002"
10:51 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Fix commit bug - oblivian@cumin1002
10:51 oblivian@cumin1002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Fix commit bug - oblivian@cumin1002
10:51 oblivian@cumin1002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Fix commit bug - oblivian@cumin1002"
10:32 isaranto@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
10:27 isaranto@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
09:36 jnuche@deploy2002: Finished deploy [releng/jenkins-deploy@34b35a5] (releasing): Update Jenkins version on releases1003.eqiad.wmnet (duration: 01m 22s)
09:35 jnuche@deploy2002: Started deploy [releng/jenkins-deploy@34b35a5] (releasing): Update Jenkins version on releases1003.eqiad.wmnet
09:31 jnuche@deploy2002: Finished deploy [releng/jenkins-deploy@34b35a5] (releasing): Update Jenkins version on releases2003.codfw.wmnet (duration: 01m 27s)
09:30 jnuche@deploy2002: Started deploy [releng/jenkins-deploy@34b35a5] (releasing): Update Jenkins version on releases2003.codfw.wmnet
09:23 isaranto@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
09:22 hashar@deploy2002: rebuilt and synchronized wikiversions files: group2 to 1.44.0-wmf.5 refs T375664
09:14 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2200.codfw.wmnet with reason: Maintenance
09:13 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2200.codfw.wmnet with reason: Maintenance
09:09 isaranto@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
09:06 isaranto@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
09:00 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2198.codfw.wmnet with reason: Maintenance
09:00 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2198.codfw.wmnet with reason: Maintenance
09:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195 (T370903)', diff saved to https://phabricator.wikimedia.org/P71319 and previous config saved to /var/cache/conftool/dbconfig/20241128-090035-ladsgroup.json
08:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195', diff saved to https://phabricator.wikimedia.org/P71318 and previous config saved to /var/cache/conftool/dbconfig/20241128-084528-ladsgroup.json
08:43 isaranto@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
08:41 isaranto@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
08:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195', diff saved to https://phabricator.wikimedia.org/P71317 and previous config saved to /var/cache/conftool/dbconfig/20241128-083021-ladsgroup.json
08:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195 (T370903)', diff saved to https://phabricator.wikimedia.org/P71316 and previous config saved to /var/cache/conftool/dbconfig/20241128-081514-ladsgroup.json
08:02 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2195 (T370903)', diff saved to https://phabricator.wikimedia.org/P71315 and previous config saved to /var/cache/conftool/dbconfig/20241128-080244-ladsgroup.json
08:02 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2195.codfw.wmnet with reason: Maintenance
08:02 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2195.codfw.wmnet with reason: Maintenance
08:02 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181 (T370903)', diff saved to https://phabricator.wikimedia.org/P71314 and previous config saved to /var/cache/conftool/dbconfig/20241128-080221-ladsgroup.json
07:56 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1022.eqiad.wmnet
07:47 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P71313 and previous config saved to /var/cache/conftool/dbconfig/20241128-074714-ladsgroup.json
07:32 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P71312 and previous config saved to /var/cache/conftool/dbconfig/20241128-073207-ladsgroup.json
07:23 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "CSRF token support - oblivian@cumin1002"
07:23 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: CSRF token support - oblivian@cumin1002
07:23 oblivian@cumin1002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: CSRF token support - oblivian@cumin1002
07:22 oblivian@cumin1002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "CSRF token support - oblivian@cumin1002"
07:17 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181 (T370903)', diff saved to https://phabricator.wikimedia.org/P71310 and previous config saved to /var/cache/conftool/dbconfig/20241128-071700-ladsgroup.json
07:02 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2181 (T370903)', diff saved to https://phabricator.wikimedia.org/P71309 and previous config saved to /var/cache/conftool/dbconfig/20241128-070231-ladsgroup.json
07:02 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2181.codfw.wmnet with reason: Maintenance
07:02 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2181.codfw.wmnet with reason: Maintenance
07:02 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167 (T370903)', diff saved to https://phabricator.wikimedia.org/P71308 and previous config saved to /var/cache/conftool/dbconfig/20241128-070209-ladsgroup.json
07:02 kartik@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
06:47 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167', diff saved to https://phabricator.wikimedia.org/P71307 and previous config saved to /var/cache/conftool/dbconfig/20241128-064702-ladsgroup.json
06:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167', diff saved to https://phabricator.wikimedia.org/P71306 and previous config saved to /var/cache/conftool/dbconfig/20241128-063155-ladsgroup.json
06:16 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167 (T370903)', diff saved to https://phabricator.wikimedia.org/P71305 and previous config saved to /var/cache/conftool/dbconfig/20241128-061647-ladsgroup.json
06:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2167 (T370903)', diff saved to https://phabricator.wikimedia.org/P71304 and previous config saved to /var/cache/conftool/dbconfig/20241128-060418-ladsgroup.json
06:04 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2167.codfw.wmnet with reason: Maintenance
06:04 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2167.codfw.wmnet with reason: Maintenance
06:03 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2166 (T370903)', diff saved to https://phabricator.wikimedia.org/P71303 and previous config saved to /var/cache/conftool/dbconfig/20241128-060355-ladsgroup.json
05:48 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2166', diff saved to https://phabricator.wikimedia.org/P71302 and previous config saved to /var/cache/conftool/dbconfig/20241128-054847-ladsgroup.json
05:48 kartik@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
05:33 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2166', diff saved to https://phabricator.wikimedia.org/P71301 and previous config saved to /var/cache/conftool/dbconfig/20241128-053340-ladsgroup.json
05:29 tstarling@deploy2002: Finished scap sync-world: Backport for Add frwiki on labs for new addWiki.php test (duration: 13m 41s)
05:23 tstarling@deploy2002: tstarling: Continuing with sync
05:22 tstarling@deploy2002: tstarling: Backport for Add frwiki on labs for new addWiki.php test synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
05:18 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2166 (T370903)', diff saved to https://phabricator.wikimedia.org/P71300 and previous config saved to /var/cache/conftool/dbconfig/20241128-051833-ladsgroup.json
05:16 tstarling@deploy2002: Started scap sync-world: Backport for Add frwiki on labs for new addWiki.php test
05:06 kartik@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
05:03 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2166 (T370903)', diff saved to https://phabricator.wikimedia.org/P71299 and previous config saved to /var/cache/conftool/dbconfig/20241128-050352-ladsgroup.json
05:03 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2166.codfw.wmnet with reason: Maintenance
05:03 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2166.codfw.wmnet with reason: Maintenance
05:03 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2164 (T370903)', diff saved to https://phabricator.wikimedia.org/P71298 and previous config saved to /var/cache/conftool/dbconfig/20241128-050329-ladsgroup.json
04:48 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2164', diff saved to https://phabricator.wikimedia.org/P71297 and previous config saved to /var/cache/conftool/dbconfig/20241128-044822-ladsgroup.json
04:41 eileen: civicrm upgraded from ed67a1b2 to be7e5d33
04:36 eileen: civicrm upgraded from 40f4f1a3 to ed67a1b2
04:33 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2164', diff saved to https://phabricator.wikimedia.org/P71296 and previous config saved to /var/cache/conftool/dbconfig/20241128-043314-ladsgroup.json
04:26 eileen: civicrm upgraded from 7ade5fd7 to 40f4f1a3
04:18 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2164 (T370903)', diff saved to https://phabricator.wikimedia.org/P71294 and previous config saved to /var/cache/conftool/dbconfig/20241128-041807-ladsgroup.json
04:03 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2164 (T370903)', diff saved to https://phabricator.wikimedia.org/P71292 and previous config saved to /var/cache/conftool/dbconfig/20241128-040326-ladsgroup.json
04:03 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on db2186.codfw.wmnet with reason: Maintenance
04:03 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 16:00:00 on db2186.codfw.wmnet with reason: Maintenance
04:03 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2164.codfw.wmnet with reason: Maintenance
04:02 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2164.codfw.wmnet with reason: Maintenance
04:02 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2163 (T370903)', diff saved to https://phabricator.wikimedia.org/P71291 and previous config saved to /var/cache/conftool/dbconfig/20241128-040248-ladsgroup.json
03:47 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2163', diff saved to https://phabricator.wikimedia.org/P71290 and previous config saved to /var/cache/conftool/dbconfig/20241128-034741-ladsgroup.json
03:32 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2163', diff saved to https://phabricator.wikimedia.org/P71289 and previous config saved to /var/cache/conftool/dbconfig/20241128-033234-ladsgroup.json
03:22 eileen: config revision changed from f284fd46 to a3175f86 (like for real this time)
03:17 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2163 (T370903)', diff saved to https://phabricator.wikimedia.org/P71288 and previous config saved to /var/cache/conftool/dbconfig/20241128-031726-ladsgroup.json
03:14 eileen: config revision changed from f284fd46 to a3175f86
03:02 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2163 (T370903)', diff saved to https://phabricator.wikimedia.org/P71287 and previous config saved to /var/cache/conftool/dbconfig/20241128-030213-ladsgroup.json
03:02 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2163.codfw.wmnet with reason: Maintenance
03:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2163.codfw.wmnet with reason: Maintenance
03:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2162 (T370903)', diff saved to https://phabricator.wikimedia.org/P71286 and previous config saved to /var/cache/conftool/dbconfig/20241128-030151-ladsgroup.json
02:53 eileen: civicrm upgraded from c8c461b9 to 7ade5fd7
02:46 eileen: civicrm upgraded from 80f03357 to c8c461b9
02:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2162', diff saved to https://phabricator.wikimedia.org/P71285 and previous config saved to /var/cache/conftool/dbconfig/20241128-024644-ladsgroup.json
02:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2162', diff saved to https://phabricator.wikimedia.org/P71284 and previous config saved to /var/cache/conftool/dbconfig/20241128-023136-ladsgroup.json
02:16 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2162 (T370903)', diff saved to https://phabricator.wikimedia.org/P71283 and previous config saved to /var/cache/conftool/dbconfig/20241128-021629-ladsgroup.json
02:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2162 (T370903)', diff saved to https://phabricator.wikimedia.org/P71282 and previous config saved to /var/cache/conftool/dbconfig/20241128-020143-ladsgroup.json
02:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2162.codfw.wmnet with reason: Maintenance
02:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2162.codfw.wmnet with reason: Maintenance
02:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2161 (T370903)', diff saved to https://phabricator.wikimedia.org/P71281 and previous config saved to /var/cache/conftool/dbconfig/20241128-020120-ladsgroup.json
01:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2161', diff saved to https://phabricator.wikimedia.org/P71280 and previous config saved to /var/cache/conftool/dbconfig/20241128-014613-ladsgroup.json
01:38 eileen: civicrm upgraded from 3b1ed162 to 80f03357
01:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2161', diff saved to https://phabricator.wikimedia.org/P71279 and previous config saved to /var/cache/conftool/dbconfig/20241128-013106-ladsgroup.json
01:16 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2161 (T370903)', diff saved to https://phabricator.wikimedia.org/P71278 and previous config saved to /var/cache/conftool/dbconfig/20241128-011559-ladsgroup.json
01:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2161 (T370903)', diff saved to https://phabricator.wikimedia.org/P71277 and previous config saved to /var/cache/conftool/dbconfig/20241128-010112-ladsgroup.json
01:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2161.codfw.wmnet with reason: Maintenance
01:00 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2161.codfw.wmnet with reason: Maintenance
01:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2154 (T370903)', diff saved to https://phabricator.wikimedia.org/P71276 and previous config saved to /var/cache/conftool/dbconfig/20241128-010049-ladsgroup.json
00:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2154', diff saved to https://phabricator.wikimedia.org/P71275 and previous config saved to /var/cache/conftool/dbconfig/20241128-004542-ladsgroup.json
00:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2154', diff saved to https://phabricator.wikimedia.org/P71274 and previous config saved to /var/cache/conftool/dbconfig/20241128-003035-ladsgroup.json
00:16 tstarling@deploy2002: Finished scap sync-world: Backport for Move default main page text for new wikis to config (T352113), Introduce preinstall.dblist for wikis that haven't been installed yet (T352113) (duration: 14m 42s)
00:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2154 (T370903)', diff saved to https://phabricator.wikimedia.org/P71273 and previous config saved to /var/cache/conftool/dbconfig/20241128-001528-ladsgroup.json
00:09 tstarling@deploy2002: tstarling: Continuing with sync
00:07 tstarling@deploy2002: tstarling: Backport for Move default main page text for new wikis to config (T352113), Introduce preinstall.dblist for wikis that haven't been installed yet (T352113) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
00:01 tstarling@deploy2002: Started scap sync-world: Backport for Move default main page text for new wikis to config (T352113), Introduce preinstall.dblist for wikis that haven't been installed yet (T352113)
00:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2154 (T370903)', diff saved to https://phabricator.wikimedia.org/P71272 and previous config saved to /var/cache/conftool/dbconfig/20241128-000046-ladsgroup.json
00:00 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2154.codfw.wmnet with reason: Maintenance
00:00 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2154.codfw.wmnet with reason: Maintenance
00:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2152 (T370903)', diff saved to https://phabricator.wikimedia.org/P71271 and previous config saved to /var/cache/conftool/dbconfig/20241128-000023-ladsgroup.json

2024-11-27

23:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2152', diff saved to https://phabricator.wikimedia.org/P71270 and previous config saved to /var/cache/conftool/dbconfig/20241127-234518-ladsgroup.json
23:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2152', diff saved to https://phabricator.wikimedia.org/P71269 and previous config saved to /var/cache/conftool/dbconfig/20241127-233011-ladsgroup.json
23:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2152 (T370903)', diff saved to https://phabricator.wikimedia.org/P71267 and previous config saved to /var/cache/conftool/dbconfig/20241127-231504-ladsgroup.json
23:09 tgr@deploy2002: Finished scap sync-world: Backport for Fix mobile domain logic for login.wikimedia.org (T380646) (duration: 18m 07s)
23:02 tgr@deploy2002: tgr: Continuing with sync
23:02 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2152 (T370903)', diff saved to https://phabricator.wikimedia.org/P71264 and previous config saved to /var/cache/conftool/dbconfig/20241127-230159-ladsgroup.json
23:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2152.codfw.wmnet with reason: Maintenance
23:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2152.codfw.wmnet with reason: Maintenance
22:56 tgr@deploy2002: tgr: Backport for Fix mobile domain logic for login.wikimedia.org (T380646) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
22:52 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance
22:52 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance
22:52 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226 (T370903)', diff saved to https://phabricator.wikimedia.org/P71263 and previous config saved to /var/cache/conftool/dbconfig/20241127-225159-ladsgroup.json
22:51 tgr@deploy2002: Started scap sync-world: Backport for Fix mobile domain logic for login.wikimedia.org (T380646)
22:46 cjming: end of UTC late backport window
22:44 cjming@deploy2002: Finished scap sync-world: Backport for Turn on Parsoid Read views on jawikivoyage (T380769) (duration: 15m 22s)
22:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P71262 and previous config saved to /var/cache/conftool/dbconfig/20241127-223652-ladsgroup.json
22:35 cjming@deploy2002: cscott, cjming: Continuing with sync
22:35 cjming@deploy2002: cscott, cjming: Backport for Turn on Parsoid Read views on jawikivoyage (T380769) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
22:29 cjming@deploy2002: Started scap sync-world: Backport for Turn on Parsoid Read views on jawikivoyage (T380769)
22:27 cjming@deploy2002: Finished scap sync-world: Backport for Bump wikimedia/parsoid to 0.21.0-a9 (T373035 T380664), Bump wikimedia/parsoid to 0.21.0-a9 (T380664) (duration: 42m 38s)
22:26 bking@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - bking@cumin1002"
22:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P71261 and previous config saved to /var/cache/conftool/dbconfig/20241127-222145-ladsgroup.json
22:13 cjming@deploy2002: arlolra, cjming: Continuing with sync
22:11 bking@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1027.eqiad.wmnet with reason: host reimage
22:09 cjming@deploy2002: arlolra, cjming: Backport for Bump wikimedia/parsoid to 0.21.0-a9 (T373035 T380664), Bump wikimedia/parsoid to 0.21.0-a9 (T380664) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
22:07 bking@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1027.eqiad.wmnet with reason: host reimage
22:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226 (T370903)', diff saved to https://phabricator.wikimedia.org/P71260 and previous config saved to /var/cache/conftool/dbconfig/20241127-220638-ladsgroup.json
21:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1226 (T370903)', diff saved to https://phabricator.wikimedia.org/P71259 and previous config saved to /var/cache/conftool/dbconfig/20241127-215407-ladsgroup.json
21:54 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1226.eqiad.wmnet with reason: Maintenance
21:53 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1226.eqiad.wmnet with reason: Maintenance
21:45 cjming@deploy2002: Started scap sync-world: Backport for Bump wikimedia/parsoid to 0.21.0-a9 (T373035 T380664), Bump wikimedia/parsoid to 0.21.0-a9 (T380664)
21:43 cjming@deploy2002: Finished scap sync-world: Backport for Revert "Normalize ref html before comparison" (T380977) (duration: 12m 49s)
21:40 bking@cumin1002: START - Cookbook sre.hosts.reimage for host wdqs1027.eqiad.wmnet with OS bullseye
21:38 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1216.eqiad.wmnet with reason: Maintenance
21:38 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1216.eqiad.wmnet with reason: Maintenance
21:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214 (T370903)', diff saved to https://phabricator.wikimedia.org/P71258 and previous config saved to /var/cache/conftool/dbconfig/20241127-213759-ladsgroup.json
21:37 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs1026.eqiad.wmnet with OS bullseye
21:37 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - bking@cumin2002"
21:37 cjming@deploy2002: cjming, cscott: Continuing with sync
21:37 cjming@deploy2002: cjming, cscott: Backport for Revert "Normalize ref html before comparison" (T380977) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
21:31 cjming@deploy2002: Started scap sync-world: Backport for Revert "Normalize ref html before comparison" (T380977)
21:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P71257 and previous config saved to /var/cache/conftool/dbconfig/20241127-212252-ladsgroup.json
21:17 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es2029 (T376905)', diff saved to https://phabricator.wikimedia.org/P71256 and previous config saved to /var/cache/conftool/dbconfig/20241127-211704-ladsgroup.json
21:07 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P71255 and previous config saved to /var/cache/conftool/dbconfig/20241127-210745-ladsgroup.json
21:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es2029', diff saved to https://phabricator.wikimedia.org/P71254 and previous config saved to /var/cache/conftool/dbconfig/20241127-210157-ladsgroup.json
20:52 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214 (T370903)', diff saved to https://phabricator.wikimedia.org/P71253 and previous config saved to /var/cache/conftool/dbconfig/20241127-205238-ladsgroup.json
20:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es2029', diff saved to https://phabricator.wikimedia.org/P71252 and previous config saved to /var/cache/conftool/dbconfig/20241127-204650-ladsgroup.json
20:45 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2237.codfw.wmnet with reason: Optimize (T379813)
20:45 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2237.codfw.wmnet with reason: Optimize (T379813)
20:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'db2237 depool (T379813)', diff saved to https://phabricator.wikimedia.org/P71251 and previous config saved to /var/cache/conftool/dbconfig/20241127-204450-ladsgroup.json
20:38 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - bking@cumin2002"
20:37 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1214 (T370903)', diff saved to https://phabricator.wikimedia.org/P71250 and previous config saved to /var/cache/conftool/dbconfig/20241127-203724-ladsgroup.json
20:37 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1214.eqiad.wmnet with reason: Maintenance
20:36 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1214.eqiad.wmnet with reason: Maintenance
20:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211 (T370903)', diff saved to https://phabricator.wikimedia.org/P71249 and previous config saved to /var/cache/conftool/dbconfig/20241127-203650-ladsgroup.json
20:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es2029 (T376905)', diff saved to https://phabricator.wikimedia.org/P71248 and previous config saved to /var/cache/conftool/dbconfig/20241127-203143-ladsgroup.json
20:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es2029 (T376905)', diff saved to https://phabricator.wikimedia.org/P71247 and previous config saved to /var/cache/conftool/dbconfig/20241127-202446-ladsgroup.json
20:24 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es2029.codfw.wmnet with reason: Maintenance
20:24 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es2029.codfw.wmnet with reason: Maintenance
20:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es2034 (T376905)', diff saved to https://phabricator.wikimedia.org/P71246 and previous config saved to /var/cache/conftool/dbconfig/20241127-202420-ladsgroup.json
20:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211', diff saved to https://phabricator.wikimedia.org/P71245 and previous config saved to /var/cache/conftool/dbconfig/20241127-202143-ladsgroup.json
20:20 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1026.eqiad.wmnet with reason: host reimage
20:18 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1026.eqiad.wmnet with reason: host reimage
20:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es2034', diff saved to https://phabricator.wikimedia.org/P71244 and previous config saved to /var/cache/conftool/dbconfig/20241127-200913-ladsgroup.json
20:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211', diff saved to https://phabricator.wikimedia.org/P71243 and previous config saved to /var/cache/conftool/dbconfig/20241127-200636-ladsgroup.json
19:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es2034', diff saved to https://phabricator.wikimedia.org/P71242 and previous config saved to /var/cache/conftool/dbconfig/20241127-195406-ladsgroup.json
19:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211 (T370903)', diff saved to https://phabricator.wikimedia.org/P71241 and previous config saved to /var/cache/conftool/dbconfig/20241127-195129-ladsgroup.json
19:50 bking@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs1026.eqiad.wmnet with OS bullseye
19:50 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host wdqs1025.eqiad.wmnet with OS bullseye
19:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es2034 (T376905)', diff saved to https://phabricator.wikimedia.org/P71240 and previous config saved to /var/cache/conftool/dbconfig/20241127-193858-ladsgroup.json
19:36 moritzm: imported jenkins 2.479.2 to thirdparty/ci for bullseye-wikimedia
19:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1211 (T370903)', diff saved to https://phabricator.wikimedia.org/P71239 and previous config saved to /var/cache/conftool/dbconfig/20241127-193529-ladsgroup.json
19:35 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1211.eqiad.wmnet with reason: Maintenance
19:35 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1211.eqiad.wmnet with reason: Maintenance
19:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209 (T370903)', diff saved to https://phabricator.wikimedia.org/P71238 and previous config saved to /var/cache/conftool/dbconfig/20241127-193507-ladsgroup.json
19:34 bking@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs1025.eqiad.wmnet with OS bullseye
19:32 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host wdqs1025.eqiad.wmnet with OS bullseye
19:32 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es2034 (T376905)', diff saved to https://phabricator.wikimedia.org/P71237 and previous config saved to /var/cache/conftool/dbconfig/20241127-193202-ladsgroup.json
19:31 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es2034.codfw.wmnet with reason: Maintenance
19:31 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es2034.codfw.wmnet with reason: Maintenance
19:25 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs1027.eqiad.wmnet with OS bullseye
19:24 bking@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs1025.eqiad.wmnet with OS bullseye
19:23 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wdqs1025.eqiad.wmnet with OS bullseye
19:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P71236 and previous config saved to /var/cache/conftool/dbconfig/20241127-192000-ladsgroup.json
19:18 brett@cumin2002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool site magru [reason: repool magru, T376737]
19:18 brett@cumin2002: START - Cookbook sre.dns.admin DNS admin: pool site magru [reason: repool magru, T376737]
19:17 mforns@deploy2002: Finished deploy [airflow-dags/analytics@99032bf]: regular weekly train (duration: 03m 10s)
19:14 mforns@deploy2002: Started deploy [airflow-dags/analytics@99032bf]: regular weekly train
19:13 mutante: disabled puppet on R:scap::target (180 hosts) for a short time - deploying gerrit:1092841
19:09 brett@puppetserver1001: conftool action : set/pooled=yes; selector: dc=magru,service=cdn
19:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P71235 and previous config saved to /var/cache/conftool/dbconfig/20241127-190453-ladsgroup.json
19:02 bking@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs1025.eqiad.wmnet with OS bullseye
18:56 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wdqs1025.eqiad.wmnet with OS bullseye
18:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209 (T370903)', diff saved to https://phabricator.wikimedia.org/P71233 and previous config saved to /var/cache/conftool/dbconfig/20241127-184946-ladsgroup.json
18:47 fabfur@cumin1002: conftool action : set/pooled=yes; selector: cluster=dnsbox,dc=magru
18:38 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 16 hosts
18:38 fabfur@cumin1002: START - Cookbook sre.hosts.remove-downtime for 16 hosts
18:38 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs7003.magru.wmnet
18:38 fabfur@cumin1002: START - Cookbook sre.hosts.remove-downtime for lvs7003.magru.wmnet
18:38 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs7002.magru.wmnet
18:38 fabfur@cumin1002: START - Cookbook sre.hosts.remove-downtime for lvs7002.magru.wmnet
18:38 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs7001.magru.wmnet
18:38 fabfur@cumin1002: START - Cookbook sre.hosts.remove-downtime for lvs7001.magru.wmnet
18:38 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for dns7002.wikimedia.org
18:38 fabfur@cumin1002: START - Cookbook sre.hosts.remove-downtime for dns7002.wikimedia.org
18:37 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for dns7001.wikimedia.org
18:37 fabfur@cumin1002: START - Cookbook sre.hosts.remove-downtime for dns7001.wikimedia.org
18:37 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on dns7001.wikimedia.org with reason: T380307
18:37 fabfur@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on dns7001.wikimedia.org with reason: T380307
18:36 bking@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs1025.eqiad.wmnet with OS bullseye
18:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1209 (T370903)', diff saved to https://phabricator.wikimedia.org/P71232 and previous config saved to /var/cache/conftool/dbconfig/20241127-183455-ladsgroup.json
18:34 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1209.eqiad.wmnet with reason: Maintenance
18:34 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1209.eqiad.wmnet with reason: Maintenance
18:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203 (T370903)', diff saved to https://phabricator.wikimedia.org/P71231 and previous config saved to /var/cache/conftool/dbconfig/20241127-183432-ladsgroup.json
18:19 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P71230 and previous config saved to /var/cache/conftool/dbconfig/20241127-181925-ladsgroup.json
18:05 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wdqs1027.eqiad.wmnet with OS bullseye
18:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P71229 and previous config saved to /var/cache/conftool/dbconfig/20241127-180418-ladsgroup.json
17:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203 (T370903)', diff saved to https://phabricator.wikimedia.org/P71228 and previous config saved to /var/cache/conftool/dbconfig/20241127-174911-ladsgroup.json
17:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1203 (T370903)', diff saved to https://phabricator.wikimedia.org/P71227 and previous config saved to /var/cache/conftool/dbconfig/20241127-173426-ladsgroup.json
17:34 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1203.eqiad.wmnet with reason: Maintenance
17:34 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1203.eqiad.wmnet with reason: Maintenance
17:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192 (T370903)', diff saved to https://phabricator.wikimedia.org/P71226 and previous config saved to /var/cache/conftool/dbconfig/20241127-173403-ladsgroup.json
17:33 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: apply
17:33 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
17:33 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventstreams: apply
17:32 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
17:32 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-main: apply
17:32 jiji@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-main: apply
17:32 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/eventstreams: apply
17:31 jiji@deploy2002: helmfile [staging] DONE helmfile.d/services/eventstreams: apply
17:31 jiji@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-main: apply
17:31 jiji@deploy2002: helmfile [staging] START helmfile.d/services/eventstreams: apply
17:28 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply
17:27 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop: apply
17:27 jiji@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop: apply
17:27 jiji@deploy2002: helmfile [staging] START helmfile.d/services/changeprop: apply
17:25 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
17:24 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
17:24 jiji@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
17:23 jiji@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
17:20 jiji@deploy2002: helmfile [staging] DONE helmfile.d/services/benthos-cache-invalidator: apply
17:19 jiji@deploy2002: helmfile [staging] START helmfile.d/services/benthos-cache-invalidator: apply
17:18 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P71225 and previous config saved to /var/cache/conftool/dbconfig/20241127-171857-ladsgroup.json
17:17 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp7010.magru.wmnet with OS bullseye
17:16 jiji@deploy2002: helmfile [staging] DONE helmfile.d/services/benthos-cache-invalidator: apply
17:16 jiji@deploy2002: helmfile [staging] START helmfile.d/services/benthos-cache-invalidator: apply
17:14 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for kafka-main1007.eqiad.wmnet
17:14 jiji@cumin1002: START - Cookbook sre.hosts.remove-downtime for kafka-main1007.eqiad.wmnet
17:03 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P71224 and previous config saved to /var/cache/conftool/dbconfig/20241127-170350-ladsgroup.json
16:56 jiji@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
16:55 jiji@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
16:55 jiji@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
16:55 jiji@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
16:55 jiji@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
16:54 jiji@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
16:54 jiji@deploy2002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
16:54 jiji@deploy2002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
16:54 jiji@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
16:53 jiji@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
16:53 jiji@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
16:53 jiji@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
16:53 jiji@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
16:52 jiji@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
16:52 jiji@deploy2002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
16:52 jiji@deploy2002: helmfile [codfw] START helmfile.d/admin 'apply'.
16:52 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
16:52 jiji@deploy2002: helmfile [eqiad] START helmfile.d/admin 'apply'.
16:51 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp7010.magru.wmnet with reason: host reimage
16:48 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192 (T370903)', diff saved to https://phabricator.wikimedia.org/P71222 and previous config saved to /var/cache/conftool/dbconfig/20241127-164843-ladsgroup.json
16:47 fabfur@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp7010.magru.wmnet with reason: host reimage
16:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1192 (T370903)', diff saved to https://phabricator.wikimedia.org/P71221 and previous config saved to /var/cache/conftool/dbconfig/20241127-163407-ladsgroup.json
16:34 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1192.eqiad.wmnet with reason: Maintenance
16:33 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1192.eqiad.wmnet with reason: Maintenance
16:33 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178 (T370903)', diff saved to https://phabricator.wikimedia.org/P71220 and previous config saved to /var/cache/conftool/dbconfig/20241127-163344-ladsgroup.json
16:27 jiji@cumin1002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-main-eqiad
16:26 fabfur@cumin1002: START - Cookbook sre.hosts.reimage for host cp7010.magru.wmnet with OS bullseye
16:18 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P71218 and previous config saved to /var/cache/conftool/dbconfig/20241127-161837-ladsgroup.json
16:16 jiji@cumin1002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-main-eqiad
16:12 effie: roll restarting kafka-main brokers - T363214
16:11 moritzm: installing distro-info-data updates from bookworm point release
16:11 fabfur@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:11 fabfur@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cp70101 - fabfur@cumin1002"
16:11 fabfur@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cp70101 - fabfur@cumin1002"
16:05 fabfur@cumin1002: START - Cookbook sre.dns.netbox
16:03 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P71217 and previous config saved to /var/cache/conftool/dbconfig/20241127-160330-ladsgroup.json
15:48 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178 (T370903)', diff saved to https://phabricator.wikimedia.org/P71216 and previous config saved to /var/cache/conftool/dbconfig/20241127-154823-ladsgroup.json
15:41 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs1027.eqiad.wmnet with OS bullseye
15:41 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs1026.eqiad.wmnet with OS bullseye
15:33 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1178 (T370903)', diff saved to https://phabricator.wikimedia.org/P71215 and previous config saved to /var/cache/conftool/dbconfig/20241127-153316-ladsgroup.json
15:33 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1178.eqiad.wmnet with reason: Maintenance
15:32 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1178.eqiad.wmnet with reason: Maintenance
15:32 ecarg@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
15:31 ecarg@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
15:30 ecarg@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
15:30 ecarg@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
15:28 ecarg@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
15:27 ecarg@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
15:22 ecarg@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
15:22 ecarg@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
15:21 ecarg@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
15:20 ecarg@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
15:09 ecarg@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
15:08 ecarg@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
15:08 Krinkle: krinkle@webperf2003: `sudo apt-get install kafkacat` (matching webperf1003, for ad-hoc debugging)
15:05 kart_: Updated recommendation-api to 2024-11-27-142924-production (T380838, T379036, T380699)
15:04 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1003.eqiad.wmnet to plain
15:03 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1003.eqiad.wmnet to plain
15:02 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1022.eqiad.wmnet
15:02 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1022.eqiad.wmnet
15:01 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1003.eqiad.wmnet to drbd
14:59 kartik@deploy2002: helmfile [ml-serve-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
14:58 kartik@deploy2002: helmfile [ml-serve-eqiad] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
14:51 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1003.eqiad.wmnet to drbd
14:48 kartik@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
14:43 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1022.eqiad.wmnet
14:39 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1022.eqiad.wmnet
14:35 moritzm: rebalance magru01 following switch of VMs back to DRBD T376737
14:33 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on doh[7001-7002].wikimedia.org with reason: site is depooled, maintenance
14:33 sukhe@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on doh[7001-7002].wikimedia.org with reason: site is depooled, maintenance
14:33 urbanecm@deploy2002: Finished scap sync-world: Backport for [GrowthExperiments] Undefine wgGEDatabaseCluster (T354939) (duration: 12m 21s)
14:30 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of doh7001.wikimedia.org to drbd
14:26 urbanecm@deploy2002: urbanecm: Continuing with sync
14:26 urbanecm@deploy2002: urbanecm: Backport for [GrowthExperiments] Undefine wgGEDatabaseCluster (T354939) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
14:25 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on cloudvirt1061.eqiad.wmnet with reason: cloudvirt1061 needs maintenance T380673
14:25 fnegri@cumin1002: START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on cloudvirt1061.eqiad.wmnet with reason: cloudvirt1061 needs maintenance T380673
14:24 urbanecm: Purge https://en.wikipedia.org/static/images/mobile/copyright/wikiquote-wordmark-az.svg (T380974)
14:21 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wdqs1027.eqiad.wmnet with OS bullseye
14:21 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wdqs1026.eqiad.wmnet with OS bullseye
14:21 urbanecm@deploy2002: Started scap sync-world: Backport for [GrowthExperiments] Undefine wgGEDatabaseCluster (T354939)
14:20 urbanecm@deploy2002: Finished scap sync-world: Backport for Enable ParserMigration compact indicator on all wikis (T363484), Deploy Parsoid Read Views to de/ru wikivoyage and dagwiki (T375394 T380401), Updated wordmark for Azerbaijani Wikiquote (T380974) (duration: 17m 20s)
14:20 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of doh7001.wikimedia.org to drbd
14:19 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of durum7001.magru.wmnet to drbd
14:13 urbanecm@deploy2002: urbanecm, cscott, nmw03: Continuing with sync
14:09 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of durum7001.magru.wmnet to drbd
14:08 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ncredir7001.magru.wmnet to drbd
14:08 urbanecm@deploy2002: urbanecm, cscott, nmw03: Backport for Enable ParserMigration compact indicator on all wikis (T363484), Deploy Parsoid Read Views to de/ru wikivoyage and dagwiki (T375394 T380401), Updated wordmark for Azerbaijani Wikiquote (T380974) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
14:03 urbanecm@deploy2002: Started scap sync-world: Backport for Enable ParserMigration compact indicator on all wikis (T363484), Deploy Parsoid Read Views to de/ru wikivoyage and dagwiki (T375394 T380401), Updated wordmark for Azerbaijani Wikiquote (T380974)
13:59 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ncredir7001.magru.wmnet to drbd
13:56 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of install7001.wikimedia.org to drbd
13:45 moritzm: installing php8.2 security updates
13:40 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of install7001.wikimedia.org to drbd
13:39 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of netflow7001.magru.wmnet to drbd
13:38 mszabo@deploy2002: Finished scap sync-world: Backport for private: Add stub for wgReportIncidentZendeskSubjectLine (T380868), Configure IRS Zendesk integration (T380908), Configure instrument for the Incident Reporting System (T372823) (duration: 13m 53s)
13:31 mszabo@deploy2002: mszabo: Continuing with sync
13:30 mszabo@deploy2002: mszabo: Backport for private: Add stub for wgReportIncidentZendeskSubjectLine (T380868), Configure IRS Zendesk integration (T380908), Configure instrument for the Incident Reporting System (T372823) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:28 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of netflow7001.magru.wmnet to drbd
13:27 moritzm: rebalance magru02 following switch of VMs back to DRBD T376737
13:26 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of durum7002.magru.wmnet to drbd
13:24 mszabo@deploy2002: Started scap sync-world: Backport for private: Add stub for wgReportIncidentZendeskSubjectLine (T380868), Configure IRS Zendesk integration (T380908), Configure instrument for the Incident Reporting System (T372823)
13:20 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wdqs1026.eqiad.wmnet with OS bullseye
13:20 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wdqs1025.eqiad.wmnet with OS bullseye
13:20 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wdqs1027.eqiad.wmnet with OS bullseye
13:16 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of durum7002.magru.wmnet to drbd
13:15 kartik@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
13:15 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of doh7002.wikimedia.org to drbd
13:05 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of doh7002.wikimedia.org to drbd
13:05 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of prometheus7001.magru.wmnet to drbd
12:56 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on kafka-main[1002,1007].eqiad.wmnet with reason: Hardware refresh
12:56 jiji@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on kafka-main[1002,1007].eqiad.wmnet with reason: Hardware refresh
12:50 moritzm: installing ghostscript security updates
12:39 kartik@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
12:38 effie: start replacing kafka-main1002 with kafka-main1007 - T363214
12:24 mvolz@deploy2002: helmfile [staging] DONE helmfile.d/services/citoid: apply
12:24 mvolz@deploy2002: helmfile [staging] START helmfile.d/services/citoid: apply
12:24 kart_: Updated cxserver to 2024-11-20-121713-production (T377966, T357950)
12:22 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
12:22 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
12:20 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
12:20 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/cxserver: apply
12:18 moritzm: installing python-cryptography security updates
12:14 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
12:13 kartik@deploy2002: helmfile [staging] START helmfile.d/services/cxserver: apply
12:12 moritzm: installing openssl security updates
12:08 mvolz@deploy2002: helmfile [eqiad] DONE helmfile.d/services/zotero: apply
12:07 mvolz@deploy2002: helmfile [eqiad] START helmfile.d/services/zotero: apply
12:06 mvolz@deploy2002: helmfile [codfw] DONE helmfile.d/services/zotero: apply
12:06 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of prometheus7001.magru.wmnet to drbd
12:06 mvolz@deploy2002: helmfile [codfw] START helmfile.d/services/zotero: apply
12:05 mvolz@deploy2002: helmfile [staging] DONE helmfile.d/services/zotero: apply
12:05 mvolz@deploy2002: helmfile [staging] START helmfile.d/services/zotero: apply
12:05 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
12:05 kartik@deploy2002: helmfile [staging] START helmfile.d/services/cxserver: apply
12:03 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on ganeti2042.codfw.wmnet with reason: broken CPU
12:03 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on ganeti2042.codfw.wmnet with reason: broken CPU
11:54 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of bast7001.wikimedia.org to drbd
11:45 ladsgroup@deploy2002: Finished scap sync-world: Backport for Bump ratio of new parsercache key spec to 3 (T373037) (duration: 12m 51s)
11:38 ladsgroup@deploy2002: ladsgroup: Continuing with sync
11:38 ladsgroup@deploy2002: ladsgroup: Backport for Bump ratio of new parsercache key spec to 3 (T373037) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
11:34 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of bast7001.wikimedia.org to drbd
11:32 ladsgroup@deploy2002: Started scap sync-world: Backport for Bump ratio of new parsercache key spec to 3 (T373037)
11:29 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ncredir7002.magru.wmnet to drbd
11:21 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on dns7002.wikimedia.org with reason: T376737
11:21 fabfur@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on dns7002.wikimedia.org with reason: T376737
11:21 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on dns7001.wikimedia.org with reason: T376737
11:20 fabfur@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on dns7001.wikimedia.org with reason: T376737
11:19 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on lvs[7001-7003].magru.wmnet with reason: T376737
11:19 fabfur@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on lvs[7001-7003].magru.wmnet with reason: T376737
11:19 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 16 hosts with reason: T376737
11:19 fabfur@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on 16 hosts with reason: T376737
11:18 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ncredir7002.magru.wmnet to drbd
11:16 xSavitar: T380875 Ran mwscript-k8s --comment="T380875" -f -- extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=metawiki --logwiki=metawiki 'EMBakeryEquipment' 'Janapanna'
11:15 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti7002.magru.wmnet to cluster magru02 and group B4
11:13 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti7002.magru.wmnet to cluster magru02 and group B4
11:13 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on lvs7001.magru.wmnet with reason: T376737
11:13 fabfur@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on lvs7001.magru.wmnet with reason: T376737
11:04 kevinbazira@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'article-models' for release 'main' .
11:03 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti7001.magru.wmnet to cluster magru01 and group B3
11:02 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti7001.magru.wmnet to cluster magru01 and group B3
10:29 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti7002.magru.wmnet
10:19 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti7002.magru.wmnet
10:18 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti7001.magru.wmnet
10:10 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp7008.mgmt.magru.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
10:08 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti7001.magru.wmnet
10:01 fabfur@cumin1002: START - Cookbook sre.hosts.provision for host cp7008.mgmt.magru.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
10:00 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti7002
09:59 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti7002
09:59 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti7001
09:58 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti7001
09:55 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp7006.mgmt.magru.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
09:48 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "readded ganeti nodes in magru - jmm@cumin2002 - T376737"
09:48 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "readded ganeti nodes in magru - jmm@cumin2002 - T376737"
09:46 fabfur@cumin1002: START - Cookbook sre.hosts.provision for host cp7006.mgmt.magru.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
09:45 hashar@deploy2002: rebuilt and synchronized wikiversions files: group1 to 1.44.0-wmf.5 refs T375664
09:31 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti7001.magru.wmnet with OS bookworm
09:31 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jmm@cumin2002"
09:30 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jmm@cumin2002"
09:09 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti7001.magru.wmnet with reason: host reimage
09:06 kartik@deploy2002: Finished scap sync-world: Backport for ext.uls.inputsettings: Use arrow functions (T380431) (duration: 16m 06s)
09:05 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti7001.magru.wmnet with reason: host reimage
08:59 kartik@deploy2002: abi, kartik: Continuing with sync
08:55 kartik@deploy2002: abi, kartik: Backport for ext.uls.inputsettings: Use arrow functions (T380431) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
08:50 kartik@deploy2002: Started scap sync-world: Backport for ext.uls.inputsettings: Use arrow functions (T380431)
08:45 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti7001.magru.wmnet with OS bookworm
08:38 kartik@deploy2002: Finished scap sync-world: Backport for Fix illegal access of typed property. (T380724) (duration: 21m 02s)
08:31 kartik@deploy2002: kartik, abi: Continuing with sync
08:24 kartik@deploy2002: kartik, abi: Backport for Fix illegal access of typed property. (T380724) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
08:24 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti7002.magru.wmnet with OS bookworm
08:24 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jmm@cumin2002"
08:23 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jmm@cumin2002"
08:17 kartik@deploy2002: Started scap sync-world: Backport for Fix illegal access of typed property. (T380724)
08:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti7002.magru.wmnet with reason: host reimage
07:57 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti7002.magru.wmnet with reason: host reimage
07:34 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti7002.magru.wmnet with OS bookworm
07:24 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
07:23 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.

2024-11-26

23:29 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns7002.wikimedia.org with OS bookworm
23:29 brett@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - brett@cumin2002"
23:28 brett@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - brett@cumin2002"
23:28 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs7001.magru.wmnet with OS bullseye
23:28 brett@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - brett@cumin2002"
23:23 brett@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - brett@cumin2002"
23:13 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp7010.magru.wmnet with OS bullseye
23:13 brett@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - brett@cumin2002"
23:12 brett@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - brett@cumin2002"
23:03 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs7001.magru.wmnet with reason: host reimage
23:00 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs7001.magru.wmnet with reason: host reimage
22:54 reedy@deploy2002: Finished scap sync-world: Backport for Add CodeMirror to BetaFeaturesAllowList (T376735) (duration: 31m 35s)
22:51 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns7002.wikimedia.org with reason: host reimage
22:48 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dns7002.wikimedia.org with reason: host reimage
22:48 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp7010.magru.wmnet with reason: host reimage
22:45 reedy@deploy2002: musikanimal, reedy: Continuing with sync
22:44 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp7010.magru.wmnet with reason: host reimage
22:40 brett@cumin2002: START - Cookbook sre.hosts.reimage for host lvs7001.magru.wmnet with OS bullseye
22:37 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp7002.magru.wmnet with OS bullseye
22:37 brett@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - brett@cumin2002"
22:32 brett@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - brett@cumin2002"
22:28 reedy@deploy2002: musikanimal, reedy: Backport for Add CodeMirror to BetaFeaturesAllowList (T376735) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
22:26 cmooney@cumin1002: END (FAIL) - Cookbook sre.network.peering (exit_code=99) with action 'configure' for AS: 4800
22:25 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp7010.magru.wmnet with OS bullseye
22:24 brett@cumin2002: START - Cookbook sre.hosts.reimage for host dns7002.wikimedia.org with OS bookworm
22:24 brett@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dns7002.wikimedia.org with OS bullseye
22:24 cmooney@cumin1002: START - Cookbook sre.network.peering with action 'configure' for AS: 4800
22:22 reedy@deploy2002: Started scap sync-world: Backport for Add CodeMirror to BetaFeaturesAllowList (T376735)
22:21 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns7002.wikimedia.org with reason: host reimage
22:21 reedy@deploy2002: Finished scap sync-world: Backport for Nov 26 2024: Vector 2022 Deployments (T379799) (duration: 19m 52s)
22:18 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dns7002.wikimedia.org with reason: host reimage
22:15 cmooney@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 262979
22:14 cmooney@cumin1002: START - Cookbook sre.network.peering with action 'configure' for AS: 262979
22:11 reedy@deploy2002: jdlrobson, reedy: Continuing with sync
22:08 reedy@deploy2002: jdlrobson, reedy: Backport for Nov 26 2024: Vector 2022 Deployments (T379799) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
22:01 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp7002.magru.wmnet with reason: host reimage
22:01 reedy@deploy2002: Started scap sync-world: Backport for Nov 26 2024: Vector 2022 Deployments (T379799)
22:00 reedy@deploy2002: Finished scap sync-world: Backport for Add BetaFeature for CodeMirror 6 (T376735) (duration: 40m 05s)
21:58 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp7004.magru.wmnet with OS bullseye
21:58 brett@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - brett@cumin2002"
21:58 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp7002.magru.wmnet with reason: host reimage
21:57 brett@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - brett@cumin2002"
21:56 brett@cumin2002: START - Cookbook sre.hosts.reimage for host dns7002.wikimedia.org with OS bullseye
21:46 reedy@deploy2002: musikanimal, reedy: Continuing with sync
21:44 reedy@deploy2002: musikanimal, reedy: Backport for Add BetaFeature for CodeMirror 6 (T376735) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
21:38 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp7002.magru.wmnet with OS bullseye
21:35 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp7002.magru.wmnet with OS bullseye
21:35 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dns7002.wikimedia.org with OS bookworm
21:35 brett@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cp7002.magru.wmnet dns7002.magru.wmnet on all recursors
21:35 brett@cumin2002: START - Cookbook sre.dns.wipe-cache cp7002.magru.wmnet dns7002.magru.wmnet on all recursors
21:35 robh@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs7001.mgmt.magru.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
21:34 robh@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp7010.mgmt.magru.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
21:32 robh@cumin2002: START - Cookbook sre.hosts.provision for host lvs7001.mgmt.magru.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
21:32 robh@cumin2002: START - Cookbook sre.hosts.provision for host cp7010.mgmt.magru.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
21:32 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
21:32 robh@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: magru shuffle - robh@cumin2002"
21:32 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp7004.magru.wmnet with reason: host reimage
21:32 robh@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: magru shuffle - robh@cumin2002"
21:30 robh@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host lvs7001
21:30 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp7003.magru.wmnet with OS bullseye
21:30 brett@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - brett@cumin2002"
21:30 robh@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host lvs7001
21:30 robh@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp7010
21:30 robh@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cp7010
21:29 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp7004.magru.wmnet with reason: host reimage
21:28 robh@cumin2002: START - Cookbook sre.dns.netbox
21:28 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
21:26 brett@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - brett@cumin2002"
21:25 robh@cumin2002: START - Cookbook sre.dns.netbox
21:24 damilare: civicrm upgraded from 59d340cd to 3b1ed162
21:23 damilare: SmashPig upgraded from 131e92a5 to 79b463b4
21:22 robh@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts lvs7001.magru.wmnet
21:22 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
21:21 brett@cumin2002: START - Cookbook sre.hosts.reimage for host dns7002.wikimedia.org with OS bookworm
21:20 reedy@deploy2002: Started scap sync-world: Backport for Add BetaFeature for CodeMirror 6 (T376735)
21:20 robh@cumin2002: START - Cookbook sre.dns.netbox
21:20 robh@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts cp7010.magru.wmnet
21:20 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
21:20 robh@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cp7010.magru.wmnet decommissioned, removing all IPs except the asset tag one - robh@cumin2002"
21:19 robh@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cp7010.magru.wmnet decommissioned, removing all IPs except the asset tag one - robh@cumin2002"
21:17 reedy@deploy2002: Synchronized wmf-config/core-Permissions.php: T380753 (duration: 11m 23s)
21:16 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp7002.magru.wmnet with OS bullseye
21:15 robh@cumin2002: START - Cookbook sre.dns.netbox
21:10 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp7004.magru.wmnet with OS bullseye
21:08 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp7004.magru.wmnet with OS bullseye
21:08 robh@cumin2002: START - Cookbook sre.hosts.decommission for hosts cp7010.magru.wmnet
21:08 robh@cumin2002: START - Cookbook sre.hosts.decommission for hosts lvs7001.magru.wmnet
21:04 robh@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dns7002.mgmt.magru.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
21:04 robh@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp7002.mgmt.magru.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
21:02 brett@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cp7003.magru.wmnet cp7004.magru.wmnet on all recursors
21:02 brett@cumin2002: START - Cookbook sre.dns.wipe-cache cp7003.magru.wmnet cp7004.magru.wmnet on all recursors
21:02 robh@cumin2002: START - Cookbook sre.hosts.provision for host dns7002.mgmt.magru.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
21:02 robh@cumin2002: START - Cookbook sre.hosts.provision for host cp7002.mgmt.magru.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
21:01 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
21:01 robh@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: magru shuffle - robh@cumin2002"
21:01 robh@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dns7002
21:01 robh@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: magru shuffle - robh@cumin2002"
21:01 robh@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host dns7002
21:01 robh@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp7002
21:01 robh@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cp7002
20:58 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp7003.magru.wmnet with reason: host reimage
20:54 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp7003.magru.wmnet with reason: host reimage
20:54 robh@cumin2002: START - Cookbook sre.dns.netbox
20:50 robh@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts cp7002.magru.wmnet
20:50 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
20:47 robh@cumin2002: START - Cookbook sre.dns.netbox
20:47 robh@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts dns7002.wikimedia.org
20:47 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
20:47 robh@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: dns7002.wikimedia.org decommissioned, removing all IPs except the asset tag one - robh@cumin2002"
20:47 robh@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: dns7002.wikimedia.org decommissioned, removing all IPs except the asset tag one - robh@cumin2002"
20:44 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp7004.magru.wmnet with OS bullseye
20:43 robh@cumin2002: START - Cookbook sre.dns.netbox
20:39 dzahn@cumin2002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1004.wikimedia.org with reason: security release 20241126
20:37 robh@cumin2002: START - Cookbook sre.hosts.decommission for hosts cp7002.magru.wmnet
20:37 robh@cumin2002: START - Cookbook sre.hosts.decommission for hosts dns7002.wikimedia.org
20:35 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp7003.magru.wmnet with OS bullseye
20:34 robh@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp7004.mgmt.magru.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
20:32 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs1027.eqiad.wmnet with OS bullseye
20:32 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs1026.eqiad.wmnet with OS bullseye
20:32 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs1025.eqiad.wmnet with OS bullseye
20:32 robh@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti7002.mgmt.magru.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
20:31 swfrench@deploy2002: Finished scap sync-world: Backport for debug.json: add support for mwdebug-next (T372605) (duration: 14m 21s)
20:26 robh@cumin2002: START - Cookbook sre.hosts.provision for host cp7004.mgmt.magru.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
20:26 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
20:26 robh@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: magru shuffle - robh@cumin2002"
20:26 robh@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: magru shuffle - robh@cumin2002"
20:25 robh@cumin2002: START - Cookbook sre.hosts.provision for host ganeti7002.mgmt.magru.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
20:25 robh@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti7002
20:25 robh@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti7002
20:24 swfrench@deploy2002: swfrench: Continuing with sync
20:23 robh@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host ganeti7002
20:23 robh@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti7002
20:23 robh@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti7002.mgmt.magru.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
20:23 swfrench@deploy2002: swfrench: Backport for debug.json: add support for mwdebug-next (T372605) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:22 robh@cumin2002: START - Cookbook sre.hosts.provision for host ganeti7002.mgmt.magru.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
20:21 robh@cumin2002: START - Cookbook sre.dns.netbox
20:21 dzahn@cumin2002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: security release 20241126
20:17 swfrench@deploy2002: Started scap sync-world: Backport for debug.json: add support for mwdebug-next (T372605)
20:16 robh@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts cp7004.magru.wmnet
20:16 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
20:16 aokoth@cumin1002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1003.wikimedia.org with reason: Security Update
20:14 robh@cumin2002: START - Cookbook sre.dns.netbox
20:13 robh@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts ganeti7002.magru.wmnet
20:13 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
20:13 robh@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti7002.magru.wmnet decommissioned, removing all IPs except the asset tag one - robh@cumin2002"
20:13 robh@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti7002.magru.wmnet decommissioned, removing all IPs except the asset tag one - robh@cumin2002"
20:11 hashar@deploy2002: Finished scap sync-world: Backport for Avoid exception on mTemplateIds/mTemplate array discrepancy (T380862) (duration: 15m 23s)
20:09 robh@cumin2002: START - Cookbook sre.dns.netbox
20:08 aokoth@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: Security Update
20:07 aokoth@cumin1002: END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99) on GitLab host gitlab1003.wikimedia.org with reason: Security Update
20:07 aokoth@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: Security Update
20:04 robh@cumin2002: START - Cookbook sre.hosts.decommission for hosts cp7004.magru.wmnet
20:04 robh@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti7002.magru.wmnet
20:02 hashar@deploy2002: hashar: Continuing with sync
20:02 hashar@deploy2002: hashar: Backport for Avoid exception on mTemplateIds/mTemplate array discrepancy (T380862) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:00 robh@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp7003.mgmt.magru.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
19:59 robh@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti7001.mgmt.magru.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
19:55 hashar@deploy2002: Started scap sync-world: Backport for Avoid exception on mTemplateIds/mTemplate array discrepancy (T380862)
19:52 robh@cumin2002: START - Cookbook sre.hosts.provision for host cp7003.mgmt.magru.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
19:52 robh@cumin2002: START - Cookbook sre.hosts.provision for host ganeti7001.mgmt.magru.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
19:51 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
19:51 robh@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: magru shuffle - robh@cumin2002"
19:50 robh@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: magru shuffle - robh@cumin2002"
19:49 robh@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti7001
19:49 robh@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti7001
19:49 robh@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp7003
19:49 robh@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cp7003
19:46 robh@cumin2002: START - Cookbook sre.dns.netbox
19:43 robh@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts cp7003.magru.wmnet
19:43 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
19:43 robh@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cp7003.magru.wmnet decommissioned, removing all IPs except the asset tag one - robh@cumin2002"
19:42 robh@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cp7003.magru.wmnet decommissioned, removing all IPs except the asset tag one - robh@cumin2002"
19:33 robh@cumin2002: START - Cookbook sre.dns.netbox
19:27 robh@cumin2002: START - Cookbook sre.hosts.decommission for hosts cp7003.magru.wmnet
19:27 urbanecm: [urbanecm@mwmaint2002 ~]$ foreachwiki userOptions.php --delete-defaults growthexperiments-homepage-variant # T379146, logging to /home/urbanecm/T379146.log
19:26 urbanecm: mwscript-k8s -f userOptions.php -- --wiki=enwiki --old=oldimpact --delete 'growthexperiments-homepage-variant' # T379146
19:23 robh@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti7001.magru.wmnet
19:23 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
19:23 robh@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti7001.magru.wmnet decommissioned, removing all IPs except the asset tag one - robh@cumin2002"
19:23 robh@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti7001.magru.wmnet decommissioned, removing all IPs except the asset tag one - robh@cumin2002"
19:22 eileen: civicrm upgraded from eec961a3 to 59d340cd
19:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'db2215 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P71191 and previous config saved to /var/cache/conftool/dbconfig/20241126-192112-ladsgroup.json
19:12 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wdqs1027.eqiad.wmnet with OS bullseye
19:12 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wdqs1026.eqiad.wmnet with OS bullseye
19:12 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wdqs1025.eqiad.wmnet with OS bullseye
19:11 robh@cumin2002: START - Cookbook sre.dns.netbox
19:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'db2215 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P71190 and previous config saved to /var/cache/conftool/dbconfig/20241126-190607-ladsgroup.json
18:55 robh@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti7001.magru.wmnet
18:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'db2215 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P71189 and previous config saved to /var/cache/conftool/dbconfig/20241126-185101-ladsgroup.json
18:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'db2215 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P71188 and previous config saved to /var/cache/conftool/dbconfig/20241126-183556-ladsgroup.json
18:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'db2215 repool', diff saved to https://phabricator.wikimedia.org/P71187 and previous config saved to /var/cache/conftool/dbconfig/20241126-183547-ladsgroup.json
18:34 ladsgroup@cumin1002: END (ERROR) - Cookbook sre.mysql.pool (exit_code=97) db2215 gradually with 4 steps - Maint over
18:33 ladsgroup@cumin1002: START - Cookbook sre.mysql.pool db2215 gradually with 4 steps - Maint over
18:25 jayme@cumin2002: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on D{wikikube-ctrl200[1-3].codfw.wmnet} and (A:wikikube-worker-codfw or A:wikikube-master-codfw)
18:10 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2215.codfw.wmnet with reason: Maintenance
18:10 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2215.codfw.wmnet with reason: Maintenance
17:58 jayme@cumin2002: START - Cookbook sre.k8s.reboot-nodes rolling reboot on D{wikikube-ctrl200[1-3].codfw.wmnet} and (A:wikikube-worker-codfw or A:wikikube-master-codfw)
17:47 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker[1313-1327].eqiad.wmnet
17:47 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker[1313-1327].eqiad.wmnet
17:35 claime: homer 'cr*eqiad*' commit 'T380350'
17:35 claime: homer 'lsw1-e7-eqiad*' commit 'T380350'
17:34 claime: homer 'lsw1-f6-eqiad*' commit 'T380350'
17:34 claime: homer 'lsw1-f5-eqiad*' commit 'T380350'
17:33 claime: homer 'lsw1-e5-eqiad*' commit 'T380350'
17:32 claime: homer 'lsw1-e6-eqiad*' commit 'T380350'
17:31 claime: homer 'lsw1-f7-eqiad*' commit 'T380350'
17:28 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1321.eqiad.wmnet with OS bookworm
17:25 jayme@cumin2002: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on D{wikikube-ctrl100[1-3].eqiad.wmnet} and (A:wikikube-worker-eqiad or A:wikikube-master-eqiad)
17:23 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1325.eqiad.wmnet with OS bookworm
17:20 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1326.eqiad.wmnet with OS bookworm
17:17 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1324.eqiad.wmnet with OS bookworm
17:13 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1327.eqiad.wmnet with OS bookworm
17:10 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1322.eqiad.wmnet with OS bookworm
17:09 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1321.eqiad.wmnet with reason: host reimage
17:06 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1321.eqiad.wmnet with reason: host reimage
17:05 ladsgroup@cumin1002: conftool action : set/pooled=yes; selector: name=clouddb1020.eqiad.wmnet,service=s8
17:04 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1325.eqiad.wmnet with reason: host reimage
17:03 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1323.eqiad.wmnet with OS bookworm
17:01 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1326.eqiad.wmnet with reason: host reimage
16:59 jayme@cumin2002: START - Cookbook sre.k8s.reboot-nodes rolling reboot on D{wikikube-ctrl100[1-3].eqiad.wmnet} and (A:wikikube-worker-eqiad or A:wikikube-master-eqiad)
16:58 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1324.eqiad.wmnet with reason: host reimage
16:54 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1327.eqiad.wmnet with reason: host reimage
16:51 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1322.eqiad.wmnet with reason: host reimage
16:49 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1327.eqiad.wmnet with reason: host reimage
16:49 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1326.eqiad.wmnet with reason: host reimage
16:49 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1325.eqiad.wmnet with reason: host reimage
16:48 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1324.eqiad.wmnet with reason: host reimage
16:48 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1322.eqiad.wmnet with reason: host reimage
16:46 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1321.eqiad.wmnet with OS bookworm
16:45 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1323.eqiad.wmnet with reason: host reimage
16:45 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:42 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker1321.eqiad.wmnet with OS bookworm
16:42 pt1979@cumin2002: START - Cookbook sre.dns.netbox
16:41 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1323.eqiad.wmnet with reason: host reimage
16:40 urbanecm: `mwscript-k8s -f userOptions.php -- --wiki=enwiki --old=control --delete 'growthexperiments-homepage-variant'` # T379146, T377631
16:30 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1327.eqiad.wmnet with OS bookworm
16:30 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1326.eqiad.wmnet with OS bookworm
16:30 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1325.eqiad.wmnet with OS bookworm
16:29 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1324.eqiad.wmnet with OS bookworm
16:29 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1322.eqiad.wmnet with OS bookworm
16:29 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1321.eqiad.wmnet with OS bookworm
16:28 cgoubert@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wikikube-worker1321.eqiad.wmnet with OS bookworm
16:28 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker1327.eqiad.wmnet with OS bookworm
16:27 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker1326.eqiad.wmnet with OS bookworm
16:27 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker1325.eqiad.wmnet with OS bookworm
16:27 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker1324.eqiad.wmnet with OS bookworm
16:26 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker1322.eqiad.wmnet with OS bookworm
16:22 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1323.eqiad.wmnet with OS bookworm
16:20 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker1323.eqiad.wmnet with OS bookworm
15:52 moritzm: installing intel-microcode security updates
15:49 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1327.eqiad.wmnet with OS bookworm
15:49 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1326.eqiad.wmnet with OS bookworm
15:48 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1325.eqiad.wmnet with OS bookworm
15:48 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1324.eqiad.wmnet with OS bookworm
15:47 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1323.eqiad.wmnet with OS bookworm
15:46 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1322.eqiad.wmnet with OS bookworm
15:45 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1321.eqiad.wmnet with OS bookworm
15:42 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns7001.wikimedia.org with OS bookworm
15:41 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2042.codfw.wmnet
15:39 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs1027.eqiad.wmnet with OS bullseye
15:39 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs1025.eqiad.wmnet with OS bullseye
15:39 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs1026.eqiad.wmnet with OS bullseye
15:35 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2042.codfw.wmnet
15:34 moritzm: installing wireshark security updates
15:33 ladsgroup@cumin1002: END (FAIL) - Cookbook sre.mysql.pool (exit_code=99) db2215 gradually with 4 steps - Maint over
15:33 ladsgroup@cumin1002: START - Cookbook sre.mysql.pool db2215 gradually with 4 steps - Maint over
15:27 dcausse@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rdf-streaming-updater: apply
15:25 dcausse@deploy2002: helmfile [eqiad] START helmfile.d/services/rdf-streaming-updater: apply
15:24 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1020.eqiad.wmnet
15:22 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1020.eqiad.wmnet
15:21 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1020.eqiad.wmnet
15:19 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1020.eqiad.wmnet
15:16 dcausse@deploy2002: helmfile [codfw] DONE helmfile.d/services/rdf-streaming-updater: apply
15:16 dcausse@deploy2002: helmfile [codfw] START helmfile.d/services/rdf-streaming-updater: apply
15:11 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns7001.wikimedia.org with reason: host reimage
away: UTC afternoon deploys done
15:08 dcausse@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
15:08 dcausse@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
15:07 fabfur@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on dns7001.wikimedia.org with reason: host reimage
15:05 tgr@deploy2002: Finished scap sync-world: Backport for Allow simulating the SUL3 shared domain settings via env var (T380575) (duration: 26m 23s)
14:58 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on doh[7001-7002].wikimedia.org,durum[7001-7002].magru.wmnet with reason: site is depooled, maintenance
14:58 sukhe@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on doh[7001-7002].wikimedia.org,durum[7001-7002].magru.wmnet with reason: site is depooled, maintenance
14:58 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on doh[7001-7002].wikimedia.org,durum[7001-7002].magru.wmnet with reason: site is depooled, maintenance
14:58 sukhe@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on doh[7001-7002].wikimedia.org,durum[7001-7002].magru.wmnet with reason: site is depooled, maintenance
14:56 tgr@deploy2002: tgr: Continuing with sync
14:44 tgr@deploy2002: tgr: Backport for Allow simulating the SUL3 shared domain settings via env var (T380575) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
14:43 fabfur@cumin1002: START - Cookbook sre.hosts.reimage for host dns7001.wikimedia.org with OS bookworm
14:43 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns7001.wikimedia.org with OS bullseye
14:43 fabfur@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fabfur@cumin1002"
14:40 fabfur@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fabfur@cumin1002"
14:39 tgr@deploy2002: Started scap sync-world: Backport for Allow simulating the SUL3 shared domain settings via env var (T380575)
14:31 mlitn@deploy2002: Finished scap sync-world: Backport for Fix incorrect 'this' (duration: 12m 36s)
14:25 mlitn@deploy2002: mlitn: Continuing with sync
14:25 mlitn@deploy2002: mlitn: Backport for Fix incorrect 'this' synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
14:19 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wdqs1027.eqiad.wmnet with OS bullseye
14:19 mlitn@deploy2002: Started scap sync-world: Backport for Fix incorrect 'this'
14:19 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wdqs1025.eqiad.wmnet with OS bullseye
14:19 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wdqs1026.eqiad.wmnet with OS bullseye
14:19 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-search: apply
14:18 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-search: apply
14:14 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Add grid view - oblivian@cumin1002"
14:14 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Add grid view - oblivian@cumin1002
14:14 oblivian@cumin1002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Add grid view - oblivian@cumin1002
14:13 oblivian@cumin1002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Add grid view - oblivian@cumin1002"
14:09 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-search: apply
14:09 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns7001.wikimedia.org with reason: host reimage
14:08 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-search: apply
14:08 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-search: apply
14:08 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-search: apply
14:07 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-search: apply
14:05 fabfur@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on dns7001.wikimedia.org with reason: host reimage
14:05 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-search: apply
14:01 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp7015.magru.wmnet with OS bullseye
14:01 fabfur@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fabfur@cumin1002"
14:01 fabfur@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fabfur@cumin1002"
13:54 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on an-redacteddb1001.eqiad.wmnet with reason: Reclone (T379724)
13:54 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on an-redacteddb1001.eqiad.wmnet with reason: Reclone (T379724)
13:54 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb1020.eqiad.wmnet with reason: Reclone (T379724)
13:54 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on clouddb1020.eqiad.wmnet with reason: Reclone (T379724)
13:49 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs7003.magru.wmnet with OS bullseye
13:49 fabfur@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fabfur@cumin1002"
13:49 ladsgroup@cumin1002: conftool action : set/pooled=no; selector: name=clouddb1020.eqiad.wmnet,service=s8
13:46 fabfur@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fabfur@cumin1002"
13:43 fabfur@cumin1002: START - Cookbook sre.hosts.reimage for host dns7001.wikimedia.org with OS bullseye
13:40 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.dhcp (exit_code=0) for host dns7001.wikimedia.org
13:38 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on cp7004.magru.wmnet with reason: T376737
13:38 fabfur@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on cp7004.magru.wmnet with reason: T376737
13:35 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp7015.magru.wmnet with reason: host reimage
13:34 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on cp7010.magru.wmnet with reason: T376737
13:34 fabfur@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on cp7010.magru.wmnet with reason: T376737
13:34 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on cp7003.magru.wmnet with reason: T376737
13:34 fabfur@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on cp7003.magru.wmnet with reason: T376737
13:32 fabfur@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp7015.magru.wmnet with reason: host reimage
13:30 Emperor: swift delete wikipedia-commons-local-public.bf b/bf/Schuur_-_Nieuwerbrug_-_20164513_-_RCE.jpg T380738
13:29 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on cp7003.magru.wmnet with reason: T376737
13:28 fabfur@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on cp7003.magru.wmnet with reason: T376737
13:28 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on cp7002.magru.wmnet with reason: T376737
13:28 fabfur@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on cp7002.magru.wmnet with reason: T376737
13:28 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on cp7002.magru.wmnet with reason: T376737
13:28 fabfur@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on cp7002.magru.wmnet with reason: T376737
13:27 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on cp7008.magru.wmnet with reason: T376737
13:27 fabfur@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on cp7008.magru.wmnet with reason: T376737
13:27 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on cp7006.magru.wmnet with reason: T376737
13:26 fabfur@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on cp7006.magru.wmnet with reason: T376737
13:26 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on cp7001.magru.wmnet with reason: T376737
13:26 fabfur@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on cp7001.magru.wmnet with reason: T376737
13:21 cmooney@cumin1002: START - Cookbook sre.hosts.dhcp for host dns7001.wikimedia.org
13:20 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs7003.magru.wmnet with reason: host reimage
13:18 fabfur@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs7003.magru.wmnet with reason: host reimage
13:15 dcaro@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephmon1004.eqiad.wmnet with OS bullseye
13:11 fabfur@cumin1002: START - Cookbook sre.hosts.reimage for host cp7015.magru.wmnet with OS bullseye
13:11 arnaudb@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 100%: repool', diff saved to https://phabricator.wikimedia.org/P71185 and previous config saved to /var/cache/conftool/dbconfig/20241126-131120-arnaudb.json
13:07 sukhe@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dns7001.wikimedia.org with OS bookworm
13:03 sukhe@cumin1002: START - Cookbook sre.hosts.reimage for host dns7001.wikimedia.org with OS bookworm
12:58 fabfur@cumin1002: START - Cookbook sre.hosts.reimage for host lvs7003.magru.wmnet with OS bullseye
12:57 dcaro@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephmon1004.eqiad.wmnet with reason: host reimage
12:56 arnaudb@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 75%: repool', diff saved to https://phabricator.wikimedia.org/P71183 and previous config saved to /var/cache/conftool/dbconfig/20241126-125614-arnaudb.json
12:53 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-ats (exit_code=0) Rolling upgrade/restart of Apache Traffic Server on A:cp-text_esams and A:cp for 9.2.6-1wm2
12:53 dcaro@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephmon1004.eqiad.wmnet with reason: host reimage
12:51 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-ats (exit_code=0) Rolling upgrade/restart of Apache Traffic Server on A:cp-upload_esams and A:cp for 9.2.6-1wm2
12:48 robh@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp7015.magru.wmnet with OS bullseye
12:41 arnaudb@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 50%: repool', diff saved to https://phabricator.wikimedia.org/P71182 and previous config saved to /var/cache/conftool/dbconfig/20241126-124109-arnaudb.json
12:30 robh@cumin2002: START - Cookbook sre.hosts.reimage for host cp7015.magru.wmnet with OS bullseye
12:26 arnaudb@cumin1002: dbctl commit (dc=all): 'db1233 (re)pooling @ 100%: repool', diff saved to https://phabricator.wikimedia.org/P71181 and previous config saved to /var/cache/conftool/dbconfig/20241126-122622-arnaudb.json
12:26 hashar@deploy2002: rebuilt and synchronized wikiversions files: group0 to 1.44.0-wmf.5 refs T375664
12:26 arnaudb@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 25%: repool', diff saved to https://phabricator.wikimedia.org/P71180 and previous config saved to /var/cache/conftool/dbconfig/20241126-122603-arnaudb.json
12:23 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
12:20 robh@cumin2002: START - Cookbook sre.dns.netbox
12:20 robh@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host lvs7003
12:20 moritzm: failover Ganeti master in magru02 to ganeti7004
12:20 robh@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host lvs7003
12:19 robh@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp7015
12:19 robh@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cp7015
12:16 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of prometheus7001.magru.wmnet to plain
12:15 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of prometheus7001.magru.wmnet to plain
12:13 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of doh7002.wikimedia.org to plain
12:11 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of doh7002.wikimedia.org to plain
12:11 arnaudb@cumin1002: dbctl commit (dc=all): 'db1233 (re)pooling @ 75%: repool', diff saved to https://phabricator.wikimedia.org/P71179 and previous config saved to /var/cache/conftool/dbconfig/20241126-121117-arnaudb.json
12:10 arnaudb@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 20%: repool', diff saved to https://phabricator.wikimedia.org/P71178 and previous config saved to /var/cache/conftool/dbconfig/20241126-121058-arnaudb.json
12:10 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of durum7002.magru.wmnet to plain
12:10 ladsgroup@deploy2002: Finished scap sync-world: Backport for Bump ratio of new parsercache key spec to 4 (T373037) (duration: 15m 21s)
12:09 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of durum7002.magru.wmnet to plain
12:07 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of bast7001.wikimedia.org to plain
12:07 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of bast7001.wikimedia.org to plain
12:05 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ncredir7002.magru.wmnet to plain
12:05 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ncredir7002.magru.wmnet to plain
12:02 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of prometheus7001.magru.wmnet to drbd
12:02 ladsgroup@deploy2002: ladsgroup: Continuing with sync
12:01 ladsgroup@deploy2002: ladsgroup: Backport for Bump ratio of new parsercache key spec to 4 (T373037) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
11:56 arnaudb@cumin1002: dbctl commit (dc=all): 'db1233 (re)pooling @ 50%: repool', diff saved to https://phabricator.wikimedia.org/P71177 and previous config saved to /var/cache/conftool/dbconfig/20241126-115612-arnaudb.json
11:55 arnaudb@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 15%: repool', diff saved to https://phabricator.wikimedia.org/P71176 and previous config saved to /var/cache/conftool/dbconfig/20241126-115552-arnaudb.json
11:55 ladsgroup@deploy2002: Started scap sync-world: Backport for Bump ratio of new parsercache key spec to 4 (T373037)
11:54 hashar@deploy2002: Finished scap sync-world: testwikis to 1.44.0-wmf.5 refs T375664 (duration: 25m 52s)
11:53 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-ats Rolling upgrade/restart of Apache Traffic Server on A:cp-upload_esams and A:cp for 9.2.6-1wm2
11:53 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-ats Rolling upgrade/restart of Apache Traffic Server on A:cp-text_esams and A:cp for 9.2.6-1wm2
11:41 arnaudb@cumin1002: dbctl commit (dc=all): 'db1233 (re)pooling @ 25%: repool', diff saved to https://phabricator.wikimedia.org/P71175 and previous config saved to /var/cache/conftool/dbconfig/20241126-114106-arnaudb.json
11:40 arnaudb@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 10%: repool', diff saved to https://phabricator.wikimedia.org/P71174 and previous config saved to /var/cache/conftool/dbconfig/20241126-114047-arnaudb.json
11:38 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-ats (exit_code=0) Rolling upgrade/restart of Apache Traffic Server on A:cp-text_eqiad and A:cp for 9.2.6-1wm2
11:38 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-ats (exit_code=0) Rolling upgrade/restart of Apache Traffic Server on A:cp-upload_eqiad and A:cp for 9.2.6-1wm2
11:31 moritzm: remove ganeti7001 from active Ganeti nodes in magru01
11:28 dcaro@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephmon1004.eqiad.wmnet with OS bullseye
11:28 hashar@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.5 refs T375664
11:28 moritzm: failover Ganeti master in magru01 to ganeti7003
11:26 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of install7001.wikimedia.org to plain
11:26 arnaudb@cumin1002: dbctl commit (dc=all): 'db1233 (re)pooling @ 15%: repool', diff saved to https://phabricator.wikimedia.org/P71173 and previous config saved to /var/cache/conftool/dbconfig/20241126-112601-arnaudb.json
11:25 arnaudb@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 5%: repool', diff saved to https://phabricator.wikimedia.org/P71172 and previous config saved to /var/cache/conftool/dbconfig/20241126-112542-arnaudb.json
11:25 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of install7001.wikimedia.org to plain
11:25 hashar@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.5 refs T375664
11:25 dcaro@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephmon1004.eqiad.wmnet with OS bullseye
11:24 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of doh7001.wikimedia.org to plain
11:23 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of doh7001.wikimedia.org to plain
11:21 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of durum7001.magru.wmnet to plain
11:20 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of durum7001.magru.wmnet to plain
11:16 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ncredir7001.magru.wmnet to plain
11:12 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ncredir7001.magru.wmnet to plain
11:10 arnaudb@cumin1002: dbctl commit (dc=all): 'db1233 (re)pooling @ 10%: repool', diff saved to https://phabricator.wikimedia.org/P71171 and previous config saved to /var/cache/conftool/dbconfig/20241126-111056-arnaudb.json
11:10 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on lvs7003.magru.wmnet with reason: T376737
11:10 fabfur@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on lvs7003.magru.wmnet with reason: T376737
11:10 arnaudb@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 2%: repool', diff saved to https://phabricator.wikimedia.org/P71170 and previous config saved to /var/cache/conftool/dbconfig/20241126-111036-arnaudb.json
11:10 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on lvs7002.magru.wmnet with reason: T376737
11:10 fabfur@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on lvs7002.magru.wmnet with reason: T376737
11:10 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on lvs7001.magru.wmnet with reason: T376737
11:10 fabfur@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on lvs7001.magru.wmnet with reason: T376737
11:09 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of netflow7001.magru.wmnet to plain
11:09 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of netflow7001.magru.wmnet to plain
11:06 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of netflow7001.magru.wmnet to drbd
11:05 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply
11:03 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of prometheus7001.magru.wmnet to drbd
11:02 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of bast7001.wikimedia.org to drbd
10:56 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of netflow7001.magru.wmnet to drbd
10:55 arnaudb@cumin1002: dbctl commit (dc=all): 'db1233 (re)pooling @ 5%: repool', diff saved to https://phabricator.wikimedia.org/P71169 and previous config saved to /var/cache/conftool/dbconfig/20241126-105550-arnaudb.json
10:55 arnaudb@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 1%: repool', diff saved to https://phabricator.wikimedia.org/P71168 and previous config saved to /var/cache/conftool/dbconfig/20241126-105531-arnaudb.json
10:52 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ncredir7001.magru.wmnet to drbd
10:47 hashar@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.5 refs T375664
10:46 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply
10:43 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of bast7001.wikimedia.org to drbd
10:42 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ncredir7001.magru.wmnet to drbd
10:42 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-ats Rolling upgrade/restart of Apache Traffic Server on A:cp-upload_eqiad and A:cp for 9.2.6-1wm2
10:42 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-ats Rolling upgrade/restart of Apache Traffic Server on A:cp-text_eqiad and A:cp for 9.2.6-1wm2
10:38 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of doh7002.wikimedia.org to drbd
10:38 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of durum7001.magru.wmnet to drbd
10:31 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db1233.eqiad.wmnet onto db1246.eqiad.wmnet
10:28 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of doh7002.wikimedia.org to drbd
10:27 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of durum7001.magru.wmnet to drbd
10:26 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of durum7002.magru.wmnet to drbd
10:26 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of doh7001.wikimedia.org to drbd
10:15 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of durum7002.magru.wmnet to drbd
10:15 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of doh7001.wikimedia.org to drbd
10:13 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ncredir7002.magru.wmnet to drbd
10:12 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of install7001.wikimedia.org to drbd
10:02 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ncredir7002.magru.wmnet to drbd
09:57 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of install7001.wikimedia.org to drbd
09:56 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti7004.magru.wmnet to cluster magru02 and group B4
09:54 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti7004.magru.wmnet to cluster magru02 and group B4
09:53 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti7003.magru.wmnet to cluster magru01 and group B3
09:52 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti7003.magru.wmnet to cluster magru01 and group B3
09:31 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti7004.mgmt.magru.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
09:25 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti7003.magru.wmnet to cluster magru01 and group B3
09:25 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti7003.magru.wmnet to cluster magru01 and group B3
09:23 jmm@cumin2002: START - Cookbook sre.hosts.provision for host ganeti7004.mgmt.magru.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
09:21 jayme@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts kubernetes[2005-2006,2015-2016].codfw.wmnet,kubernetes[1005-1006,1015-1016].eqiad.wmnet
09:21 jayme@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
09:21 jayme@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: kubernetes[2005-2006,2015-2016].codfw.wmnet,kubernetes[1005-1006,1015-1016].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jayme@cumin2002"
09:21 jayme@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: kubernetes[2005-2006,2015-2016].codfw.wmnet,kubernetes[1005-1006,1015-1016].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jayme@cumin2002"
09:19 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti7003.mgmt.magru.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
09:11 jayme@cumin2002: START - Cookbook sre.dns.netbox
09:11 jmm@cumin2002: START - Cookbook sre.hosts.provision for host ganeti7003.mgmt.magru.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
09:03 arnaudb@cumin1002: START - Cookbook sre.mysql.clone of db1233.eqiad.wmnet onto db1246.eqiad.wmnet
08:52 jayme@cumin2002: START - Cookbook sre.hosts.decommission for hosts kubernetes[2005-2006,2015-2016].codfw.wmnet,kubernetes[1005-1006,1015-1016].eqiad.wmnet
08:52 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti7004.magru.wmnet to cluster magru02 and group B4
08:52 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti7004.magru.wmnet to cluster magru02 and group B4
08:49 dcausse@deploy2002: Finished deploy [airflow-dags/search@f969d75]: search: swift_upload.py moved to refinery/bin/ (duration: 00m 27s)
08:49 jayme@cumin2002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host kubernetes[1005-1006,1015-1016].eqiad.wmnet
08:48 jayme@cumin2002: START - Cookbook sre.k8s.pool-depool-node depool for host kubernetes[1005-1006,1015-1016].eqiad.wmnet
08:48 dcausse@deploy2002: Started deploy [airflow-dags/search@f969d75]: search: swift_upload.py moved to refinery/bin/
08:47 jayme@cumin2002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host kubernetes[2005-2006,2015-2016].codfw.wmnet
08:46 jayme@cumin2002: START - Cookbook sre.k8s.pool-depool-node depool for host kubernetes[2005-2006,2015-2016].codfw.wmnet
08:28 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti7004.magru.wmnet
08:25 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti7003.magru.wmnet to cluster magru01 and group B3
08:25 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti7003.magru.wmnet to cluster magru01 and group B3
08:17 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti7003.magru.wmnet
08:16 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti7004.magru.wmnet
08:07 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti7003.magru.wmnet
08:06 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti7004
08:06 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti7004
08:06 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti7003
08:05 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti7003
07:55 arnaudb@cumin1002: dbctl commit (dc=all): 'manual depool commit', diff saved to https://phabricator.wikimedia.org/P71164 and previous config saved to /var/cache/conftool/dbconfig/20241126-075433-arnaudb.json
07:55 arnaudb@cumin1002: END (FAIL) - Cookbook sre.mysql.depool (exit_code=99) db1233 - clone on db1246
07:54 arnaudb@cumin1002: START - Cookbook sre.mysql.depool db1233 - clone on db1246
07:36 joal@deploy2002: Finished deploy [analytics/refinery@f48b8de] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@f48b8de2] (duration: 00m 29s)
07:35 joal@deploy2002: Started deploy [analytics/refinery@f48b8de] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@f48b8de2]
07:35 joal@deploy2002: Finished deploy [analytics/refinery@f48b8de] (thin): Regular analytics weekly train THIN [analytics/refinery@f48b8de2] (duration: 00m 35s)
07:34 joal@deploy2002: Started deploy [analytics/refinery@f48b8de] (thin): Regular analytics weekly train THIN [analytics/refinery@f48b8de2]
07:34 joal@deploy2002: Finished deploy [analytics/refinery@f48b8de]: Regular analytics weekly train [analytics/refinery@f48b8de2] (duration: 02m 03s)
07:33 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "UI bugfixes - oblivian@cumin1002"
07:33 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: UI bugfixes - oblivian@cumin1002
07:33 oblivian@cumin1002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: UI bugfixes - oblivian@cumin1002
07:33 oblivian@cumin1002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "UI bugfixes - oblivian@cumin1002"
07:32 joal@deploy2002: Started deploy [analytics/refinery@f48b8de]: Regular analytics weekly train [analytics/refinery@f48b8de2]
03:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2215 (T380449)', diff saved to https://phabricator.wikimedia.org/P71163 and previous config saved to /var/cache/conftool/dbconfig/20241126-034040-ladsgroup.json
03:40 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2215.codfw.wmnet with reason: Maintenance
03:40 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2215.codfw.wmnet with reason: Maintenance
03:12 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 20:00:00 on wdqs[2018-2020,2026-2027].codfw.wmnet with reason: T376150 non-prod hosts
03:12 ryankemper@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 20:00:00 on wdqs[2018-2020,2026-2027].codfw.wmnet with reason: T376150 non-prod hosts
03:11 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T376150, initialize wdqs internal scholarly tier) xfer scholarly_articles from wdqs2024.codfw.wmnet -> wdqs2026.codfw.wmnet, repooling neither afterwards
03:10 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer (T376150, initialize wdqs internal scholarly tier) xfer scholarly_articles from wdqs2024.codfw.wmnet -> wdqs2026.codfw.wmnet, repooling neither afterwards
03:09 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T376150, initialize wdqs internal main tier) xfer wikidata_main from wdqs2021.codfw.wmnet -> wdqs2019.codfw.wmnet, repooling neither afterwards
03:07 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer (T376150, initialize wdqs internal main tier) xfer wikidata_main from wdqs2021.codfw.wmnet -> wdqs2019.codfw.wmnet, repooling neither afterwards
02:42 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2201.codfw.wmnet with reason: Maintenance
02:41 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2201.codfw.wmnet with reason: Maintenance
02:34 brett: Import libvmod-netmapper 1.9.1-1 into varnish-staging apt component
02:31 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on wdqs[2026-2027].codfw.wmnet with reason: T376150
02:30 bking@cumin2002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on wdqs[2026-2027].codfw.wmnet with reason: T376150
02:29 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T376150, initialize wdqs internal main tier) xfer wikidata_main from wdqs2021.codfw.wmnet -> wdqs2020.codfw.wmnet, repooling source-only afterwards
02:24 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer (T376150, initialize wdqs internal main tier) xfer wikidata_main from wdqs2021.codfw.wmnet -> wdqs2020.codfw.wmnet, repooling source-only afterwards
01:47 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T376150, initialize wdqs internal scholarly tier) xfer scholarly_articles from wdqs2024.codfw.wmnet -> wdqs2027.codfw.wmnet, repooling source-only afterwards
01:29 sukhe@cumin1002: START - Cookbook sre.hosts.reimage for host dns7001.wikimedia.org with OS bookworm
01:08 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T376150, initialize wdqs internal main tier) xfer wikidata_main from wdqs2021.codfw.wmnet -> wdqs2019.codfw.wmnet, repooling source-only afterwards
01:06 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on lvs[7001-7003].magru.wmnet with reason: site is depooled, maintenance
01:06 sukhe@cumin1002: START - Cookbook sre.hosts.downtime for 10:00:00 on lvs[7001-7003].magru.wmnet with reason: site is depooled, maintenance
01:04 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer (T376150, initialize wdqs internal scholarly tier) xfer scholarly_articles from wdqs2024.codfw.wmnet -> wdqs2027.codfw.wmnet, repooling source-only afterwards
01:04 sukhe@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp7015.magru.wmnet with OS bullseye
01:03 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer (T376150, initialize wdqs internal main tier) xfer wikidata_main from wdqs2021.codfw.wmnet -> wdqs2019.codfw.wmnet, repooling source-only afterwards
01:01 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on wdqs[2026-2027].codfw.wmnet with reason: T376150
01:01 bking@cumin2002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on wdqs[2026-2027].codfw.wmnet with reason: T376150
00:55 sukhe@cumin1002: START - Cookbook sre.hosts.reimage for host cp7015.magru.wmnet with OS bullseye
00:28 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T376150, initialize wdqs internal scholarly tier) xfer scholarly_articles from wdqs2024.codfw.wmnet -> wdqs2026.codfw.wmnet, repooling source-only afterwards
00:21 eileen: civicrm upgraded from 190ea417 to eec961a3
00:16 tzatziki: removing 6 files for legal compliance
00:00 tzatziki: removing 1 file for legal compliance

2024-11-25

23:56 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2197.codfw.wmnet with reason: Maintenance
23:55 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2197.codfw.wmnet with reason: Maintenance
23:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2191 (T380449)', diff saved to https://phabricator.wikimedia.org/P71162 and previous config saved to /var/cache/conftool/dbconfig/20241125-235547-ladsgroup.json
23:54 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T376150, initialize wdqs internal main tier) xfer wikidata_main from wdqs2021.codfw.wmnet -> wdqs2018.codfw.wmnet, repooling source-only afterwards
23:53 tzatziki: removing 1 file for legal compliance
23:49 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer (T376150, initialize wdqs internal main tier) xfer wikidata_main from wdqs2021.codfw.wmnet -> wdqs2018.codfw.wmnet, repooling source-only afterwards
23:44 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer (T376150, initialize wdqs internal scholarly tier) xfer scholarly_articles from wdqs2024.codfw.wmnet -> wdqs2026.codfw.wmnet, repooling source-only afterwards
23:42 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) (T376150, initialize wdqs internal scholarly tier) xfer scholarly_articles from wdqs2024.codfw.wmnet -> wdqs2026.codfw.wmnet, repooling source-only afterwards
23:42 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer (T376150, initialize wdqs internal scholarly tier) xfer scholarly_articles from wdqs2024.codfw.wmnet -> wdqs2026.codfw.wmnet, repooling source-only afterwards
23:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2191', diff saved to https://phabricator.wikimedia.org/P71161 and previous config saved to /var/cache/conftool/dbconfig/20241125-234040-ladsgroup.json
23:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2191', diff saved to https://phabricator.wikimedia.org/P71160 and previous config saved to /var/cache/conftool/dbconfig/20241125-232533-ladsgroup.json
23:23 tzatziki: removing 2 files for legal compliance
23:16 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) (T376150, initialize wdqs internal scholarly tier) xfer scholarly_articles from wdqs2024.codfw.wmnet -> wdqs2026.codfw.wmnet, repooling source-only afterwards
23:16 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer (T376150, initialize wdqs internal scholarly tier) xfer scholarly_articles from wdqs2024.codfw.wmnet -> wdqs2026.codfw.wmnet, repooling source-only afterwards
23:16 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) (T376150, initialize wdqs internal scholarly tier) xfer scholarly_articles from wdqs2024.codfw.wmnet -> wdqs2026.codfw.wmnet, repooling source-only afterwards
23:16 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer (T376150, initialize wdqs internal scholarly tier) xfer scholarly_articles from wdqs2024.codfw.wmnet -> wdqs2026.codfw.wmnet, repooling source-only afterwards
23:14 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) (T376150, initialize wdqs internal scholarly tier) xfer scholarly_articles from wdqs2024.codfw.wmnet -> wdqs2026.codfw.wmnet, repooling source-only afterwards
23:14 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer (T376150, initialize wdqs internal scholarly tier) xfer scholarly_articles from wdqs2024.codfw.wmnet -> wdqs2026.codfw.wmnet, repooling source-only afterwards
23:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2191 (T380449)', diff saved to https://phabricator.wikimedia.org/P71159 and previous config saved to /var/cache/conftool/dbconfig/20241125-231026-ladsgroup.json
23:10 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) (T376150, initialize wdqs internal scholarly tier) xfer scholarly_articles from wdqs2024.codfw.wmnet -> wdqs2026.codfw.wmnet, repooling source-only afterwards
23:10 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T376150, initialize wdqs internal scholarly tier) xfer scholarly_articles from wdqs2024.codfw.wmnet -> wdqs2026.codfw.wmnet, repooling source-only afterwards
23:09 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) (T376150, initialize wdqs internal scholarly tier) xfer scholarly_articles from wdqs2024.codfw.wmnet -> wdqs2026.codfw.wmnet, repooling source-only afterwards
23:09 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer (T376150, initialize wdqs internal scholarly tier) xfer scholarly_articles from wdqs2024.codfw.wmnet -> wdqs2026.codfw.wmnet, repooling source-only afterwards
23:09 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp7015.magru.wmnet with OS bullseye
23:02 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp7015.magru.wmnet with OS bullseye
23:01 bking@cumin1002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
23:01 bking@cumin1002: START - Cookbook sre.wdqs.data-transfer
23:01 brett@cumin2002: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) cp7015.magru.wmnet lvs7003.magru.wmnet cp7015.mgmt.magru.wmnet lvs7003.mgmt.magru.wmnet on all recursors
23:00 brett@cumin2002: START - Cookbook sre.dns.wipe-cache cp7015.magru.wmnet lvs7003.magru.wmnet cp7015.mgmt.magru.wmnet lvs7003.mgmt.magru.wmnet on all recursors
23:00 brett@cumin2002: END (FAIL) - Cookbook sre.dns.wipe-cache (exit_code=99) cp7015.magru.wmnet lvs7003.magru.wmnet on all recursors
23:00 brett@cumin2002: START - Cookbook sre.dns.wipe-cache cp7015.magru.wmnet lvs7003.magru.wmnet on all recursors
22:56 brett: Import varnish-modules 0.20.0-2~deb11u1 into varnish-staging apt component
22:56 bking@cumin1002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) (T376150, initialize wdqs internal scholarly tier) xfer wikidata from wdqs2024.codfw.wmnet -> wdqs2026.codfw.wmnet, repooling source-only afterwards
22:56 bking@cumin1002: START - Cookbook sre.wdqs.data-transfer (T376150, initialize wdqs internal scholarly tier) xfer wikidata from wdqs2024.codfw.wmnet -> wdqs2026.codfw.wmnet, repooling source-only afterwards
22:53 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) (T376150, initialize wdqs internal scholarly tier) xfer wikidata from wdqs2024.codfw.wmnet -> wdqs2026.codfw.wmnet, repooling source-only afterwards
22:53 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer (T376150, initialize wdqs internal scholarly tier) xfer wikidata from wdqs2024.codfw.wmnet -> wdqs2026.codfw.wmnet, repooling source-only afterwards
22:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2191 (T380449)', diff saved to https://phabricator.wikimedia.org/P71158 and previous config saved to /var/cache/conftool/dbconfig/20241125-224949-ladsgroup.json
22:49 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2191.codfw.wmnet with reason: Maintenance
22:49 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2191.codfw.wmnet with reason: Maintenance
22:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2131 (T380449)', diff saved to https://phabricator.wikimedia.org/P71157 and previous config saved to /var/cache/conftool/dbconfig/20241125-224927-ladsgroup.json
22:48 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) (T376150, initialize wdqs internal scholarly tier) xfer scholarly_articles from wdqs2024.codfw.wmnet -> wdqs2026.codfw.wmnet, repooling source-only afterwards
22:48 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer (T376150, initialize wdqs internal scholarly tier) xfer scholarly_articles from wdqs2024.codfw.wmnet -> wdqs2026.codfw.wmnet, repooling source-only afterwards
22:46 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp7015.magru.wmnet with OS bullseye
22:43 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) (T376150, initialize wdqs internal scholarly tier) xfer scholarly_articles from wdqs2024.codfw.wmnet -> wdqs2026.codfw.wmnet, repooling source-only afterwards
22:43 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer (T376150, initialize wdqs internal scholarly tier) xfer scholarly_articles from wdqs2024.codfw.wmnet -> wdqs2026.codfw.wmnet, repooling source-only afterwards
22:38 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) (T376150, initialize wdqs internal scholarly tier) xfer scholarly_articles from wdqs2024.codfw.wmnet -> wdqs2026.codfw.wmnet, repooling source-only afterwards
22:38 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer (T376150, initialize wdqs internal scholarly tier) xfer scholarly_articles from wdqs2024.codfw.wmnet -> wdqs2026.codfw.wmnet, repooling source-only afterwards
22:37 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) (T376150, initialize wdqs internal scholarly tier) xfer scholarly_articles from wdqs2024.codfw.wmnet -> wdqs2026.codfw.wmnet, repooling source-only afterwards
22:37 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer (T376150, initialize wdqs internal scholarly tier) xfer scholarly_articles from wdqs2024.codfw.wmnet -> wdqs2026.codfw.wmnet, repooling source-only afterwards
22:37 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) (T376150, initialize wdqs internal scholarly tier) xfer scholarly_articles from wdqs2024.codfw.wmnet -> wdqs2026.codfw.wmnet, repooling source-only afterwards
22:37 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer (T376150, initialize wdqs internal scholarly tier) xfer scholarly_articles from wdqs2024.codfw.wmnet -> wdqs2026.codfw.wmnet, repooling source-only afterwards
away: UTC late deploys done
22:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2131', diff saved to https://phabricator.wikimedia.org/P71156 and previous config saved to /var/cache/conftool/dbconfig/20241125-223420-ladsgroup.json
22:34 tgr@deploy2002: Finished scap sync-world: Backport for SUL3: Sort overrides (T373737), More authentication domain overrides (T373737), Update private/readme.php to match production (duration: 12m 49s)
22:32 eileen: civicrm upgraded from b7bd670f to 190ea417
22:31 brett@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host lvs7003.magru.wmnet with OS bullseye
22:27 tgr@deploy2002: tgr: Continuing with sync
22:25 tgr@deploy2002: tgr: Backport for SUL3: Sort overrides (T373737), More authentication domain overrides (T373737), Update private/readme.php to match production synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
22:21 tgr@deploy2002: Started scap sync-world: Backport for SUL3: Sort overrides (T373737), More authentication domain overrides (T373737), Update private/readme.php to match production
22:19 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2131', diff saved to https://phabricator.wikimedia.org/P71155 and previous config saved to /var/cache/conftool/dbconfig/20241125-221913-ladsgroup.json
22:19 tgr@deploy2002: Finished scap sync-world: Backport for Reader Survey: Increase coverage (T378660) (duration: 14m 08s)
22:13 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs7003.magru.wmnet with reason: host reimage
22:12 tgr@deploy2002: tgr, dani: Continuing with sync
22:09 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs7003.magru.wmnet with reason: host reimage
22:09 tgr@deploy2002: tgr, dani: Backport for Reader Survey: Increase coverage (T378660) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
22:09 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp7015.magru.wmnet with OS bullseye
22:08 brett@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp7015.magru.wmnet with OS bullseye
22:04 tgr@deploy2002: Started scap sync-world: Backport for Reader Survey: Increase coverage (T378660)
22:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2131 (T380449)', diff saved to https://phabricator.wikimedia.org/P71154 and previous config saved to /var/cache/conftool/dbconfig/20241125-220406-ladsgroup.json
22:02 tgr@deploy2002: Finished scap sync-world: Backport for LoginCompleteHookHandler: onTempUserCreatedRedirect() should use getPrimaryInstance() (T380042) (duration: 12m 41s)
21:56 tgr@deploy2002: tgr, matmarex: Continuing with sync
21:54 tgr@deploy2002: tgr, matmarex: Backport for LoginCompleteHookHandler: onTempUserCreatedRedirect() should use getPrimaryInstance() (T380042) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
21:50 tgr@deploy2002: Started scap sync-world: Backport for LoginCompleteHookHandler: onTempUserCreatedRedirect() should use getPrimaryInstance() (T380042)
21:49 tgr@deploy2002: Finished scap sync-world: Backport for Reader Survey: Increase coverage on enwiki (T378660) (duration: 16m 06s)
21:46 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp7015.magru.wmnet with OS bullseye
21:45 brett@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp7015.magru.wmnet with OS bullseye
21:42 tgr@deploy2002: tgr, dani: Continuing with sync
21:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2131 (T380449)', diff saved to https://phabricator.wikimedia.org/P71153 and previous config saved to /var/cache/conftool/dbconfig/20241125-213904-ladsgroup.json
21:38 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2131.codfw.wmnet with reason: Maintenance
21:38 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2131.codfw.wmnet with reason: Maintenance
21:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2115 (T380449)', diff saved to https://phabricator.wikimedia.org/P71152 and previous config saved to /var/cache/conftool/dbconfig/20241125-213841-ladsgroup.json
21:37 tgr@deploy2002: tgr, dani: Backport for Reader Survey: Increase coverage on enwiki (T378660) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
21:33 tgr@deploy2002: Started scap sync-world: Backport for Reader Survey: Increase coverage on enwiki (T378660)
21:31 brett@cumin2002: START - Cookbook sre.hosts.reimage for host lvs7003.magru.wmnet with OS bullseye
21:30 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs7003.magru.wmnet with OS bullseye
21:29 tgr@deploy2002: Finished scap sync-world: Backport for Reader Survey: Fix yes/no messages (T378660) (duration: 16m 02s)
21:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2115', diff saved to https://phabricator.wikimedia.org/P71151 and previous config saved to /var/cache/conftool/dbconfig/20241125-212334-ladsgroup.json
21:22 tgr@deploy2002: dani, tgr: Continuing with sync
21:18 sukhe@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "testing - sukhe@cumin1002"
21:18 sukhe@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "testing - sukhe@cumin1002"
21:17 tgr@deploy2002: dani, tgr: Backport for Reader Survey: Fix yes/no messages (T378660) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
21:13 tgr@deploy2002: Started scap sync-world: Backport for Reader Survey: Fix yes/no messages (T378660)
21:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2115', diff saved to https://phabricator.wikimedia.org/P71150 and previous config saved to /var/cache/conftool/dbconfig/20241125-210827-ladsgroup.json
21:04 brett@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Host reimage - brett@cumin2002 - brett@cumin2002"
21:04 brett@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Host reimage - brett@cumin2002 - brett@cumin2002"
21:03 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts restbase[2021-2023].codfw.wmnet
21:03 eevans@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
21:03 eevans@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: restbase[2021-2023].codfw.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1002"
21:03 eevans@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: restbase[2021-2023].codfw.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1002"
20:59 eevans@cumin1002: START - Cookbook sre.dns.netbox
20:57 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp7008.magru.wmnet with OS bullseye
20:57 brett@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - brett@cumin2002"
20:56 robh@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - robh@cumin2002"
20:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2115 (T380449)', diff saved to https://phabricator.wikimedia.org/P71149 and previous config saved to /var/cache/conftool/dbconfig/20241125-205320-ladsgroup.json
20:51 eevans@cumin1002: START - Cookbook sre.hosts.decommission for hosts restbase[2021-2023].codfw.wmnet
20:51 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
20:51 robh@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: magru reshuffle - robh@cumin2002"
20:50 robh@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: magru reshuffle - robh@cumin2002"
20:45 robh@cumin2002: START - Cookbook sre.dns.netbox
20:45 robh@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dns7001
20:45 robh@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host dns7001
20:44 brett@cumin2002: START - Cookbook sre.hosts.reimage for host lvs7003.magru.wmnet with OS bullseye
20:43 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs7003.magru.wmnet with OS bullseye
20:40 brett@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - brett@cumin2002"
20:34 robh@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti7004.magru.wmnet with reason: host reimage
20:31 robh@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti7004.magru.wmnet with reason: host reimage
20:26 brett@cumin2002: START - Cookbook sre.hosts.reimage for host lvs7003.magru.wmnet with OS bullseye
20:24 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp7015.magru.wmnet with OS bullseye
20:12 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp7008.magru.wmnet with reason: host reimage
20:09 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp7008.magru.wmnet with reason: host reimage
20:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2115 (T380449)', diff saved to https://phabricator.wikimedia.org/P71147 and previous config saved to /var/cache/conftool/dbconfig/20241125-200031-ladsgroup.json
20:00 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2115.codfw.wmnet with reason: Maintenance
20:00 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2115.codfw.wmnet with reason: Maintenance
20:00 robh@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti7004.magru.wmnet with OS bookworm
19:58 robh@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti7003.magru.wmnet with OS bookworm
19:58 robh@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - robh@cumin2002"
19:56 robh@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - robh@cumin2002"
19:50 eevans@cumin1002: conftool action : set/weight=10; selector: cluster=restbase,dc=codfw,name=restbase2038.codfw.wmnet
19:50 eevans@cumin1002: conftool action : set/weight=10; selector: cluster=restbase,dc=codfw,name=restbase2037.codfw.wmnet
19:50 eevans@cumin1002: conftool action : set/weight=10; selector: cluster=restbase,dc=codfw,name=restbase2036.codfw.wmnet
19:43 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp7008.magru.wmnet with OS bullseye
19:43 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp7008.magru.wmnet with OS bullseye
19:37 ejegg: fundraising civicrm upgraded from 3311520a to b7bd670f
19:36 urbanecm@deploy2002: Finished scap sync-world: Backport for [Growth] enwiki: Deploy Add Link to 2% of new users (T377631) (duration: 11m 59s)
19:35 robh@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti7003.magru.wmnet with reason: host reimage
19:31 robh@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti7003.magru.wmnet with reason: host reimage
19:29 urbanecm@deploy2002: urbanecm: Continuing with sync
19:28 urbanecm@deploy2002: urbanecm: Backport for [Growth] enwiki: Deploy Add Link to 2% of new users (T377631) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
19:24 urbanecm@deploy2002: Started scap sync-world: Backport for [Growth] enwiki: Deploy Add Link to 2% of new users (T377631)
19:18 swfrench@deploy2002: Finished scap sync-world: Deployment to pick up new php 8.1 base images (duration: 09m 37s)
19:14 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
19:14 robh@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: magru reshuffle - robh@cumin2002"
19:14 robh@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: magru reshuffle - robh@cumin2002"
19:11 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance
19:11 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance
19:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1237 (T380449)', diff saved to https://phabricator.wikimedia.org/P71144 and previous config saved to /var/cache/conftool/dbconfig/20241125-191124-ladsgroup.json
19:10 robh@cumin2002: START - Cookbook sre.dns.netbox
19:10 robh@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host lvs7003
19:10 robh@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host lvs7003
19:08 swfrench@deploy2002: Started scap sync-world: Deployment to pick up new php 8.1 base images
19:06 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp7008.magru.wmnet with OS bullseye
19:06 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp7008.magru.wmnet with OS bullseye
19:02 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp7006.magru.wmnet with OS bullseye
19:02 brett@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - brett@cumin2002"
18:59 robh@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti7003.magru.wmnet with OS bookworm
18:59 brett@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - brett@cumin2002"
18:59 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
18:59 robh@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: magru reshuffle - robh@cumin2002"
18:59 robh@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: magru reshuffle - robh@cumin2002"
18:56 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1237', diff saved to https://phabricator.wikimedia.org/P71143 and previous config saved to /var/cache/conftool/dbconfig/20241125-185617-ladsgroup.json
18:53 robh@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host lvs7003
18:53 robh@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host lvs7003
18:53 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp7008.magru.wmnet with OS bullseye
18:52 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp7008.magru.wmnet with OS bullseye
18:49 jayme@cumin2002: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on D{wikikube-worker[2128-2170].codfw.wmnet} and (A:wikikube-staging-worker-codfw or A:wikikube-staging-master-codfw or A:wikikube-staging-worker-eqiad or A:wikikube-staging-master-eqiad or A:wikikube-worker-codfw or A:wikikube-master-codfw or A:wikikube-worker-eqiad or A:wikikube-master-eqiad or A:ml-serve-worker-eqiad or A:ml-se
18:48 krinkle@deploy2002: Finished deploy [statsv/statsv@6678d4b]: I7a8d831817: remove unused statsvr.py (duration: 00m 09s)
18:48 krinkle@deploy2002: Started deploy [statsv/statsv@6678d4b]: I7a8d831817: remove unused statsvr.py
18:45 robh@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp7015
18:45 robh@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cp7015
18:45 robh@cumin2002: START - Cookbook sre.dns.netbox
18:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1237', diff saved to https://phabricator.wikimedia.org/P71142 and previous config saved to /var/cache/conftool/dbconfig/20241125-184110-ladsgroup.json
18:34 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp7008.magru.wmnet with OS bullseye
18:33 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp7006.magru.wmnet with reason: host reimage
18:31 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
18:29 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp7006.magru.wmnet with reason: host reimage
18:28 robh@cumin2002: START - Cookbook sre.dns.netbox
18:28 robh@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts cp7015.magru.wmnet
18:28 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
18:27 robh@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cp7015.magru.wmnet decommissioned, removing all IPs except the asset tag one - robh@cumin2002"
18:27 robh@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cp7015.magru.wmnet decommissioned, removing all IPs except the asset tag one - robh@cumin2002"
18:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1237 (T380449)', diff saved to https://phabricator.wikimedia.org/P71141 and previous config saved to /var/cache/conftool/dbconfig/20241125-182603-ladsgroup.json
18:24 robh@cumin2002: START - Cookbook sre.dns.netbox
18:18 robh@cumin2002: START - Cookbook sre.hosts.decommission for hosts cp7015.magru.wmnet
18:17 robh@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts lvs7003.magru.wmnet
18:17 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
18:17 robh@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: lvs7003.magru.wmnet decommissioned, removing all IPs except the asset tag one - robh@cumin2002"
18:16 robh@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: lvs7003.magru.wmnet decommissioned, removing all IPs except the asset tag one - robh@cumin2002"
18:13 robh@cumin2002: START - Cookbook sre.dns.netbox
18:08 swfrench-wmf: rebuilt php8.1 production images to pick up 8.1.31
18:08 urbanecm@deploy2002: Finished scap sync-world: Backport for Migrate to virtual domains (T354939), createExtensionTables: Use virtual domains for GrowthExperiments (T354939) (duration: 13m 18s)
18:04 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp7006.magru.wmnet with OS bullseye
18:03 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp7006.magru.wmnet with OS bullseye
18:03 robh@cumin2002: START - Cookbook sre.hosts.decommission for hosts lvs7003.magru.wmnet
18:02 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
18:02 robh@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: magru reshuffle - robh@cumin2002"
18:02 robh@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: magru reshuffle - robh@cumin2002"
18:01 urbanecm@deploy2002: urbanecm: Continuing with sync
17:59 urbanecm@deploy2002: urbanecm: Backport for Migrate to virtual domains (T354939), createExtensionTables: Use virtual domains for GrowthExperiments (T354939) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
17:58 robh@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti7004
17:58 robh@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti7004
17:57 robh@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp7008
17:57 robh@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cp7008
17:56 ryankemper@deploy2002: Finished deploy [wdqs/wdqs@9927a5a] (wcqs): Deploy 0.3.150 to WCQS (duration: 02m 53s)
17:55 robh@cumin2002: START - Cookbook sre.dns.netbox
17:54 urbanecm@deploy2002: Started scap sync-world: Backport for Migrate to virtual domains (T354939), createExtensionTables: Use virtual domains for GrowthExperiments (T354939)
17:53 ryankemper@deploy2002: Started deploy [wdqs/wdqs@9927a5a] (wcqs): Deploy 0.3.150 to WCQS
17:49 ryankemper: T378260 `snapshot1016.eqiad.wmnet` => manually deleted `cirrussearch-dump-s11.[timer,service]`
17:49 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp7001.magru.wmnet with OS bullseye
17:49 fabfur@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fabfur@cumin1002"
17:46 fabfur@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fabfur@cumin1002"
17:44 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp7006.magru.wmnet with OS bullseye
17:41 robh@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cp7008.magru.wmnet
17:41 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
17:39 robh@cumin2002: START - Cookbook sre.dns.netbox
17:39 robh@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti7004.magru.wmnet
17:39 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
17:39 robh@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti7004.magru.wmnet decommissioned, removing all IPs except the asset tag one - robh@cumin2002"
17:39 robh@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti7004.magru.wmnet decommissioned, removing all IPs except the asset tag one - robh@cumin2002"
17:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1237 (T380449)', diff saved to https://phabricator.wikimedia.org/P71140 and previous config saved to /var/cache/conftool/dbconfig/20241125-173511-ladsgroup.json
17:35 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1237.eqiad.wmnet with reason: Maintenance
17:34 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db1237.eqiad.wmnet with reason: Maintenance
17:34 robh@cumin2002: START - Cookbook sre.dns.netbox
17:29 robh@cumin2002: START - Cookbook sre.hosts.decommission for hosts cp7008.magru.wmnet
17:29 robh@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti7004.magru.wmnet
17:23 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
17:23 robh@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: magru reshuffle - robh@cumin2002"
17:22 robh@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: magru reshuffle - robh@cumin2002"
17:19 robh@cumin2002: START - Cookbook sre.dns.netbox
17:17 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp7001.magru.wmnet with reason: host reimage
17:14 fabfur@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp7001.magru.wmnet with reason: host reimage
17:10 robh@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts cp7006.magru.wmnet
17:10 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
17:10 robh@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cp7006.magru.wmnet decommissioned, removing all IPs except the asset tag one - robh@cumin2002"
17:10 robh@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cp7006.magru.wmnet decommissioned, removing all IPs except the asset tag one - robh@cumin2002"
17:06 robh@cumin2002: START - Cookbook sre.dns.netbox
16:59 robh@cumin2002: START - Cookbook sre.hosts.decommission for hosts cp7006.magru.wmnet
16:59 robh@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts ganeti7003.magru.wmnet
16:59 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:58 robh@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti7003.magru.wmnet decommissioned, removing all IPs except the asset tag one - robh@cumin2002"
16:58 robh@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti7003.magru.wmnet decommissioned, removing all IPs except the asset tag one - robh@cumin2002"
16:55 robh@cumin2002: START - Cookbook sre.dns.netbox
16:49 robh@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti7003.magru.wmnet
16:47 fabfur@cumin1002: START - Cookbook sre.hosts.reimage for host cp7001.magru.wmnet with OS bullseye
16:45 hashar@deploy2002: Pruned MediaWiki: 1.44.0-wmf.2 (duration: 03m 05s)
16:44 dcaro@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephmon1004.eqiad.wmnet with OS bullseye
16:42 hashar@deploy2002: Installation of scap version "4.129.0" completed for 211 hosts
16:42 swfrench-wmf: uploaded php8.1 8.1.31-1+wmf11u1 to apt.w.o (16:25 UTC)
16:38 hashar@deploy2002: Installing scap version "4.129.0" for 211 hosts
16:27 hashar@deploy2002: Installation of scap version "4.128.0" completed for 211 hosts
16:27 Dreamy_Jazz: Restarted MediaModeration scanning script - https://wikitech.wikimedia.org/wiki/MediaModeration
16:23 hashar@deploy2002: Installing scap version "4.128.0" for 211 hosts
16:19 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1225.eqiad.wmnet with reason: Maintenance
16:19 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db1225.eqiad.wmnet with reason: Maintenance
16:19 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1224 (T380449)', diff saved to https://phabricator.wikimedia.org/P71138 and previous config saved to /var/cache/conftool/dbconfig/20241125-161915-ladsgroup.json
16:05 robh@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dns7001.mgmt.magru.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
16:05 jayme@cumin2002: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on D{wikikube-worker[1305-1312].eqiad.wmnet} and (A:wikikube-staging-worker-codfw or A:wikikube-staging-master-codfw or A:wikikube-staging-worker-eqiad or A:wikikube-staging-master-eqiad or A:wikikube-worker-codfw or A:wikikube-master-codfw or A:wikikube-worker-eqiad or A:wikikube-master-eqiad or A:ml-serve-worker-eqiad or A:ml-se
16:04 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
16:04 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-ats (exit_code=0) Rolling upgrade/restart of Apache Traffic Server on A:cp-text_drmrs and A:cp for 9.2.6-1wm2
16:04 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
16:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1224', diff saved to https://phabricator.wikimedia.org/P71134 and previous config saved to /var/cache/conftool/dbconfig/20241125-160408-ladsgroup.json
16:02 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-ats (exit_code=0) Rolling upgrade/restart of Apache Traffic Server on A:cp-upload_drmrs and A:cp for 9.2.6-1wm2
15:58 robh@cumin2002: START - Cookbook sre.hosts.provision for host dns7001.mgmt.magru.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
15:52 Lucas_WMDE: UTC afternoon backport+config window done (apologies for the temporary flood of “Use of QuickSurveys survey” deprecation warnings – should be fixed again)
15:52 robh@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dns7001.mgmt.magru.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
15:49 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for Reader Survey: Fix question (T378660) (duration: 13m 02s)
15:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1224', diff saved to https://phabricator.wikimedia.org/P71133 and previous config saved to /var/cache/conftool/dbconfig/20241125-154901-ladsgroup.json
15:48 robh@cumin2002: START - Cookbook sre.hosts.provision for host dns7001.mgmt.magru.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
15:47 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:47 robh@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: magru swaps - robh@cumin2002"
15:46 robh@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: magru swaps - robh@cumin2002"
15:46 claime: homer cr*eqiad* commit 'T380027'
15:42 robh@cumin2002: START - Cookbook sre.dns.netbox
15:41 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, dani: Continuing with sync
15:41 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts kubernetes[1009-1014].eqiad.wmnet
15:41 cgoubert@cumin1002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
15:40 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-wmde: apply
15:40 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-wmde: apply
15:40 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, dani: Backport for Reader Survey: Fix question (T378660) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
15:39 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
15:38 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1309.eqiad.wmnet with OS bookworm
15:37 jynus@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on backup2011.codfw.wmnet with reason: Reboot
15:37 jynus@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on backup2011.codfw.wmnet with reason: Reboot
15:37 jynus@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on backup2010.codfw.wmnet with reason: Reboot
15:37 jynus@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on backup2010.codfw.wmnet with reason: Reboot
15:36 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for Reader Survey: Fix question (T378660)
15:36 robh@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp7001.mgmt.magru.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
15:33 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1224 (T380449)', diff saved to https://phabricator.wikimedia.org/P71132 and previous config saved to /var/cache/conftool/dbconfig/20241125-153354-ladsgroup.json
15:31 robh@cumin2002: START - Cookbook sre.hosts.provision for host cp7001.mgmt.magru.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
15:23 lucaswerkmeister-wmde@deploy2002: dani, lucaswerkmeister-wmde: Continuing with sync
15:21 lucaswerkmeister-wmde@deploy2002: dani, lucaswerkmeister-wmde: Backport for Reader Survey: Deploy on enwiki (T378660) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
15:19 cgoubert@cumin1002: START - Cookbook sre.hosts.decommission for hosts kubernetes[1009-1014].eqiad.wmnet
15:18 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1309.eqiad.wmnet with reason: host reimage
15:18 robh@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:17 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for Reader Survey: Deploy on enwiki (T378660)
15:15 robh@cumin1002: START - Cookbook sre.dns.netbox
15:15 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1309.eqiad.wmnet with reason: host reimage
15:15 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for New stream config for Android Rabbit Holes feature. (T380107) (duration: 15m 45s)
15:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1224 (T380449)', diff saved to https://phabricator.wikimedia.org/P71131 and previous config saved to /var/cache/conftool/dbconfig/20241125-151103-ladsgroup.json
15:10 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1224.eqiad.wmnet with reason: Maintenance
15:10 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db1224.eqiad.wmnet with reason: Maintenance
15:08 lucaswerkmeister-wmde@deploy2002: dbrant, lucaswerkmeister-wmde: Continuing with sync
15:03 lucaswerkmeister-wmde@deploy2002: dbrant, lucaswerkmeister-wmde: Backport for New stream config for Android Rabbit Holes feature. (T380107) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
15:02 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-ats Rolling upgrade/restart of Apache Traffic Server on A:cp-text_drmrs and A:cp for 9.2.6-1wm2
15:02 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-ats Rolling upgrade/restart of Apache Traffic Server on A:cp-upload_drmrs and A:cp for 9.2.6-1wm2
14:59 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host kubernetes[1009-1014].eqiad.wmnet
14:59 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for New stream config for Android Rabbit Holes feature. (T380107)
14:57 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for Pass context to 'revreview-pending-basic' on history page (T380519), Use Contexts for Message objects in review dialog (tooltip) (T380519) (duration: 15m 35s)
14:56 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:56 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove vlan1107 IPv6 entries - cmooney@cumin1002"
14:56 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host kubernetes[1009-1014].eqiad.wmnet
14:54 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove vlan1107 IPv6 entries - cmooney@cumin1002"
14:54 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1309.eqiad.wmnet with OS bookworm
14:52 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker1309.eqiad.wmnet
14:52 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker1309.eqiad.wmnet
14:52 cmooney@cumin1002: START - Cookbook sre.dns.netbox
14:50 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, matmarex: Continuing with sync
14:49 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-ats (exit_code=0) Rolling upgrade/restart of Apache Traffic Server on A:cp-text_codfw and A:cp for 9.2.6-1wm2
14:48 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, matmarex: Backport for Pass context to 'revreview-pending-basic' on history page (T380519), Use Contexts for Message objects in review dialog (tooltip) (T380519) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
14:47 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker[1310-1312].eqiad.wmnet
14:47 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker[1310-1312].eqiad.wmnet
14:47 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-ats (exit_code=0) Rolling upgrade/restart of Apache Traffic Server on A:cp-upload_codfw and A:cp for 9.2.6-1wm2
14:44 cmooney@cumin1002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
14:44 cmooney@cumin1002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add reverse IPv6 includes to dns repo for vlan1107 - cmooney@cumin1002"
14:42 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add reverse IPv6 includes to dns repo for vlan1107 - cmooney@cumin1002"
14:41 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for Pass context to 'revreview-pending-basic' on history page (T380519), Use Contexts for Message objects in review dialog (tooltip) (T380519)
14:39 cmooney@cumin1002: START - Cookbook sre.dns.netbox
14:26 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Add tooltips - oblivian@cumin1002"
14:26 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Add tooltips - oblivian@cumin1002
14:26 moritzm: prune unneeded kernels from grafana2001
14:26 oblivian@cumin1002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Add tooltips - oblivian@cumin1002
14:26 oblivian@cumin1002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Add tooltips - oblivian@cumin1002"
14:20 claime: Manually deleting wikikube-worker13[13-20].eqiad.wmnet for IP exhaustion T375845
14:19 claime: disable puppet and kubelet on wikikube-worker13[13-28].eqiad.wmnet for IP exhaustion T375845
14:12 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply
14:05 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host es2045.codfw.wmnet
14:03 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host es2046.codfw.wmnet
14:02 aborrero@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:01 aborrero@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudgw updates - aborrero@cumin1002"
14:01 aborrero@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudgw updates - aborrero@cumin1002"
13:59 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host es2046.codfw.wmnet
13:59 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host es2045.codfw.wmnet
13:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host es2044.codfw.wmnet
13:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host es2043.codfw.wmnet
13:51 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-ats Rolling upgrade/restart of Apache Traffic Server on A:cp-upload_codfw and A:cp for 9.2.6-1wm2
13:51 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-ats Rolling upgrade/restart of Apache Traffic Server on A:cp-text_codfw and A:cp for 9.2.6-1wm2
13:50 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host es2044.codfw.wmnet
13:50 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host es2043.codfw.wmnet
13:48 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host es2042.codfw.wmnet
13:48 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host es2041.codfw.wmnet
13:47 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1216.eqiad.wmnet with reason: Maintenance
13:47 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db1216.eqiad.wmnet with reason: Maintenance
13:47 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1318.eqiad.wmnet with OS bookworm
13:46 jayme: deployed sessionstore to non-dedicated nodes - T379599
13:44 jayme@deploy2002: helmfile [codfw] DONE helmfile.d/services/sessionstore: apply
13:44 jayme@deploy2002: helmfile [codfw] START helmfile.d/services/sessionstore: apply
13:43 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1320.eqiad.wmnet with OS bookworm
13:43 jayme: cordoned kubernetes[2005-2006,2015-2016].codfw.wmnet,kubernetes[1005-1006,1015-1016].eqiad.wmnet - T379599
13:42 jayme@deploy2002: helmfile [eqiad] DONE helmfile.d/services/sessionstore: apply
13:42 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host es2042.codfw.wmnet
13:42 aborrero@cumin1002: START - Cookbook sre.dns.netbox
13:41 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host es2041.codfw.wmnet
13:41 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1314.eqiad.wmnet with OS bookworm
13:40 btullis@cumin1002: END (PASS) - Cookbook sre.ceph.roll-restart-reboot-server (exit_code=0) rolling reboot on A:cephosd and (A:cephosd)
13:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host db1246.eqiad.wmnet
13:38 jayme@deploy2002: helmfile [eqiad] START helmfile.d/services/sessionstore: apply
13:38 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/sessionstore: apply
13:38 jayme@deploy2002: helmfile [staging] START helmfile.d/services/sessionstore: apply
13:37 andrewtavis-wmde@deploy2002: Finished deploy [airflow-dags/wmde@006515b]: Testing the new k8s deployment (duration: 02m 34s)
13:37 andrewtavis-wmde@deploy2002: Started deploy [airflow-dags/wmde@006515b]: Testing the new k8s deployment
13:35 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1319.eqiad.wmnet with OS bookworm
13:32 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1315.eqiad.wmnet with OS bookworm
13:32 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host db1246.eqiad.wmnet
13:32 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply
13:30 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1317.eqiad.wmnet with OS bookworm
13:28 jayme@cumin2002: START - Cookbook sre.k8s.reboot-nodes rolling reboot on D{wikikube-worker[1305-1312].eqiad.wmnet} and (A:wikikube-staging-worker-codfw or A:wikikube-staging-master-codfw or A:wikikube-staging-worker-eqiad or A:wikikube-staging-master-eqiad or A:wikikube-worker-codfw or A:wikikube-master-codfw or A:wikikube-worker-eqiad or A:wikikube-master-eqiad or A:ml-serve-worker-eqiad or A:ml-serve-master-eqiad or
13:28 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1318.eqiad.wmnet with reason: host reimage
13:28 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1316.eqiad.wmnet with OS bookworm
13:27 jayme@cumin2002: START - Cookbook sre.k8s.reboot-nodes rolling reboot on D{wikikube-worker[2128-2170].codfw.wmnet} and (A:wikikube-staging-worker-codfw or A:wikikube-staging-master-codfw or A:wikikube-staging-worker-eqiad or A:wikikube-staging-master-eqiad or A:wikikube-worker-codfw or A:wikikube-master-codfw or A:wikikube-worker-eqiad or A:wikikube-master-eqiad or A:ml-serve-worker-eqiad or A:ml-serve-master-eqiad or
13:24 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1320.eqiad.wmnet with reason: host reimage
13:24 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1313.eqiad.wmnet with OS bookworm
13:23 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-wmde: apply
13:23 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-wmde: apply
13:21 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1314.eqiad.wmnet with reason: host reimage
13:17 ladsgroup@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db1179 gradually with 4 steps - Maint over
13:17 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1319.eqiad.wmnet with reason: host reimage
13:16 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db2239.codfw.wmnet with reason: T373579, host is WIP
13:16 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on db2239.codfw.wmnet with reason: T373579, host is WIP
13:13 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1315.eqiad.wmnet with reason: host reimage
13:11 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1317.eqiad.wmnet with reason: host reimage
13:09 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-wmde: apply
13:08 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1316.eqiad.wmnet with reason: host reimage
13:08 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-wmde: apply
13:06 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1320.eqiad.wmnet with reason: host reimage
13:05 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on cp7015.magru.wmnet with reason: T376737
13:05 fabfur@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on cp7015.magru.wmnet with reason: T376737
13:05 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on cp7008.magru.wmnet with reason: T376737
13:05 fabfur@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on cp7008.magru.wmnet with reason: T376737
13:05 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1319.eqiad.wmnet with reason: host reimage
13:05 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on cp7006.magru.wmnet with reason: T376737
13:04 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1313.eqiad.wmnet with reason: host reimage
13:04 fabfur@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on cp7006.magru.wmnet with reason: T376737
13:04 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on cp7001.magru.wmnet with reason: T376737
13:04 fabfur@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on cp7001.magru.wmnet with reason: T376737
13:04 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on lvs7003.magru.wmnet with reason: T376737
13:04 fabfur@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on lvs7003.magru.wmnet with reason: T376737
13:04 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1318.eqiad.wmnet with reason: host reimage
13:04 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host aux-k8s-etcd2005.codfw.wmnet
13:03 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1317.eqiad.wmnet with reason: host reimage
13:03 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on ganeti7004.magru.wmnet with reason: T376737
13:03 fabfur@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on ganeti7004.magru.wmnet with reason: T376737
13:03 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on ganeti7003.magru.wmnet with reason: T376737
13:03 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1316.eqiad.wmnet with reason: host reimage
13:03 fabfur@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on ganeti7003.magru.wmnet with reason: T376737
13:03 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-wmde: apply
13:02 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dns7001.wikimedia.org with reason: T376737
13:02 jiji@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-mcrouter: apply
13:02 fabfur@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dns7001.wikimedia.org with reason: T376737
13:02 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-wmde: apply
13:02 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1315.eqiad.wmnet with reason: host reimage
13:02 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on cp7001.magru.wmnet with reason: T376737
13:02 fabfur@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on cp7001.magru.wmnet with reason: T376737
13:02 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-search: apply
13:02 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1314.eqiad.wmnet with reason: host reimage
13:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host aux-k8s-etcd2004.codfw.wmnet
13:01 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-search: apply
13:01 jiji@deploy2002: helmfile [codfw] START helmfile.d/services/mw-mcrouter: apply
13:01 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-research: apply
13:00 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1313.eqiad.wmnet with reason: host reimage
13:00 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-research: apply
13:00 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host aux-k8s-etcd2005.codfw.wmnet
12:59 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-platform-eng: apply
12:59 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host aux-k8s-etcd2003.codfw.wmnet
12:59 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-platform-eng: apply
12:59 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-product: apply
12:58 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-product: apply
12:58 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host aux-k8s-ctrl2003.codfw.wmnet
12:58 jayme@cumin2002: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on D{kubestage100[5-6].eqiad.wmnet} and (A:wikikube-staging-worker-codfw or A:wikikube-staging-master-codfw or A:wikikube-staging-worker-eqiad or A:wikikube-staging-master-eqiad or A:wikikube-worker-codfw or A:wikikube-master-codfw or A:wikikube-worker-eqiad or A:wikikube-master-eqiad or A:ml-serve-worker-eqiad or A:ml-serve-maste
12:57 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host aux-k8s-etcd2004.codfw.wmnet
12:57 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
12:56 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
12:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host aux-k8s-etcd2003.codfw.wmnet
12:54 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host aux-k8s-ctrl2003.codfw.wmnet
12:49 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host aux-k8s-ctrl2002.codfw.wmnet
12:48 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
12:48 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host aux-k8s-worker2005.codfw.wmnet
12:48 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
12:47 jynus@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on backup1011.eqiad.wmnet with reason: Reboot
12:47 jynus@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on backup1011.eqiad.wmnet with reason: Reboot
12:45 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host aux-k8s-ctrl2002.codfw.wmnet
12:44 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host aux-k8s-worker2005.codfw.wmnet
12:44 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1320.eqiad.wmnet with OS bookworm
12:43 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1319.eqiad.wmnet with OS bookworm
12:43 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1318.eqiad.wmnet with OS bookworm
12:43 jayme@cumin2002: START - Cookbook sre.k8s.reboot-nodes rolling reboot on D{kubestage100[5-6].eqiad.wmnet} and (A:wikikube-staging-worker-codfw or A:wikikube-staging-master-codfw or A:wikikube-staging-worker-eqiad or A:wikikube-staging-master-eqiad or A:wikikube-worker-codfw or A:wikikube-master-codfw or A:wikikube-worker-eqiad or A:wikikube-master-eqiad or A:ml-serve-worker-eqiad or A:ml-serve-master-eqiad or A:ml-ser
12:42 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1317.eqiad.wmnet with OS bookworm
12:42 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1316.eqiad.wmnet with OS bookworm
12:42 jynus@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on backup1010.eqiad.wmnet with reason: Reboot
12:41 jynus@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on backup1010.eqiad.wmnet with reason: Reboot
12:41 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1315.eqiad.wmnet with OS bookworm
12:41 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1314.eqiad.wmnet with OS bookworm
12:40 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1313.eqiad.wmnet with OS bookworm
12:33 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-ats (exit_code=0) Rolling upgrade/restart of Apache Traffic Server on A:cp-eqsin and not (P{cp5018.*} or P{cp5026.*}) and A:cp for 9.2.6-1wm2
12:32 ladsgroup@cumin1002: START - Cookbook sre.mysql.pool db1179 gradually with 4 steps - Maint over
12:28 btullis@cumin1002: START - Cookbook sre.ceph.roll-restart-reboot-server rolling reboot on A:cephosd and (A:cephosd)
12:27 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host aux-k8s-worker2004.codfw.wmnet
12:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host aux-k8s-worker2003.codfw.wmnet
12:23 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host aux-k8s-worker2004.codfw.wmnet
12:22 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host aux-k8s-worker2003.codfw.wmnet
12:13 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host aux-k8s-worker2002.codfw.wmnet
12:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host build2002.codfw.wmnet
12:09 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host aux-k8s-worker2002.codfw.wmnet
12:06 hashar@deploy2002: Pruned MediaWiki: 1.39.0-wmf.1 (duration: 00m 40s)
12:06 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host build2002.codfw.wmnet
12:03 hashar@deploy2002: Pruned MediaWiki: 1.39.0-wmf.1 (duration: 00m 37s)
11:56 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker1256.eqiad.wmnet
11:56 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker1256.eqiad.wmnet
11:51 hashar@deploy2002: Installation of scap version "4.128.0" completed for 211 hosts
11:49 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) check for host wikikube-worker1290.eqiad.wmnet
11:49 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node check for host wikikube-worker1290.eqiad.wmnet
11:47 hashar@deploy2002: Installing scap version "4.128.0" for 211 hosts
11:47 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1179 (T380449)', diff saved to https://phabricator.wikimedia.org/P71125 and previous config saved to /var/cache/conftool/dbconfig/20241125-114651-ladsgroup.json
11:46 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1179.eqiad.wmnet with reason: Maintenance
11:46 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db1179.eqiad.wmnet with reason: Maintenance
11:41 claime: homer 'cr*eqiad*' commit 'T379454'
11:39 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1256.eqiad.wmnet with OS bookworm
11:39 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - cgoubert@cumin1002"
11:39 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - cgoubert@cumin1002"
11:34 hashar@deploy2002: Installing scap version "4.128.0" for 211 hosts
11:24 moritzm: installing Linux 6.1.119 on Bookworm nodes
11:20 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1256.eqiad.wmnet with reason: host reimage
11:18 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1256.eqiad.wmnet with reason: host reimage
11:03 fabfur@cumin1002: conftool action : set/pooled=no; selector: cluster=dnsbox,dc=magru
11:02 fabfur: depooling dnsboxes @ magru for hardware swap (T376737)
11:02 fabfur@cumin1002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: depool site magru [reason: depool magru for hw swap, T376737]
11:01 fabfur@cumin1002: START - Cookbook sre.dns.admin DNS admin: depool site magru [reason: depool magru for hw swap, T376737]
11:01 fabfur: depooling magru for hardware swap (T376737)
10:40 hashar@deploy2002: Finished deploy [integration/docroot@d585f2b]: build: Updating cross-spawn to 7.0.6 (duration: 00m 10s)
10:40 hashar@deploy2002: Started deploy [integration/docroot@d585f2b]: build: Updating cross-spawn to 7.0.6
10:38 _joe_: deleted pyall component from reprepro
10:35 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-ats Rolling upgrade/restart of Apache Traffic Server on A:cp-eqsin and not (P{cp5018.*} or P{cp5026.*}) and A:cp for 9.2.6-1wm2
10:25 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-ats (exit_code=0) Rolling upgrade/restart of Apache Traffic Server on A:cp-ulsfo and not (P{cp4043.*} or P{cp4051.*}) and A:cp for 9.2.6-1wm2
10:17 mvernon@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host thanos-be1005.eqiad.wmnet
10:11 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
10:11 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
10:10 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
10:10 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
10:07 jynus: extending backup1009 free filesystem
10:06 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host thanos-be1005.eqiad.wmnet
09:58 mvernon@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host thanos-be2005.codfw.wmnet
09:46 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host thanos-be2005.codfw.wmnet
09:45 mvernon@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host thanos-be2005.codfw.wmnet
09:44 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
09:43 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
09:39 moritzm: remove ganeti7003 from active Ganeti nodes in magru01 T376737
09:34 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host thanos-be2005.codfw.wmnet
09:32 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
09:32 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
09:25 ladsgroup@deploy2002: Finished scap sync-world: Backport for Bump ratio of new parsercache key spec to 6 (T373037) (duration: 11m 05s)
09:18 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of doh7001.wikimedia.org to plain
09:18 ladsgroup@deploy2002: ladsgroup: Continuing with sync
09:18 ladsgroup@deploy2002: ladsgroup: Backport for Bump ratio of new parsercache key spec to 6 (T373037) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
09:14 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of doh7001.wikimedia.org to plain
09:13 ladsgroup@deploy2002: Started scap sync-world: Backport for Bump ratio of new parsercache key spec to 6 (T373037)
09:13 dcausse: restarting blazegraph on wdqs1012 (BlazegraphFreeAllocatorsDecreasingRapidly)
09:04 kostajh: UTC morning deploys done
09:01 kharlan@deploy2002: Finished scap sync-world: Backport for IPReputation: Enable everywhere (T360067) (duration: 15m 48s)
08:53 kharlan@deploy2002: kharlan: Continuing with sync
08:50 kharlan@deploy2002: kharlan: Backport for IPReputation: Enable everywhere (T360067) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
08:48 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of durum7001.magru.wmnet to plain
08:47 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of durum7001.magru.wmnet to plain
08:46 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2179.codfw.wmnet with reason: Maintenance
08:46 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2179.codfw.wmnet with reason: Maintenance
08:46 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1160.eqiad.wmnet with reason: Maintenance
08:46 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1160.eqiad.wmnet with reason: Maintenance
08:46 kharlan@deploy2002: Started scap sync-world: Backport for IPReputation: Enable everywhere (T360067)
08:45 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2240 (T367781)', diff saved to https://phabricator.wikimedia.org/P71123 and previous config saved to /var/cache/conftool/dbconfig/20241125-084531-arnaudb.json
08:43 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of install7001.wikimedia.org to plain
08:39 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of install7001.wikimedia.org to plain
08:39 tgr@deploy2002: Finished scap sync-world: Backport for Disable more extensions when using the shared login domain (T373737) (duration: 30m 35s)
08:37 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-ats Rolling upgrade/restart of Apache Traffic Server on A:cp-ulsfo and not (P{cp4043.*} or P{cp4051.*}) and A:cp for 9.2.6-1wm2
08:30 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2240', diff saved to https://phabricator.wikimedia.org/P71122 and previous config saved to /var/cache/conftool/dbconfig/20241125-083024-arnaudb.json
08:30 tgr@deploy2002: tgr: Continuing with sync
08:25 tgr@deploy2002: tgr: Backport for Disable more extensions when using the shared login domain (T373737) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
08:17 jelto@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikidata-query-gui: apply
08:17 jelto@deploy2002: helmfile [codfw] START helmfile.d/services/wikidata-query-gui: apply
08:17 jelto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikidata-query-gui: apply
08:16 jelto@deploy2002: helmfile [eqiad] START helmfile.d/services/wikidata-query-gui: apply
08:16 jelto@deploy2002: helmfile [staging] DONE helmfile.d/services/wikidata-query-gui: apply
08:15 jelto@deploy2002: helmfile [staging] START helmfile.d/services/wikidata-query-gui: apply
08:15 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2240', diff saved to https://phabricator.wikimedia.org/P71121 and previous config saved to /var/cache/conftool/dbconfig/20241125-081517-arnaudb.json
08:11 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ncredir7001.magru.wmnet to plain
08:10 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ncredir7001.magru.wmnet to plain
08:08 tgr@deploy2002: Started scap sync-world: Backport for Disable more extensions when using the shared login domain (T373737)
08:02 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of netflow7001.magru.wmnet to plain
08:00 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of netflow7001.magru.wmnet to plain
08:00 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2240 (T367781)', diff saved to https://phabricator.wikimedia.org/P71120 and previous config saved to /var/cache/conftool/dbconfig/20241125-080010-arnaudb.json
07:57 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2240 (T367781)', diff saved to https://phabricator.wikimedia.org/P71119 and previous config saved to /var/cache/conftool/dbconfig/20241125-075758-arnaudb.json
07:57 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2240.codfw.wmnet with reason: Maintenance
07:57 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2240.codfw.wmnet with reason: Maintenance
07:56 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on pc1013.eqiad.wmnet with reason: T373037, host is not pooled
07:56 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on pc1013.eqiad.wmnet with reason: T373037, host is not pooled
07:55 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on pc1017.eqiad.wmnet with reason: T378068, host is not pooled
07:55 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on pc1017.eqiad.wmnet with reason: T378068, host is not pooled
07:54 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti7003.magru.wmnet
07:53 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti7003.magru.wmnet
07:47 moritzm: remove ganeti7004 from active Ganeti nodes in magru02 T376737
07:15 _joe_: upgrading vopsbot to 0.3.9

2024-11-23

12:08 btullis@cumin1002: END (FAIL) - Cookbook sre.hadoop.roll-restart-masters (exit_code=99) restart masters for Hadoop test cluster: Restart of jvm daemons.
12:05 btullis@cumin1002: START - Cookbook sre.hadoop.roll-restart-masters restart masters for Hadoop test cluster: Restart of jvm daemons.
02:15 urandom: decommissioning Cassandra/restbase2023-{a,b,c} — T380236

2024-11-22

21:51 bking@cumin2002: conftool action : set/pooled=false; selector: dnsdisc=wdqs-internal-scholarly,name=eqiad
21:37 bking@cumin2002: conftool action : set/pooled=yes; selector: name=wdqs2026.codfw.wmnet
21:37 bking@cumin2002: conftool action : set/pooled=yes; selector: name=wdqs2018.codfw.wmnet
21:33 bking@cumin2002: conftool action : set/weight=1; selector: name=wdqs2026.codfw.wmnet
21:33 bking@cumin2002: conftool action : set/weight=1; selector: name=wdqs2018.codfw.wmnet
21:25 bking@cumin2002: conftool action : set/pooled=yes:weight=1; selector: cluster=wdqs-scholarly,service=wdqs-internal-scholarly
21:25 bking@cumin2002: conftool action : set/pooled=yes:weight=1; selector: cluster=wdqs-main,service=wdqs-internal-main
20:59 herron@cumin1002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host aux-k8s-worker2005.codfw.wmnet
20:59 herron@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aux-k8s-worker2005.codfw.wmnet with OS bookworm
20:41 herron@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aux-k8s-worker2005.codfw.wmnet with reason: host reimage
20:37 herron@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on aux-k8s-worker2005.codfw.wmnet with reason: host reimage
20:20 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker2005.codfw.wmnet with OS bookworm
20:17 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker2005.codfw.wmnet - herron@cumin1002"
20:17 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker2005.codfw.wmnet - herron@cumin1002"
20:17 herron@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) aux-k8s-worker2005.codfw.wmnet on all recursors
20:17 herron@cumin1002: START - Cookbook sre.dns.wipe-cache aux-k8s-worker2005.codfw.wmnet on all recursors
20:17 herron@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
20:17 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker2005.codfw.wmnet - herron@cumin1002"
20:17 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker2005.codfw.wmnet - herron@cumin1002"
20:07 herron@cumin1002: START - Cookbook sre.dns.netbox
20:07 herron@cumin1002: START - Cookbook sre.ganeti.makevm for new host aux-k8s-worker2005.codfw.wmnet
19:47 herron@cumin1002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host aux-k8s-worker2004.codfw.wmnet
19:47 herron@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aux-k8s-worker2004.codfw.wmnet with OS bookworm
19:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es2045.codfw.wmnet with OS bookworm
19:36 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
19:36 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
19:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es2046.codfw.wmnet with OS bookworm
19:35 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
19:32 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
19:32 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es2043.codfw.wmnet with OS bookworm
19:32 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
19:31 herron@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aux-k8s-worker2004.codfw.wmnet with reason: host reimage
19:29 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
19:27 herron@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on aux-k8s-worker2004.codfw.wmnet with reason: host reimage
19:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es2044.codfw.wmnet with OS bookworm
19:27 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
19:26 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
19:19 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on es2045.codfw.wmnet with reason: host reimage
19:16 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on es2046.codfw.wmnet with reason: host reimage
19:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on es2043.codfw.wmnet with reason: host reimage
19:13 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker2004.codfw.wmnet with OS bookworm
19:10 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker2004.codfw.wmnet - herron@cumin1002"
19:10 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker2004.codfw.wmnet - herron@cumin1002"
19:10 herron@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) aux-k8s-worker2004.codfw.wmnet on all recursors
19:10 herron@cumin1002: START - Cookbook sre.dns.wipe-cache aux-k8s-worker2004.codfw.wmnet on all recursors
19:10 herron@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
19:10 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker2004.codfw.wmnet - herron@cumin1002"
19:10 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker2004.codfw.wmnet - herron@cumin1002"
19:09 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on es2044.codfw.wmnet with reason: host reimage
19:05 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on es2045.codfw.wmnet with reason: host reimage
19:05 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on es2046.codfw.wmnet with reason: host reimage
19:05 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on es2043.codfw.wmnet with reason: host reimage
19:05 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on es2044.codfw.wmnet with reason: host reimage
18:58 herron@cumin1002: START - Cookbook sre.dns.netbox
18:58 herron@cumin1002: START - Cookbook sre.ganeti.makevm for new host aux-k8s-worker2004.codfw.wmnet
18:53 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es2042.codfw.wmnet with OS bookworm
18:53 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
18:52 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
18:50 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host es2043.codfw.wmnet with OS bookworm
18:50 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host es2044.codfw.wmnet with OS bookworm
18:50 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host es2045.codfw.wmnet with OS bookworm
18:50 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host es2046.codfw.wmnet with OS bookworm
18:45 herron@cumin1002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host aux-k8s-worker2003.codfw.wmnet
18:45 herron@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aux-k8s-worker2003.codfw.wmnet with OS bookworm
18:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on es2042.codfw.wmnet with reason: host reimage
18:32 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on es2042.codfw.wmnet with reason: host reimage
18:31 herron@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aux-k8s-worker2003.codfw.wmnet with reason: host reimage
18:27 herron@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on aux-k8s-worker2003.codfw.wmnet with reason: host reimage
18:17 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host es2042.codfw.wmnet with OS bookworm
18:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host es2042.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
18:11 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker2003.codfw.wmnet with OS bookworm
18:10 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker2003.codfw.wmnet - herron@cumin1002"
18:10 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker2003.codfw.wmnet - herron@cumin1002"
18:10 herron@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) aux-k8s-worker2003.codfw.wmnet on all recursors
18:10 herron@cumin1002: START - Cookbook sre.dns.wipe-cache aux-k8s-worker2003.codfw.wmnet on all recursors
18:10 herron@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
18:10 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker2003.codfw.wmnet - herron@cumin1002"
18:10 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker2003.codfw.wmnet - herron@cumin1002"
18:09 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host es2042.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
18:03 herron@cumin1002: START - Cookbook sre.dns.netbox
18:02 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
18:02 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding es2042 to codfw - jhancock@cumin2002"
18:02 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding es2042 to codfw - jhancock@cumin2002"
18:02 herron@cumin1002: START - Cookbook sre.ganeti.makevm for new host aux-k8s-worker2003.codfw.wmnet
17:58 jhancock@cumin2002: START - Cookbook sre.dns.netbox
17:41 herron@cumin1002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host aux-k8s-worker2002.codfw.wmnet
17:41 herron@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aux-k8s-worker2002.codfw.wmnet with OS bookworm
17:32 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host es2045.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
17:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host es2046.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
17:28 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host es2042
17:28 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host es2042
17:25 herron@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aux-k8s-worker2002.codfw.wmnet with reason: host reimage
17:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host es2044.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
17:23 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on cloudsw1-d5-eqiad.mgmt,cloudsw1-e4-eqiad.mgmt with reason: replace optics on faulty WMCS link from D5 to E4
17:22 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on cloudsw1-d5-eqiad.mgmt,cloudsw1-e4-eqiad.mgmt with reason: replace optics on faulty WMCS link from D5 to E4
17:22 herron@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on aux-k8s-worker2002.codfw.wmnet with reason: host reimage
17:20 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host es2046.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
17:20 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host es2045.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
17:11 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host es2044.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
17:09 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host es2043.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
17:08 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker2002.codfw.wmnet with OS bookworm
17:06 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker2002.codfw.wmnet - herron@cumin1002"
17:06 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker2002.codfw.wmnet - herron@cumin1002"
17:05 herron@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) aux-k8s-worker2002.codfw.wmnet on all recursors
17:05 herron@cumin1002: START - Cookbook sre.dns.wipe-cache aux-k8s-worker2002.codfw.wmnet on all recursors
17:05 herron@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
17:05 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker2002.codfw.wmnet - herron@cumin1002"
17:05 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker2002.codfw.wmnet - herron@cumin1002"
17:00 herron@cumin1002: START - Cookbook sre.dns.netbox
17:00 herron@cumin1002: START - Cookbook sre.ganeti.makevm for new host aux-k8s-worker2002.codfw.wmnet
16:57 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host es2043.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
16:54 herron@cumin1002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of aux-k8s-etcd2003.codfw.wmnet to plain
16:53 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host es2042.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
16:53 herron@cumin1002: START - Cookbook sre.ganeti.changedisk for changing disk type of aux-k8s-etcd2003.codfw.wmnet to plain
16:48 herron@cumin1002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of aux-k8s-etcd2004.codfw.wmnet to plain
16:47 herron@cumin1002: START - Cookbook sre.ganeti.changedisk for changing disk type of aux-k8s-etcd2004.codfw.wmnet to plain
16:43 herron@cumin1002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of aux-k8s-etcd2005.codfw.wmnet to plain
16:43 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es2041.codfw.wmnet with OS bookworm
16:43 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002"
16:43 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002"
16:42 herron@cumin1002: START - Cookbook sre.ganeti.changedisk for changing disk type of aux-k8s-etcd2005.codfw.wmnet to plain
16:40 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host es2042.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
16:27 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on es2041.codfw.wmnet with reason: host reimage
16:24 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on es2041.codfw.wmnet with reason: host reimage
16:12 claime: homer 'cr*codfw*' commit 'T380473'
16:11 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts parse[2002-2020].codfw.wmnet
16:11 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:10 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: parse[2002-2020].codfw.wmnet decommissioned, removing all IPs except the asset tag one - cgoubert@cumin1002"
16:10 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: parse[2002-2020].codfw.wmnet decommissioned, removing all IPs except the asset tag one - cgoubert@cumin1002"
16:09 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host es2041.codfw.wmnet with OS bookworm
16:08 bking@deploy2002: Finished deploy [wdqs/wdqs@9927a5a]: 0.3.150 (duration: 03m 00s)
16:07 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
16:05 bking@deploy2002: Started deploy [wdqs/wdqs@9927a5a]: 0.3.150
16:00 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host es2041.codfw.wmnet with OS bookworm
15:31 cgoubert@cumin1002: START - Cookbook sre.hosts.decommission for hosts parse[2002-2020].codfw.wmnet
15:31 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host es2041.codfw.wmnet with OS bookworm
15:29 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts parse2001.codfw.wmnet
15:29 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:29 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: parse2001.codfw.wmnet decommissioned, removing all IPs except the asset tag one - cgoubert@cumin1002"
15:29 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: parse2001.codfw.wmnet decommissioned, removing all IPs except the asset tag one - cgoubert@cumin1002"
15:29 elukey@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host es2041.codfw.wmnet with OS bookworm
15:25 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
15:22 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host es2041.codfw.wmnet with OS bookworm
15:20 cgoubert@cumin1002: START - Cookbook sre.hosts.decommission for hosts parse2001.codfw.wmnet
15:17 ihurbain@deploy2002: helmfile [eqiad] DONE helmfile.d/services/push-notifications: apply
15:17 ihurbain@deploy2002: helmfile [eqiad] START helmfile.d/services/push-notifications: apply
15:16 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/push-notifications: apply
15:15 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/push-notifications: apply
15:14 claime: kubectl delete node parse20{01..20}.codfw.wmnet - T380473
15:12 claime: parse[2001-2020].codfw.wmnet 'systemctl stop kubelet.service' - T380473
15:11 claime: parse[2001-2020].codfw.wmnet 'disable-puppet "decom"' - T380473
15:09 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host parse[2001-2020].codfw.wmnet
15:02 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on wdqs[2018-2020].codfw.wmnet with reason: T379023
15:02 bking@cumin2002: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on wdqs[2018-2020].codfw.wmnet with reason: T379023
15:01 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on wdqs[2026-2027].codfw.wmnet with reason: T379023
15:01 bking@cumin2002: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on wdqs[2026-2027].codfw.wmnet with reason: T379023
14:54 urandom: decommissioning Cassandra/restbase2022-{a,b,c}
14:53 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on restbase2022.codfw.wmnet with reason: Decommissioning — T380236
14:53 eevans@cumin1002: START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on restbase2022.codfw.wmnet with reason: Decommissioning — T380236
14:49 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host parse[2001-2020].codfw.wmnet
14:37 ihurbain@deploy2002: helmfile [codfw] DONE helmfile.d/services/push-notifications: apply
14:27 ihurbain@deploy2002: helmfile [codfw] START helmfile.d/services/push-notifications: apply
14:23 ihurbain@deploy2002: helmfile [codfw] DONE helmfile.d/services/push-notifications: apply
14:22 vgutierrez: restoring haproxykafka on A:cp-ulsfo and A:cp-eqsin - T380570
14:13 ihurbain@deploy2002: helmfile [codfw] START helmfile.d/services/push-notifications: apply
14:12 ihurbain@deploy2002: helmfile [staging] DONE helmfile.d/services/push-notifications: apply
14:12 ihurbain@deploy2002: helmfile [staging] START helmfile.d/services/push-notifications: apply
11:26 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker[2156-2170].codfw.wmnet
11:26 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker[2156-2170].codfw.wmnet
11:25 claime: homer 'lsw1-d7-codfw*' commit 'T376966'
11:24 claime: homer 'lsw1-d6-codfw*' commit 'T376966'
11:24 claime: homer 'lsw1-d5-codfw*' commit 'T376966'
11:23 claime: homer 'lsw1-d4-codfw*' commit 'T376966'
11:22 claime: homer 'lsw1-d1-codfw*' commit 'T376966'
11:21 claime: homer 'lsw1-c7-codfw*' commit 'T376966'
11:20 claime: homer 'lsw1-c4-codfw*' commit 'T376966'
11:19 claime: homer 'lsw1-c2-codfw*' commit 'T376966'
11:19 claime: homer 'lsw1-b7-codfw*' commit 'T376966'
11:18 claime: homer 'lsw1-b4-codfw*' commit 'T376966'
11:07 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker2140.codfw.wmnet
11:07 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker2140.codfw.wmnet
11:04 claime: homer 'lsw1-b7-codfw*' commit 'T377028'
11:02 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2159.codfw.wmnet with OS bookworm
10:43 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2159.codfw.wmnet with reason: host reimage
10:40 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2159.codfw.wmnet with reason: host reimage
10:37 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti1014.eqiad.wmnet
10:37 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
10:37 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti1014.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
10:37 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti1014.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
10:31 jmm@cumin2002: START - Cookbook sre.dns.netbox
10:26 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti1014.eqiad.wmnet
10:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti1011.eqiad.wmnet
10:23 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
10:23 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti1011.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
10:22 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti1011.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
10:22 vgutierrez: manually stopping haproxykafka on A:cp-ulsfo and A:cp-eqsin - T380570
10:21 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2159.codfw.wmnet with OS bookworm
10:16 jmm@cumin2002: START - Cookbook sre.dns.netbox
10:10 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti1011.eqiad.wmnet
08:08 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Add sorting options to tree view - oblivian@cumin1002"
08:08 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Add sorting options to tree view - oblivian@cumin1002
08:07 oblivian@cumin1002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Add sorting options to tree view - oblivian@cumin1002
08:07 oblivian@cumin1002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Add sorting options to tree view - oblivian@cumin1002"
01:00 herron@cumin1002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host aux-k8s-etcd2005.codfw.wmnet
01:00 herron@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aux-k8s-etcd2005.codfw.wmnet with OS bookworm
00:46 herron@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aux-k8s-etcd2005.codfw.wmnet with reason: host reimage
00:42 herron@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on aux-k8s-etcd2005.codfw.wmnet with reason: host reimage
00:27 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-etcd2005.codfw.wmnet with OS bookworm
00:20 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-etcd2005.codfw.wmnet - herron@cumin1002"
00:20 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-etcd2005.codfw.wmnet - herron@cumin1002"
00:20 herron@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) aux-k8s-etcd2005.codfw.wmnet on all recursors
00:20 herron@cumin1002: START - Cookbook sre.dns.wipe-cache aux-k8s-etcd2005.codfw.wmnet on all recursors
00:20 herron@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
00:20 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-etcd2005.codfw.wmnet - herron@cumin1002"
00:16 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-etcd2005.codfw.wmnet - herron@cumin1002"
00:11 herron@cumin1002: START - Cookbook sre.dns.netbox
00:11 herron@cumin1002: START - Cookbook sre.ganeti.makevm for new host aux-k8s-etcd2005.codfw.wmnet
00:11 herron@cumin1002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host aux-k8s-etcd2004.codfw.wmnet
00:11 herron@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aux-k8s-etcd2004.codfw.wmnet with OS bookworm

2024-11-21

23:56 herron@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aux-k8s-etcd2004.codfw.wmnet with reason: host reimage
23:52 herron@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on aux-k8s-etcd2004.codfw.wmnet with reason: host reimage
23:36 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-etcd2004.codfw.wmnet with OS bookworm
23:29 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-etcd2004.codfw.wmnet - herron@cumin1002"
23:29 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-etcd2004.codfw.wmnet - herron@cumin1002"
23:29 herron@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) aux-k8s-etcd2004.codfw.wmnet on all recursors
23:28 herron@cumin1002: START - Cookbook sre.dns.wipe-cache aux-k8s-etcd2004.codfw.wmnet on all recursors
23:28 herron@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
23:28 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-etcd2004.codfw.wmnet - herron@cumin1002"
23:24 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-etcd2004.codfw.wmnet - herron@cumin1002"
23:11 herron@cumin1002: START - Cookbook sre.dns.netbox
23:11 herron@cumin1002: START - Cookbook sre.ganeti.makevm for new host aux-k8s-etcd2004.codfw.wmnet
23:09 herron@cumin1002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host aux-k8s-etcd2003.codfw.wmnet
23:09 herron@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aux-k8s-etcd2003.codfw.wmnet with OS bookworm
23:08 brennen: end of utc late backport & config window
23:07 brennen@deploy2002: Finished scap sync-world: Backport for Add statsv to charts impressions (T379833) (duration: 12m 08s)
23:06 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host es2041.codfw.wmnet with OS bookworm
23:01 brennen@deploy2002: bvibber, brennen: Continuing with sync
23:00 brennen@deploy2002: bvibber, brennen: Backport for Add statsv to charts impressions (T379833) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
22:55 brennen@deploy2002: Started scap sync-world: Backport for Add statsv to charts impressions (T379833)
22:55 herron@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aux-k8s-etcd2003.codfw.wmnet with reason: host reimage
22:54 brennen@deploy2002: Finished scap sync-world: resuming sync for Add tracking categories for {{#chart:}} usage (T369684) after messing up a keypress (duration: 12m 35s)
22:52 herron@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on aux-k8s-etcd2003.codfw.wmnet with reason: host reimage
22:42 brennen@deploy2002: Started scap sync-world: resuming sync for Add tracking categories for {{#chart:}} usage (T369684) after messing up a keypress
22:40 brennen@deploy2002: Sync cancelled.
22:40 brennen@deploy2002: bvibber, brennen: Backport for Add tracking categories for {{#chart:}} usage (T369684) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
22:38 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-etcd2003.codfw.wmnet with OS bookworm
22:36 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-etcd2003.codfw.wmnet - herron@cumin1002"
22:36 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-etcd2003.codfw.wmnet - herron@cumin1002"
22:35 herron@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) aux-k8s-etcd2003.codfw.wmnet on all recursors
22:35 herron@cumin1002: START - Cookbook sre.dns.wipe-cache aux-k8s-etcd2003.codfw.wmnet on all recursors
22:35 herron@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
22:35 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-etcd2003.codfw.wmnet - herron@cumin1002"
22:35 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-etcd2003.codfw.wmnet - herron@cumin1002"
22:32 herron@cumin1002: START - Cookbook sre.dns.netbox
22:32 herron@cumin1002: START - Cookbook sre.ganeti.makevm for new host aux-k8s-etcd2003.codfw.wmnet
22:25 brennen@deploy2002: Started scap sync-world: Backport for Add tracking categories for {{#chart:}} usage (T369684)
22:25 brennen@deploy2002: Finished scap sync-world: Backport for Disable various extensions when using the shared login domain (T373737) (duration: 18m 16s)
22:22 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host es2041.codfw.wmnet with OS bookworm
22:18 brennen@deploy2002: tgr, brennen: Continuing with sync
22:10 brennen@deploy2002: tgr, brennen: Backport for Disable various extensions when using the shared login domain (T373737) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
22:06 brennen@deploy2002: Started scap sync-world: Backport for Disable various extensions when using the shared login domain (T373737)
22:05 brennen@deploy2002: Finished scap sync-world: Backport for Revert "Reduce number of bucketsizes for MediaViewer (group0)" (T372165) (duration: 10m 34s)
21:58 brennen@deploy2002: brennen: Continuing with sync
21:58 brennen@deploy2002: brennen: Backport for Revert "Reduce number of bucketsizes for MediaViewer (group0)" (T372165) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
21:54 brennen@deploy2002: Started scap sync-world: Backport for Revert "Reduce number of bucketsizes for MediaViewer (group0)" (T372165)
21:51 brennen@deploy2002: Sync cancelled.
21:42 brennen@deploy2002: brennen, tgr, simon04: Backport for Reduce number of bucketsizes for MediaViewer (group0) (T372165), Set 'remember' central session object field when recreating (T379254 T372702), Use cookie to access central session when local session expired synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
21:39 brennen@deploy2002: Started scap sync-world: Backport for Reduce number of bucketsizes for MediaViewer (group0) (T372165), Set 'remember' central session object field when recreating (T379254 T372702), Use cookie to access central session when local session expired
21:36 brennen@deploy2002: Finished scap sync-world: Backport for Enable Skin-Codex logging (T375287) (duration: 15m 53s)
21:29 brennen@deploy2002: brennen, jdlrobson: Continuing with sync
21:26 brennen@deploy2002: brennen, jdlrobson: Backport for Enable Skin-Codex logging (T375287) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
21:20 brennen@deploy2002: Started scap sync-world: Backport for Enable Skin-Codex logging (T375287)
21:19 brennen@deploy2002: Finished scap sync-world: Backport for Enable AutoModerator on afwiki (T376597) (duration: 13m 50s)
21:12 brennen@deploy2002: kgraessle, brennen: Continuing with sync
21:10 brennen@deploy2002: kgraessle, brennen: Backport for Enable AutoModerator on afwiki (T376597) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
21:05 brennen@deploy2002: Started scap sync-world: Backport for Enable AutoModerator on afwiki (T376597)
20:46 tgr
20:24 sukhe@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp2038.codfw.wmnet [reason: DIMM replaced, T308459]
20:20 sukhe: force agent on cp2038
19:31 gmodena@deploy2002: Finished deploy [analytics/refinery@199401a] (hadoop-test): Ad-hoc deployment TEST [analytics/refinery@199401a6] (duration: 03m 45s)
19:27 gmodena@deploy2002: Started deploy [analytics/refinery@199401a] (hadoop-test): Ad-hoc deployment TEST [analytics/refinery@199401a6]
19:07 gmodena@deploy2002: Finished deploy [analytics/refinery@199401a] (thin): Ad-hoc deployment THIN [analytics/refinery@199401a6] (duration: 05m 37s)
19:01 gmodena@deploy2002: Started deploy [analytics/refinery@199401a] (thin): Ad-hoc deployment THIN [analytics/refinery@199401a6]
18:57 gmodena@deploy2002: Finished deploy [analytics/refinery@199401a]: Ad-hoc deployment [analytics/refinery@199401a6] (duration: 14m 08s)
18:57 cdanis@deploy2002: Finished scap sync-world: Backport for Follow-up fix for Charts enable on commons/test2 (T379689) (duration: 11m 29s)
18:49 cdanis@deploy2002: cdanis, bvibber: Continuing with sync
18:49 cdanis@deploy2002: cdanis, bvibber: Backport for Follow-up fix for Charts enable on commons/test2 (T379689) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
18:45 cdanis@deploy2002: Started scap sync-world: Backport for Follow-up fix for Charts enable on commons/test2 (T379689)
18:43 gmodena@deploy2002: Started deploy [analytics/refinery@199401a]: Ad-hoc deployment [analytics/refinery@199401a6]
18:21 cdanis@deploy2002: Finished scap sync-world: Backport for Enabling Charts on commons+test2 (T379689) (duration: 14m 05s)
18:16 jayme@cumin2002: conftool action : set/pooled=yes; selector: name=kubestage200[34].codfw.wmnet
18:15 jayme@cumin2002: conftool action : set/weight=10; selector: name=kubestage200[34].codfw.wmnet
18:13 cdanis@deploy2002: cdanis, bvibber: Continuing with sync
18:12 cdanis@deploy2002: cdanis, bvibber: Backport for Enabling Charts on commons+test2 (T379689) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
18:10 sukhe: running puppet on A:cp to resolve failed puppet run
18:10 sukhe: sudo cumin -b11 'A:cp' 'run-puppet-agent'
18:09 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on cp2038.codfw.wmnet with reason: DIMM replacement in progress
18:09 sukhe@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on cp2038.codfw.wmnet with reason: DIMM replacement in progress
18:07 cdanis@deploy2002: Started scap sync-world: Backport for Enabling Charts on commons+test2 (T379689)
17:58 sukhe@puppetserver1001: conftool action : set/pooled=no; selector: name=cp2038.codfw.wmnet [reason: DIMM failure T308459]
17:45 jayme@cumin2002: END (FAIL) - Cookbook sre.k8s.pool-depool-node (exit_code=99) check for host kubestage2003.codfw.wmnet
17:45 jayme@cumin2002: START - Cookbook sre.k8s.pool-depool-node check for host kubestage2003.codfw.wmnet
17:40 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts clouddb2002-dev.codfw.wmnet
17:40 andrew@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
17:40 andrew@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: clouddb2002-dev.codfw.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1002"
17:39 andrew@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: clouddb2002-dev.codfw.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1002"
17:39 fabfur: adding acls to kafka-jumbo cluster (T380373)
17:36 andrew@cumin1002: START - Cookbook sre.dns.netbox
17:31 andrew@cumin1002: START - Cookbook sre.hosts.decommission for hosts clouddb2002-dev.codfw.wmnet
17:02 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2157.codfw.wmnet with OS bookworm
16:54 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs2013.codfw.wmnet
16:54 sukhe@cumin1002: START - Cookbook sre.hosts.remove-downtime for lvs2013.codfw.wmnet
16:54 sukhe: enable puppet on lvs2013 and start pybal
16:48 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs2013.codfw.wmnet with reason: rebooting
16:47 sukhe@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs2013.codfw.wmnet with reason: rebooting
16:47 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2140.codfw.wmnet with OS bookworm
16:47 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - cgoubert@cumin1002"
16:46 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs2013.codfw.wmnet
16:46 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - cgoubert@cumin1002"
16:43 sukhe@cumin1002: START - Cookbook sre.hosts.reboot-single for host lvs2013.codfw.wmnet
16:43 sukhe: rebooting drained lvs2013
16:43 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2157.codfw.wmnet with reason: host reimage
16:39 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2157.codfw.wmnet with reason: host reimage
16:26 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2140.codfw.wmnet with reason: host reimage
16:23 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2140.codfw.wmnet with reason: host reimage
16:21 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2157.codfw.wmnet with OS bookworm
16:20 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2157.codfw.wmnet with OS bookworm
16:13 sukhe@puppetserver1001: conftool action : set/pooled=no; selector: name=cluster=dnsbox,dc=magru [reason: testing]
16:08 dancy@deploy2002: Finished scap sync-world: testing (duration: 03m 01s)
16:05 dancy@deploy2002: Started scap sync-world: testing
16:04 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2140.codfw.wmnet with OS bookworm
16:03 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2140.codfw.wmnet with OS bookworm
16:00 dancy@deploy2002: Installing scap version "4.127.0" for 209 hosts
15:39 kartik@deploy2002: Finished scap sync-world: Backport for Fix layout broken by display:flex on HorizontalLayout (T380471), Revert "ExperimentUserDefaultsManager: use read latest when retrieving central id" (duration: 15m 51s)
15:34 gmodena@deploy2002: Finished deploy [analytics/refinery@358ccf5] (hadoop-test): Ad-hoc deployment TEST [analytics/refinery@358ccf55] (duration: 03m 30s)
15:33 kartik@deploy2002: abi, sgimeno, kartik: Continuing with sync
15:31 gmodena@deploy2002: Started deploy [analytics/refinery@358ccf5] (hadoop-test): Ad-hoc deployment TEST [analytics/refinery@358ccf55]
15:29 gmodena@deploy2002: Finished deploy [analytics/refinery@358ccf5] (thin): Ad-hoc deployment THIN [analytics/refinery@358ccf55] (duration: 05m 16s)
15:29 ihurbain@deploy2002: helmfile [eqiad] DONE helmfile.d/services/push-notifications: apply
15:29 kartik@deploy2002: abi, sgimeno, kartik: Backport for Fix layout broken by display:flex on HorizontalLayout (T380471), Revert "ExperimentUserDefaultsManager: use read latest when retrieving central id" synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
15:28 ihurbain@deploy2002: helmfile [eqiad] START helmfile.d/services/push-notifications: apply
15:28 ihurbain@deploy2002: helmfile [codfw] DONE helmfile.d/services/push-notifications: apply
15:27 ihurbain@deploy2002: helmfile [codfw] START helmfile.d/services/push-notifications: apply
15:26 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@6183645]: increase driver memory for mjolnir feature selection (duration: 00m 31s)
15:26 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs2013.codfw.wmnet with reason: rebooting
15:25 sukhe@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs2013.codfw.wmnet with reason: rebooting
15:25 ebernhardson@deploy2002: Started deploy [airflow-dags/search@6183645]: increase driver memory for mjolnir feature selection
15:24 sukhe: stop pybal on lvs2013 to confirm changes in CR 1091243
15:24 gmodena@deploy2002: Started deploy [analytics/refinery@358ccf5] (thin): Ad-hoc deployment THIN [analytics/refinery@358ccf55]
15:24 kartik@deploy2002: Started scap sync-world: Backport for Fix layout broken by display:flex on HorizontalLayout (T380471), Revert "ExperimentUserDefaultsManager: use read latest when retrieving central id"
15:23 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2140.codfw.wmnet with OS bookworm
15:23 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2140.codfw.wmnet with OS bookworm
15:16 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2140.codfw.wmnet with OS bookworm
15:15 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2140.codfw.wmnet with OS bookworm
15:11 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on restbase2021.codfw.wmnet with reason: Decommissioning — T380236
15:10 eevans@cumin1002: START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on restbase2021.codfw.wmnet with reason: Decommissioning — T380236
15:06 gmodena@deploy2002: Finished deploy [analytics/refinery@358ccf5]: Ad-hoc deployment [analytics/refinery@358ccf55] (duration: 11m 44s)
14:56 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2169.codfw.wmnet with OS bookworm
14:55 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2140.codfw.wmnet with OS bookworm
14:54 gmodena@deploy2002: Started deploy [analytics/refinery@358ccf5]: Ad-hoc deployment [analytics/refinery@358ccf55]
14:53 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2168.codfw.wmnet with OS bookworm
14:51 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2170.codfw.wmnet with OS bookworm
14:50 sergi0: UTC afternoon deploys done
14:49 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2167.codfw.wmnet with OS bookworm
14:48 sgimeno@deploy2002: Sync cancelled.
14:47 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2140.codfw.wmnet with OS bookworm
14:47 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2166.codfw.wmnet with OS bookworm
14:43 jynus@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on kafka-main1001.eqiad.wmnet with reason: Per claime's recommendation
14:43 jynus@cumin1002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on kafka-main1001.eqiad.wmnet with reason: Per claime's recommendation
14:43 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2157.codfw.wmnet with OS bookworm
14:41 sgimeno@deploy2002: sgimeno: Backport for ExperimentUserDefaultsManager: use read latest when retrieving central id (T379682) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
14:39 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2140.codfw.wmnet with OS bookworm
14:36 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2169.codfw.wmnet with reason: host reimage
14:35 sgimeno@deploy2002: Started scap sync-world: Backport for ExperimentUserDefaultsManager: use read latest when retrieving central id (T379682)
14:33 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2168.codfw.wmnet with reason: host reimage
14:31 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2170.codfw.wmnet with reason: host reimage
14:28 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2167.codfw.wmnet with reason: host reimage
14:25 ihurbain@deploy2002: helmfile [staging] DONE helmfile.d/services/push-notifications: apply
14:25 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2166.codfw.wmnet with reason: host reimage
14:25 ihurbain@deploy2002: helmfile [staging] START helmfile.d/services/push-notifications: apply
14:24 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2170.codfw.wmnet with reason: host reimage
14:24 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2169.codfw.wmnet with reason: host reimage
14:23 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2168.codfw.wmnet with reason: host reimage
14:23 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2167.codfw.wmnet with reason: host reimage
14:22 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2166.codfw.wmnet with reason: host reimage
14:21 sgimeno@deploy2002: Finished scap sync-world: Backport for enwiki: Add abusefilter-access-protected-vars to EFH/EFM (T380332) (duration: 13m 50s)
14:14 sgimeno@deploy2002: eggroll97, sgimeno: Continuing with sync
14:11 sgimeno@deploy2002: eggroll97, sgimeno: Backport for enwiki: Add abusefilter-access-protected-vars to EFH/EFM (T380332) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
14:11 jayme@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestage1006.eqiad.wmnet with OS bookworm
14:07 sgimeno@deploy2002: Started scap sync-world: Backport for enwiki: Add abusefilter-access-protected-vars to EFH/EFM (T380332)
14:06 jayme@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestage1005.eqiad.wmnet with OS bookworm
14:05 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2170.codfw.wmnet with OS bookworm
14:05 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2169.codfw.wmnet with OS bookworm
14:04 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2168.codfw.wmnet with OS bookworm
14:04 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2167.codfw.wmnet with OS bookworm
14:03 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2166.codfw.wmnet with OS bookworm
13:54 jayme@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage1006.eqiad.wmnet with reason: host reimage
13:51 jayme@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage1006.eqiad.wmnet with reason: host reimage
13:47 jayme@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage1005.eqiad.wmnet with reason: host reimage
13:44 jayme@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage1005.eqiad.wmnet with reason: host reimage
13:34 jayme@cumin2002: START - Cookbook sre.hosts.reimage for host kubestage1006.eqiad.wmnet with OS bookworm
13:33 jayme@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from kubernetes1008 to kubestage1006
13:32 jayme@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kubestage1006
13:31 jayme@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host kubestage1006
13:31 jayme@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
13:31 jayme@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming kubernetes1008 to kubestage1006 - jayme@cumin2002"
13:30 jayme@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming kubernetes1008 to kubestage1006 - jayme@cumin2002"
13:27 jayme@cumin2002: START - Cookbook sre.hosts.reimage for host kubestage1005.eqiad.wmnet with OS bookworm
13:25 jayme@cumin2002: START - Cookbook sre.dns.netbox
13:25 jayme@cumin2002: START - Cookbook sre.hosts.rename from kubernetes1008 to kubestage1006
13:24 jayme@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from kubernetes1007 to kubestage1005
13:24 jayme@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host kubestage1005
13:22 jayme@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host kubestage1005
13:22 jayme@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
13:22 jayme@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming kubernetes1007 to kubestage1005 - jayme@cumin2002"
13:21 jayme@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming kubernetes1007 to kubestage1005 - jayme@cumin2002"
13:18 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2160.codfw.wmnet with OS bookworm
13:18 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-ats (exit_code=0) Rolling upgrade/restart of Apache Traffic Server on P{cp5026*} and A:cp for 9.2.6-1wm2
13:17 jayme@cumin2002: START - Cookbook sre.dns.netbox
13:17 jayme@cumin2002: START - Cookbook sre.hosts.rename from kubernetes1007 to kubestage1005
13:14 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2164.codfw.wmnet with OS bookworm
13:14 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-ats Rolling upgrade/restart of Apache Traffic Server on P{cp5026*} and A:cp for 9.2.6-1wm2
13:14 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-ats (exit_code=0) Rolling upgrade/restart of Apache Traffic Server on P{cp5018*} and A:cp for 9.2.6-1wm2
13:11 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2162.codfw.wmnet with OS bookworm
13:10 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-ats Rolling upgrade/restart of Apache Traffic Server on P{cp5018*} and A:cp for 9.2.6-1wm2
13:10 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2165.codfw.wmnet with OS bookworm
13:05 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2163.codfw.wmnet with OS bookworm
13:02 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2158.codfw.wmnet with OS bookworm
12:58 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2161.codfw.wmnet with OS bookworm
12:58 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2160.codfw.wmnet with reason: host reimage
12:55 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2156.codfw.wmnet with OS bookworm
12:55 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2164.codfw.wmnet with reason: host reimage
12:52 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2162.codfw.wmnet with reason: host reimage
12:49 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2165.codfw.wmnet with reason: host reimage
12:46 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2163.codfw.wmnet with reason: host reimage
12:42 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2158.codfw.wmnet with reason: host reimage
12:39 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2161.codfw.wmnet with reason: host reimage
12:38 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2165.codfw.wmnet with reason: host reimage
12:38 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2164.codfw.wmnet with reason: host reimage
12:38 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2163.codfw.wmnet with reason: host reimage
12:37 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2162.codfw.wmnet with reason: host reimage
12:36 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2156.codfw.wmnet with reason: host reimage
12:36 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2160.codfw.wmnet with reason: host reimage
12:35 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2161.codfw.wmnet with reason: host reimage
12:32 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2158.codfw.wmnet with reason: host reimage
12:32 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2156.codfw.wmnet with reason: host reimage
12:19 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2165.codfw.wmnet with OS bookworm
12:18 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2164.codfw.wmnet with OS bookworm
12:18 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2163.codfw.wmnet with OS bookworm
12:17 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2162.codfw.wmnet with OS bookworm
12:17 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2160.codfw.wmnet with OS bookworm
12:16 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2161.codfw.wmnet with OS bookworm
12:16 jmm@deploy2002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
12:13 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2158.codfw.wmnet with OS bookworm
12:13 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2156.codfw.wmnet with OS bookworm
12:09 jmm@deploy2002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
12:09 jmm@deploy2002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
12:02 jmm@deploy2002: helmfile [codfw] START helmfile.d/services/thumbor: apply
11:56 jmm@deploy2002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
11:56 jmm@deploy2002: helmfile [staging] START helmfile.d/services/thumbor: apply
11:00 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host thanos-be1005.eqiad.wmnet with OS bullseye
11:00 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002"
10:59 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002"
10:41 jayme@cumin2002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host kubernetes[1007-1008].eqiad.wmnet
10:41 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-be1005.eqiad.wmnet with reason: host reimage
10:40 jayme@cumin2002: START - Cookbook sre.k8s.pool-depool-node depool for host kubernetes[1007-1008].eqiad.wmnet
10:39 urbanecm@deploy2002: helmfile [codfw] DONE helmfile.d/services/linkrecommendation: apply
10:38 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T367781)', diff saved to https://phabricator.wikimedia.org/P71113 and previous config saved to /var/cache/conftool/dbconfig/20241121-103834-arnaudb.json
10:38 urbanecm@deploy2002: helmfile [codfw] START helmfile.d/services/linkrecommendation: apply
10:38 urbanecm@deploy2002: helmfile [eqiad] DONE helmfile.d/services/linkrecommendation: apply
10:37 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-be1005.eqiad.wmnet with reason: host reimage
10:36 urbanecm@deploy2002: helmfile [eqiad] START helmfile.d/services/linkrecommendation: apply
10:34 urbanecm@deploy2002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply
10:33 urbanecm@deploy2002: helmfile [staging] START helmfile.d/services/linkrecommendation: apply
10:25 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host thanos-be1005.eqiad.wmnet with OS bullseye
10:23 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P71112 and previous config saved to /var/cache/conftool/dbconfig/20241121-102328-arnaudb.json
10:19 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.debug (exit_code=0) for Netbox circuit ID 102
10:19 ayounsi@cumin1002: START - Cookbook sre.network.debug for Netbox circuit ID 102
10:08 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P71111 and previous config saved to /var/cache/conftool/dbconfig/20241121-100821-arnaudb.json
10:01 dcausse@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventgate-main: sync
10:01 dcausse@deploy2002: helmfile [codfw] START helmfile.d/services/eventgate-main: sync
09:59 dcausse: restarting eventgate-main@codfw
09:53 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T367781)', diff saved to https://phabricator.wikimedia.org/P71110 and previous config saved to /var/cache/conftool/dbconfig/20241121-095313-arnaudb.json
09:51 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2155 (T367781)', diff saved to https://phabricator.wikimedia.org/P71109 and previous config saved to /var/cache/conftool/dbconfig/20241121-095102-arnaudb.json
09:50 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2187.codfw.wmnet with reason: Maintenance
09:50 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2187.codfw.wmnet with reason: Maintenance
09:50 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2155.codfw.wmnet with reason: Maintenance
09:50 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2155.codfw.wmnet with reason: Maintenance
09:35 moritzm: installing nghttp2 security updates
09:18 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1246.eqiad.wmnet with OS bookworm
09:17 aklapper@deploy2002: rebuilt and synchronized wikiversions files: group2 to 1.44.0-wmf.4 refs T375663
09:07 moritzm: installing exim4 security updates
09:03 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1246.eqiad.wmnet with reason: host reimage
09:00 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1246.eqiad.wmnet with reason: host reimage
08:45 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host db1246.eqiad.wmnet with OS bookworm
08:21 kartik@deploy2002: Finished scap sync-world: Backport for Enable the Contribute menu in 4th group of Wikis (T375303) (duration: 14m 05s)
08:14 kartik@deploy2002: kartik: Continuing with sync
08:10 kartik@deploy2002: kartik: Backport for Enable the Contribute menu in 4th group of Wikis (T375303) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
08:06 kartik@deploy2002: Started scap sync-world: Backport for Enable the Contribute menu in 4th group of Wikis (T375303)
07:48 moritzm: removing ganeti1017 from active Ganeti nodes T378921
05:51 aikochou@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'article-models' for release 'main' .
02:30 brett: Import libvmod-re2_2.0.0-2~bpo11u1 into varnish-staging apt component
00:45 urandom: decommissioning Cassandra/restbase2021-{a,b,c} — T380236
00:42 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on restbase2023.codfw.wmnet with reason: Decommissioning — T380236
00:42 eevans@cumin1002: START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on restbase2023.codfw.wmnet with reason: Decommissioning — T380236
00:42 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on restbase2022.codfw.wmnet with reason: Decommissioning — T380236
00:42 eevans@cumin1002: START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on restbase2022.codfw.wmnet with reason: Decommissioning — T380236
00:42 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on restbase2021.codfw.wmnet with reason: Decommissioning — T380236
00:42 eevans@cumin1002: START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on restbase2021.codfw.wmnet with reason: Decommissioning — T380236
00:40 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for restbase2038.codfw.wmnet
00:40 eevans@cumin1002: START - Cookbook sre.hosts.remove-downtime for restbase2038.codfw.wmnet
00:40 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for restbase2037.codfw.wmnet
00:40 eevans@cumin1002: START - Cookbook sre.hosts.remove-downtime for restbase2037.codfw.wmnet
00:40 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for restbase2036.codfw.wmnet
00:40 eevans@cumin1002: START - Cookbook sre.hosts.remove-downtime for restbase2036.codfw.wmnet
00:15 urbanecm: [urbanecm@deploy2002 ~]$ mwscript-k8s -- extensions/GrowthExperiments/maintenance/revalidateLinkRecommendations.php --wiki=azwiki --all --verbose # T380329

2024-11-20

23:22 cjming: end of UTC late backport window
23:20 eileen: civicrm upgraded from 7c940d6f to 3311520a
23:17 cjming@deploy2002: Finished scap sync-world: Backport for Temporarily disable dark mode for anonymous users (T379765) (duration: 13m 06s)
23:10 cjming@deploy2002: jdlrobson, cjming: Continuing with sync
23:08 cjming@deploy2002: jdlrobson, cjming: Backport for Temporarily disable dark mode for anonymous users (T379765) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
23:04 cjming@deploy2002: Started scap sync-world: Backport for Temporarily disable dark mode for anonymous users (T379765)
23:03 cjming@deploy2002: Finished scap sync-world: Backport for knwiki: update portal namespace (T380366) (duration: 12m 17s)
22:56 cjming@deploy2002: cjming, anzx: Continuing with sync
22:55 cjming@deploy2002: cjming, anzx: Backport for knwiki: update portal namespace (T380366) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
22:52 brett: Import libvmod-querysort 0.4-3 into varnish-staging apt component
22:51 cjming@deploy2002: Started scap sync-world: Backport for knwiki: update portal namespace (T380366)
22:49 cjming@deploy2002: Finished scap sync-world: Backport for Revert "Add contact form for U4C" (duration: 14m 22s)
22:49 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host thanos-be2005.codfw.wmnet with OS bullseye
22:41 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/blunderbuss: apply
22:41 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply
22:40 cjming@deploy2002: trainbranchbot, cjming: Continuing with sync
22:40 cjming@deploy2002: trainbranchbot, cjming: Backport for Revert "Add contact form for U4C" synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
22:39 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/blunderbuss: apply
22:39 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply
22:34 cjming@deploy2002: Started scap sync-world: Backport for Revert "Add contact form for U4C"
22:31 cjming@deploy2002: Sync cancelled.
22:28 cjming@deploy2002: nmw03, cjming: Backport for Add contact form for U4C (T379317) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
22:27 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-be2005.codfw.wmnet with reason: host reimage
22:24 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-be2005.codfw.wmnet with reason: host reimage
22:23 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/blunderbuss: apply
22:22 cjming@deploy2002: Started scap sync-world: Backport for Add contact form for U4C (T379317)
22:21 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply
22:20 cjming@deploy2002: Finished scap sync-world: Backport for Bump wikimedia/parsoid to 0.21.0-a7 (T373776 T380333), Bump wikimedia/parsoid to 0.21.0-a7 (T380333) (duration: 17m 11s)
22:18 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply
22:16 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply
22:13 cjming@deploy2002: arlolra, cjming: Continuing with sync
22:12 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host thanos-be2005.codfw.wmnet with OS bullseye
22:11 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host thanos-be2005.codfw.wmnet with OS bullseye
22:11 jhathaway@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhathaway@cumin2002"
22:09 jhathaway@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhathaway@cumin2002"
22:08 cjming@deploy2002: arlolra, cjming: Backport for Bump wikimedia/parsoid to 0.21.0-a7 (T373776 T380333), Bump wikimedia/parsoid to 0.21.0-a7 (T380333) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
22:06 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply
22:03 cjming@deploy2002: Started scap sync-world: Backport for Bump wikimedia/parsoid to 0.21.0-a7 (T373776 T380333), Bump wikimedia/parsoid to 0.21.0-a7 (T380333)
22:02 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/blunderbuss: apply
21:52 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply
21:50 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/blunderbuss: apply
21:47 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-be2005.codfw.wmnet with reason: host reimage
21:43 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-be2005.codfw.wmnet with reason: host reimage
21:40 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply
21:32 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host thanos-be2005.codfw.wmnet with OS bullseye
21:31 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host thanos-be2005.codfw.wmnet with OS bullseye
21:28 cjming@deploy2002: Finished scap sync-world: Backport for [ptwiki] Enable the CampaignEvents extension (T380090) (duration: 15m 04s)
21:23 eileen: civicrm upgraded from e29243f0 to 7c940d6f
21:20 cjming@deploy2002: cjming, albertoleoncio: Continuing with sync
21:19 cjming@deploy2002: cjming, albertoleoncio: Backport for [ptwiki] Enable the CampaignEvents extension (T380090) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
21:13 cjming@deploy2002: Started scap sync-world: Backport for [ptwiki] Enable the CampaignEvents extension (T380090)
21:08 dancy@deploy2002: Installing scap version "4.124.0" for 209 hosts
21:06 dancy@deploy2002: Installing scap version "4.124.0" for 209 hosts
21:05 herron@cumin1002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host aux-k8s-ctrl2003.codfw.wmnet
21:05 herron@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aux-k8s-ctrl2003.codfw.wmnet with OS bookworm
21:03 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host thanos-be2005.codfw.wmnet with OS bullseye
21:00 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host thanos-be2005.codfw.wmnet with OS bullseye
20:51 herron@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aux-k8s-ctrl2003.codfw.wmnet with reason: host reimage
20:48 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply
20:48 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/blunderbuss: apply
20:48 herron@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on aux-k8s-ctrl2003.codfw.wmnet with reason: host reimage
20:48 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply
20:47 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host es2041.codfw.wmnet with OS bookworm
20:44 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply
20:40 dancy@deploy2002: Installation of scap version "4.126.0" completed for 1 hosts
20:39 dancy@deploy2002: Installing scap version "4.126.0" for 1 hosts
20:32 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-ctrl2003.codfw.wmnet with OS bookworm
20:30 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host thanos-be2005.codfw.wmnet with OS bullseye
20:30 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host thanos-be2005.codfw.wmnet with OS bullseye
20:28 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-ctrl2003.codfw.wmnet - herron@cumin1002"
20:28 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-ctrl2003.codfw.wmnet - herron@cumin1002"
20:28 herron@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) aux-k8s-ctrl2003.codfw.wmnet on all recursors
20:28 herron@cumin1002: START - Cookbook sre.dns.wipe-cache aux-k8s-ctrl2003.codfw.wmnet on all recursors
20:28 herron@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
20:28 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-ctrl2003.codfw.wmnet - herron@cumin1002"
20:26 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-ctrl2003.codfw.wmnet - herron@cumin1002"
20:13 herron@cumin1002: START - Cookbook sre.dns.netbox
20:13 herron@cumin1002: START - Cookbook sre.ganeti.makevm for new host aux-k8s-ctrl2003.codfw.wmnet
20:10 dancy@deploy2002: Installing scap version "4.126.0" for 1 hosts
20:08 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host thanos-be2005.codfw.wmnet with OS bullseye
20:05 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host thanos-be2005.codfw.wmnet with OS bullseye
20:03 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host es2041.codfw.wmnet with OS bookworm
19:52 hashar@deploy2002: Finished deploy [integration/docroot@1627206]: build: update mediawiki-codesniffer to 45.0.0 & prevent LibUp from removing a phpcs rule (duration: 00m 10s)
19:52 hashar@deploy2002: Started deploy [integration/docroot@1627206]: build: update mediawiki-codesniffer to 45.0.0 & prevent LibUp from removing a phpcs rule
19:51 dancy@deploy2002: Installing scap version "4.126.0" for 1 hosts
19:47 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host thanos-be2005.codfw.wmnet with OS bullseye
19:42 dancy@deploy2002: Installing scap version "4.126.0" for 209 hosts
19:35 herron@cumin1002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host aux-k8s-ctrl2002.codfw.wmnet
19:35 herron@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aux-k8s-ctrl2002.codfw.wmnet with OS bookworm
19:20 herron@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aux-k8s-ctrl2002.codfw.wmnet with reason: host reimage
19:17 herron@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on aux-k8s-ctrl2002.codfw.wmnet with reason: host reimage
19:12 urandom: bootstrapping cassandra, restbase2038-{a,b,c} — T380236
19:08 inflatador: bking@krb1001 add kerberos keytab for blunderbuss https://phabricator.wikimedia.org/P71106 T371994
19:04 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-ctrl2002.codfw.wmnet with OS bookworm
19:03 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-ctrl2002.codfw.wmnet - herron@cumin1002"
19:03 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-ctrl2002.codfw.wmnet - herron@cumin1002"
19:03 herron@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) aux-k8s-ctrl2002.codfw.wmnet on all recursors
19:03 herron@cumin1002: START - Cookbook sre.dns.wipe-cache aux-k8s-ctrl2002.codfw.wmnet on all recursors
19:03 herron@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
19:03 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-ctrl2002.codfw.wmnet - herron@cumin1002"
19:03 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-ctrl2002.codfw.wmnet - herron@cumin1002"
18:58 herron@cumin1002: START - Cookbook sre.dns.netbox
18:58 herron@cumin1002: START - Cookbook sre.ganeti.makevm for new host aux-k8s-ctrl2002.codfw.wmnet
17:32 joal@deploy2002: Finished deploy [analytics/refinery@295d5a4] (hadoop-test): Regular analytics weekly train BIS TEST [analytics/refinery@295d5a44] (duration: 03m 36s)
17:28 joal@deploy2002: Started deploy [analytics/refinery@295d5a4] (hadoop-test): Regular analytics weekly train BIS TEST [analytics/refinery@295d5a44]
17:28 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
17:27 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
17:22 joal@deploy2002: Finished deploy [analytics/refinery@295d5a4] (thin): Regular analytics weekly train BIS THIN [analytics/refinery@295d5a44] (duration: 05m 02s)
17:22 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
17:21 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
17:20 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
17:19 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
17:18 joal@deploy2002: Started deploy [analytics/refinery@295d5a4] (thin): Regular analytics weekly train BIS THIN [analytics/refinery@295d5a44]
17:17 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
17:16 joal@deploy2002: Finished deploy [analytics/refinery@295d5a4]: Regular analytics weekly train BIS [analytics/refinery@295d5a44] (duration: 03m 41s)
17:12 joal@deploy2002: Started deploy [analytics/refinery@295d5a4]: Regular analytics weekly train BIS [analytics/refinery@295d5a44]
17:05 sukhe: restart tomcat on idp2004
17:04 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
17:03 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
17:02 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
17:01 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
17:00 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
17:00 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
16:43 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply
16:43 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop: apply
16:43 jiji@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop: apply
16:43 jiji@deploy2002: helmfile [staging] START helmfile.d/services/changeprop: apply
16:43 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventstreams: apply
16:42 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/services/tegola-vector-tiles: apply
16:40 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/tegola-vector-tiles: apply
16:39 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
16:38 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
16:37 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/eventstreams: apply
16:36 jiji@deploy2002: helmfile [staging] DONE helmfile.d/services/eventstreams: apply
16:35 klausman@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
16:35 jiji@deploy2002: helmfile [staging] START helmfile.d/services/eventstreams: apply
16:34 klausman@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
16:28 jiji@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-main: apply
16:26 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
16:25 aikochou@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revision-models' for release 'main' .
16:24 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
16:23 jiji@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
16:22 jiji@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
16:22 jiji@deploy2002: helmfile [staging] DONE helmfile.d/services/benthos-cache-invalidator: apply
16:21 jiji@deploy2002: helmfile [staging] START helmfile.d/services/benthos-cache-invalidator: apply
16:15 aikochou@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revision-models' for release 'main' .
16:10 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1017.eqiad.wmnet
15:51 apine@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
15:50 apine@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
15:50 apine@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
15:49 apine@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
15:48 dancy@deploy2002: Finished scap sync-world: no-op deployment for testing. (duration: 03m 21s)
15:44 dancy@deploy2002: Started scap sync-world: no-op deployment for testing.
15:44 apine@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
15:44 apine@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
15:37 apine@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
15:37 apine@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
15:33 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1206.eqiad.wmnet with reason: host overworked by dumps - T368098
15:33 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1206.eqiad.wmnet with reason: host overworked by dumps - T368098
15:31 jynus: starting resharding of commons backup files into new host backup2010 T376892
15:27 apine@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
15:23 apine@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
15:23 apine@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
15:22 apine@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
15:22 apine@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
15:19 apine@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
15:19 apine@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
15:15 apine@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
15:14 apine@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
15:13 apine@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
15:13 apine@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
15:10 apine@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
15:09 apine@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
15:09 urandom: bootstrapping cassandra, restbase2037-{a,b,c} — T380236
15:04 btullis@cumin1002: END (PASS) - Cookbook sre.ceph.roll-restart-reboot-server (exit_code=0) rolling reboot on P{cephosd100[2-4].eqiad.wmnet} and (A:cephosd)
14:57 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host thanos-be1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
14:53 JennH: power cycling unresponsive mgmt switch in codfw: msw-c3-codfw
14:50 btullis@cumin1002: END (FAIL) - Cookbook sre.hadoop.roll-restart-workers (exit_code=99) restart workers for Hadoop analytics cluster: Roll restart of jvm daemons for openjdk upgrade.
14:43 elukey@cumin1002: START - Cookbook sre.hosts.provision for host thanos-be1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
14:29 cdanis: T380226 💙cdanis@mwmaint2002.codfw.wmnet ~ 🕤☕ mwscript sql.php --wiki=commonswiki --cluster=extension1 /srv/mediawiki/php-1.44.0-wmf.4/extensions/JsonConfig/sql/mysql/tables-generated.sql
14:25 sukhe@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp7007.magru.wmnet [reason: host reimaged]
14:24 btullis@cumin1002: START - Cookbook sre.ceph.roll-restart-reboot-server rolling reboot on P{cephosd100[2-4].eqiad.wmnet} and (A:cephosd)
14:23 jynus: starting resharding of commons backup files into new host backup1010 T376892
14:23 sukhe: running homer on asw*magru*
14:06 jiji@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
14:05 jiji@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
14:05 jiji@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
14:05 jiji@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
14:05 jiji@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
14:04 jiji@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
14:04 jiji@deploy2002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
14:04 jiji@deploy2002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
14:04 jiji@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
14:03 jiji@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
14:03 jiji@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
14:03 jiji@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
14:03 jiji@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
14:03 jiji@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
14:03 jiji@deploy2002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
14:02 jiji@deploy2002: helmfile [codfw] START helmfile.d/admin 'apply'.
14:02 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
14:02 jiji@deploy2002: helmfile [eqiad] START helmfile.d/admin 'apply'.
13:56 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker[2136-2139,2141-2155].codfw.wmnet
13:55 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker[2136-2139,2141-2155].codfw.wmnet
13:53 claime: homer 'lsw1-d4-codfw*' commit 'T377028'
13:52 claime: homer 'lsw1-b4-codfw*' commit 'T377028'
13:52 claime: homer 'lsw1-d2-codfw*' commit 'T377028'
13:51 claime: homer 'lsw1-c2-codfw*' commit 'T377028'
13:50 claime: homer 'lsw1-d7-codfw*' commit 'T377028'
13:50 claime: homer 'lsw1-c4-codfw*' commit 'T377028'
13:49 claime: homer 'lsw1-d5-codfw*' commit 'T377028'
13:48 claime: homer 'lsw1-b7-codfw*' commit 'T377028'
13:47 claime: homer 'lsw1-c7-codfw*' commit 'T377028'
13:46 claime: homer 'lsw1-d6-codfw*' commit 'T377028'
13:45 claime: homer 'lsw1-b2-codfw*' commit 'T377028'
13:44 claime: homer 'lsw1-d1-codfw*' commit 'T377028'
13:41 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2151.codfw.wmnet with OS bookworm
13:38 effie: putting kafka-main1006.eqiad.wmnet in production
13:38 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2152.codfw.wmnet with OS bookworm
13:36 jiji@cumin1002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-main-eqiad
13:33 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2154.codfw.wmnet with OS bookworm
13:31 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2155.codfw.wmnet with OS bookworm
13:29 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
13:28 btullis@cumin1002: START - Cookbook sre.hadoop.roll-restart-workers restart workers for Hadoop analytics cluster: Roll restart of jvm daemons for openjdk upgrade.
13:28 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
13:26 jiji@cumin1002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-main-eqiad
13:26 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2153.codfw.wmnet with OS bookworm
13:23 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2150.codfw.wmnet with OS bookworm
13:21 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2151.codfw.wmnet with reason: host reimage
13:17 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp7007.magru.wmnet with OS bullseye
13:17 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2152.codfw.wmnet with reason: host reimage
13:14 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2154.codfw.wmnet with reason: host reimage
13:11 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2155.codfw.wmnet with reason: host reimage
13:07 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2153.codfw.wmnet with reason: host reimage
13:03 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2150.codfw.wmnet with reason: host reimage
13:02 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2155.codfw.wmnet with reason: host reimage
13:02 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2154.codfw.wmnet with reason: host reimage
13:01 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1017.eqiad.wmnet
13:01 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2153.codfw.wmnet with reason: host reimage
13:01 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2152.codfw.wmnet with reason: host reimage
13:00 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2151.codfw.wmnet with reason: host reimage
13:00 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2150.codfw.wmnet with reason: host reimage
12:55 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1017.eqiad.wmnet
12:51 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
12:50 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp7007.magru.wmnet with reason: host reimage
12:50 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
12:49 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1017.eqiad.wmnet
12:46 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp7007.magru.wmnet with reason: host reimage
12:44 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2155.codfw.wmnet with OS bookworm
12:43 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2154.codfw.wmnet with OS bookworm
12:42 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2153.codfw.wmnet with OS bookworm
12:42 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2152.codfw.wmnet with OS bookworm
12:41 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2143.codfw.wmnet with OS bookworm
12:41 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2151.codfw.wmnet with OS bookworm
12:41 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2150.codfw.wmnet with OS bookworm
12:39 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2146.codfw.wmnet with OS bookworm
12:38 sukhe: re-enable puppet on cumin2002
12:34 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
12:34 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2145.codfw.wmnet with OS bookworm
12:33 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
12:31 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2147.codfw.wmnet with OS bookworm
12:26 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2148.codfw.wmnet with OS bookworm
12:23 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2149.codfw.wmnet with OS bookworm
12:23 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
12:22 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
12:22 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2143.codfw.wmnet with reason: host reimage
12:21 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2144.codfw.wmnet with OS bookworm
12:20 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp7007.magru.wmnet with OS bullseye
12:19 sukhe@cumin2002: END (FAIL) - Cookbook sre.hosts.dhcp (exit_code=99) for host cp7007.magru.wmnet
12:18 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2146.codfw.wmnet with reason: host reimage
12:16 sukhe@cumin2002: START - Cookbook sre.hosts.dhcp for host cp7007.magru.wmnet
12:16 sukhe@cumin1002: END (FAIL) - Cookbook sre.hosts.dhcp (exit_code=99) for host cp7007.magru.wmnet
12:15 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2145.codfw.wmnet with reason: host reimage
12:14 sukhe@cumin1002: START - Cookbook sre.hosts.dhcp for host cp7007.magru.wmnet
12:11 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2147.codfw.wmnet with reason: host reimage
12:08 sukhe: disable puppet on cumin2002 to test cumin alias for A:installserver
12:07 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2148.codfw.wmnet with reason: host reimage
12:04 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2149.codfw.wmnet with reason: host reimage
12:01 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2144.codfw.wmnet with reason: host reimage
11:59 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2149.codfw.wmnet with reason: host reimage
11:59 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2148.codfw.wmnet with reason: host reimage
11:58 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2147.codfw.wmnet with reason: host reimage
11:57 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2146.codfw.wmnet with reason: host reimage
11:57 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2145.codfw.wmnet with reason: host reimage
11:56 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2143.codfw.wmnet with reason: host reimage
11:56 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2144.codfw.wmnet with reason: host reimage
11:40 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2149.codfw.wmnet with OS bookworm
11:39 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2148.codfw.wmnet with OS bookworm
11:39 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2147.codfw.wmnet with OS bookworm
11:38 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2146.codfw.wmnet with OS bookworm
11:38 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2145.codfw.wmnet with OS bookworm
11:37 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2144.codfw.wmnet with OS bookworm
11:36 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2143.codfw.wmnet with OS bookworm
11:30 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_magru
11:24 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_magru
11:22 akosiaris: decommission cxserver endpoints /api/rest_v1/transform/html/from, /api/rest_v1/transform/word/from from RESTBase T375616
10:43 btullis@cumin1002: END (PASS) - Cookbook sre.ceph.roll-restart-reboot-server (exit_code=0) rolling reboot on P{cephosd1001.eqiad.wmnet} and (A:cephosd)
10:38 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_magru
10:38 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_magru
10:37 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_esams
10:34 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_esams
10:33 btullis@cumin1002: START - Cookbook sre.ceph.roll-restart-reboot-server rolling reboot on P{cephosd1001.eqiad.wmnet} and (A:cephosd)
10:33 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on kafka-main[1001,1006].eqiad.wmnet with reason: Hardware refresh
10:33 jayme: re-enabled puppet on all k8s control planes for rollout of T380142
10:33 jiji@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on kafka-main[1001,1006].eqiad.wmnet with reason: Hardware refresh
10:22 effie: removing leadership from kafka-main1001 - T363214
10:19 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
10:18 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
09:52 aklapper@deploy2002: rebuilt and synchronized wikiversions files: group1 to 1.44.0-wmf.4 refs T375663
09:44 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
09:44 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
09:41 kevinbazira@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
09:38 akosiaris: decommission cxserver endpoints /api/rest_v1/list/(pair|tool|languagepairs) from RESTBase T375616
09:35 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
09:34 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
09:33 aklapper@deploy2002: Finished scap sync-world: Backport for EditionLookup: Update EntityLookup calls (T380304) (duration: 13m 33s)
09:33 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_esams
09:33 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_esams
09:28 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
09:27 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
09:27 aklapper@deploy2002: aklapper, thiemowmde: Continuing with sync
09:26 aklapper@deploy2002: aklapper, thiemowmde: Backport for EditionLookup: Update EntityLookup calls (T380304) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
09:21 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of prometheus7001.magru.wmnet to plain
09:20 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of prometheus7001.magru.wmnet to plain
09:20 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
09:20 aklapper@deploy2002: Started scap sync-world: Backport for EditionLookup: Update EntityLookup calls (T380304)
09:19 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
09:18 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of doh7002.wikimedia.org to plain
09:15 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of doh7002.wikimedia.org to plain
09:13 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ncredir7002.magru.wmnet to plain
09:13 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ncredir7002.magru.wmnet to plain
08:56 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of durum7002.magru.wmnet to plain
08:51 jayme: disabling puppet on all k8s control planes for rollout of T380142
08:48 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of durum7002.magru.wmnet to plain
08:46 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of bast7001.wikimedia.org to plain
08:44 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of bast7001.wikimedia.org to plain
08:35 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti7004.magru.wmnet
08:35 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti7004.magru.wmnet
08:35 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti7004.magru.wmnet
08:34 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti7004.magru.wmnet
08:18 hashar: Restarted CI Jenkins to upgrade Leastload plugin and remove the SSH server plugin

2024-11-19

22:50 ryankemper@deploy2002: Started deploy [wdqs/wdqs@9927a5a] (wcqs): Deploy 0.3.150 to WCQS
22:00 urbanecm@deploy2002: Finished scap sync-world: Backport for Enable experimental Parsoid fragment support on labs and test wikis (T374661), Revert "editcheck: Remove try/catch around transaction squashing" (T333710 T380234), Revert "editcheck: Remove try/catch around transaction squashing" (T333710 T380234) (duration: 20m 39s)
21:53 urbanecm@deploy2002: cscott, kemayo, urbanecm: Continuing with sync
21:45 urbanecm@deploy2002: cscott, kemayo, urbanecm: Backport for Enable experimental Parsoid fragment support on labs and test wikis (T374661), Revert "editcheck: Remove try/catch around transaction squashing" (T333710 T380234), Revert "editcheck: Remove try/catch around transaction squashing" (T333710 T380234) synced to the testservers (https://wikitech.wikimedia.or
21:39 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host es2041.codfw.wmnet with OS bookworm
21:39 urbanecm@deploy2002: Started scap sync-world: Backport for Enable experimental Parsoid fragment support on labs and test wikis (T374661), Revert "editcheck: Remove try/catch around transaction squashing" (T333710 T380234), Revert "editcheck: Remove try/catch around transaction squashing" (T333710 T380234)
21:38 urbanecm@deploy2002: Finished scap sync-world: Backport for Promote Vector 2022 as default on 3 wikis (T379765), Separate cache key space for test & production JsonConfig data (T380320) (duration: 14m 38s)
21:31 urbanecm@deploy2002: bvibber, jdlrobson, urbanecm: Continuing with sync
21:29 urbanecm@deploy2002: bvibber, jdlrobson, urbanecm: Backport for Promote Vector 2022 as default on 3 wikis (T379765), Separate cache key space for test & production JsonConfig data (T380320) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
21:23 urbanecm@deploy2002: Started scap sync-world: Backport for Promote Vector 2022 as default on 3 wikis (T379765), Separate cache key space for test & production JsonConfig data (T380320)
21:16 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on restbase2038.codfw.wmnet with reason: Bootstrapping — T380236
21:15 eevans@cumin1002: START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on restbase2038.codfw.wmnet with reason: Bootstrapping — T380236
21:15 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on restbase2037.codfw.wmnet with reason: Bootstrapping — T380236
21:15 eevans@cumin1002: START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on restbase2037.codfw.wmnet with reason: Bootstrapping — T380236
21:15 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on restbase2036.codfw.wmnet with reason: Bootstrapping — T380236
21:15 eevans@cumin1002: START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on restbase2036.codfw.wmnet with reason: Bootstrapping — T380236
20:56 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host es2041.codfw.wmnet with OS bookworm
20:50 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host thanos-be2005.codfw.wmnet with OS bullseye
20:40 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host thanos-be2005.codfw.wmnet with OS bullseye
20:40 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host thanos-be2005.codfw.wmnet with OS bullseye
20:32 sukhe@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp7007.magru.wmnet with OS bullseye
20:29 sukhe@cumin1002: START - Cookbook sre.hosts.reimage for host cp7007.magru.wmnet with OS bullseye
20:24 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host es2041.codfw.wmnet with OS bookworm
20:24 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host thanos-be2005.codfw.wmnet with OS bullseye
20:10 jhathaway@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on ms-be2082.codfw.wmnet with reason: T371400
20:10 jhathaway@cumin1002: START - Cookbook sre.hosts.downtime for 3:00:00 on ms-be2082.codfw.wmnet with reason: T371400
20:05 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host es2041.codfw.wmnet with OS bookworm
20:03 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1183.eqiad.wmnet with OS bullseye
20:03 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
19:47 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.dhcp (exit_code=99) for host cp7007.magru.wmnet
19:41 sukhe@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp7007.magru.wmnet with OS bullseye
19:40 pt1979@cumin2002: START - Cookbook sre.hosts.dhcp for host cp7007.magru.wmnet
19:34 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
19:17 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@a4d0954]: mjolnir: T379045 Increase maxResultSize (duration: 00m 26s)
19:16 ebernhardson@deploy2002: Started deploy [airflow-dags/search@a4d0954]: mjolnir: T379045 Increase maxResultSize
19:15 sukhe@cumin1002: START - Cookbook sre.hosts.reimage for host cp7007.magru.wmnet with OS bullseye
19:14 sukhe@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp7007.magru.wmnet with OS bullseye
19:12 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1183.eqiad.wmnet with reason: host reimage
19:08 sukhe@cumin1002: START - Cookbook sre.hosts.reimage for host cp7007.magru.wmnet with OS bullseye
19:08 sukhe@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp7007.magru.wmnet with OS bullseye
19:08 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1183.eqiad.wmnet with reason: host reimage
19:05 jhathaway@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on ms-be2082.codfw.wmnet with reason: T371400
19:05 jhathaway@cumin1002: START - Cookbook sre.hosts.downtime for 3:00:00 on ms-be2082.codfw.wmnet with reason: T371400
18:53 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1183.eqiad.wmnet with OS bullseye
18:53 brett: Import ncmonitor 1.3.0-1 into main apt repo
18:52 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1183.eqiad.wmnet with OS bullseye
18:48 sukhe@cumin1002: START - Cookbook sre.hosts.reimage for host cp7007.magru.wmnet with OS bullseye
18:47 sukhe@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp7007.magru.wmnet with OS bullseye
18:39 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/blunderbuss: apply
18:36 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply
18:34 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/blunderbuss: apply
18:34 sukhe@cumin1002: START - Cookbook sre.hosts.reimage for host cp7007.magru.wmnet with OS bullseye
18:34 amastilovic@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply
18:34 sukhe@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp7007.magru.wmnet with OS bullseye
18:32 jhathaway@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on ms-be2082.codfw.wmnet with reason: T371400
18:32 jhathaway@cumin1002: START - Cookbook sre.hosts.downtime for 3:00:00 on ms-be2082.codfw.wmnet with reason: T371400
18:07 sukhe@cumin1002: START - Cookbook sre.hosts.reimage for host cp7007.magru.wmnet with OS bullseye
17:57 brennen@deploy2002: Finished scap sync-world: Backport for Prevent ce_event_wikis query when feature flag is off (T380288) (duration: 15m 10s)
17:56 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1326.eqiad.wmnet with OS bookworm
17:56 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
17:55 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
17:54 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1327.eqiad.wmnet with OS bookworm
17:53 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
17:53 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
17:52 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1183.eqiad.wmnet with OS bullseye
17:50 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1325.eqiad.wmnet with OS bookworm
17:50 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
17:50 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
17:50 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1183.eqiad.wmnet with OS bullseye
17:50 brennen@deploy2002: daimona, brennen: Continuing with sync
17:48 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1323.eqiad.wmnet with OS bookworm
17:48 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
17:47 cmooney@cumin1002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host wikikube-worker1290
17:47 cmooney@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1290
17:47 brennen@deploy2002: daimona, brennen: Backport for Prevent ce_event_wikis query when feature flag is off (T380288) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
17:47 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
17:45 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1322.eqiad.wmnet with OS bookworm
17:45 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
17:43 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
17:42 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on wikikube-worker1290.eqiad.wmnet with reason: being moved to new port
17:42 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on wikikube-worker1290.eqiad.wmnet with reason: being moved to new port
17:42 jhathaway@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on ms-be2082.codfw.wmnet with reason: T371400
17:41 brennen@deploy2002: Started scap sync-world: Backport for Prevent ce_event_wikis query when feature flag is off (T380288)
17:41 jhathaway@cumin1002: START - Cookbook sre.hosts.downtime for 3:00:00 on ms-be2082.codfw.wmnet with reason: T371400
17:41 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1324.eqiad.wmnet with OS bookworm
17:41 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
17:40 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
17:38 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1326.eqiad.wmnet with reason: host reimage
17:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2110.codfw.wmnet with OS bullseye
17:37 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
17:37 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
17:36 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1327.eqiad.wmnet with reason: host reimage
17:34 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1183.eqiad.wmnet with OS bullseye
17:32 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1325.eqiad.wmnet with reason: host reimage
17:29 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1323.eqiad.wmnet with reason: host reimage
17:28 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1326.eqiad.wmnet with reason: host reimage
17:28 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1327.eqiad.wmnet with reason: host reimage
17:28 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1325.eqiad.wmnet with reason: host reimage
17:26 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1322.eqiad.wmnet with reason: host reimage
17:23 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1324.eqiad.wmnet with reason: host reimage
17:19 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2110.codfw.wmnet with reason: host reimage
17:18 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1323.eqiad.wmnet with reason: host reimage
17:18 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1314.eqiad.wmnet with OS bookworm
17:18 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
17:18 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1324.eqiad.wmnet with reason: host reimage
17:18 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1322.eqiad.wmnet with reason: host reimage
17:18 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
17:16 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2110.codfw.wmnet with reason: host reimage
17:15 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2140.codfw.wmnet with OS bookworm
17:15 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1318.eqiad.wmnet with OS bookworm
17:15 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
17:14 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
17:11 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1319.eqiad.wmnet with OS bookworm
17:11 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
17:11 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
17:11 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1326.eqiad.wmnet with OS bookworm
17:10 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1327.eqiad.wmnet with OS bookworm
17:10 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1325.eqiad.wmnet with OS bookworm
17:09 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1320.eqiad.wmnet with OS bookworm
17:09 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
17:08 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
17:04 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1321.eqiad.wmnet with OS bookworm
17:04 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
17:04 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
17:02 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1316.eqiad.wmnet with OS bookworm
17:02 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
17:01 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
17:00 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1323.eqiad.wmnet with OS bookworm
17:00 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1324.eqiad.wmnet with OS bookworm
17:00 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1322.eqiad.wmnet with OS bookworm
17:00 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2110.codfw.wmnet with OS bullseye
17:00 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['elastic2110']
17:00 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1314.eqiad.wmnet with reason: host reimage
17:00 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic2110']
16:58 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1317.eqiad.wmnet with OS bookworm
16:58 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
16:58 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
16:56 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1318.eqiad.wmnet with reason: host reimage
16:56 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1315.eqiad.wmnet with OS bookworm
16:56 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
16:55 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
16:53 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1319.eqiad.wmnet with reason: host reimage
16:52 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1313.eqiad.wmnet with OS bookworm
16:52 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
16:52 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
16:50 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1320.eqiad.wmnet with reason: host reimage
16:46 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1321.eqiad.wmnet with reason: host reimage
16:43 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1316.eqiad.wmnet with reason: host reimage
16:41 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1317.eqiad.wmnet with reason: host reimage
16:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic2110.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
16:37 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1315.eqiad.wmnet with reason: host reimage
16:36 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1320.eqiad.wmnet with reason: host reimage
16:36 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp7007.magru.wmnet
16:35 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1321.eqiad.wmnet with reason: host reimage
16:34 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1318.eqiad.wmnet with reason: host reimage
16:34 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1319.eqiad.wmnet with reason: host reimage
16:34 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1313.eqiad.wmnet with reason: host reimage
16:33 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1316.eqiad.wmnet with reason: host reimage
16:33 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1317.eqiad.wmnet with reason: host reimage
16:33 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1315.eqiad.wmnet with reason: host reimage
16:31 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1314.eqiad.wmnet with reason: host reimage
16:30 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1313.eqiad.wmnet with reason: host reimage
16:29 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host elastic2110.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
16:28 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host elastic2110.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
16:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host elastic2110.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
16:24 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2142.codfw.wmnet with OS bookworm
16:19 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2139.codfw.wmnet with OS bookworm
16:17 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1319.eqiad.wmnet with OS bookworm
16:17 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1320.eqiad.wmnet with OS bookworm
16:17 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1321.eqiad.wmnet with OS bookworm
16:17 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1318.eqiad.wmnet with OS bookworm
16:16 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2141.codfw.wmnet with OS bookworm
16:15 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1317.eqiad.wmnet with OS bookworm
16:15 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1316.eqiad.wmnet with OS bookworm
16:15 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1315.eqiad.wmnet with OS bookworm
16:13 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1314.eqiad.wmnet with OS bookworm
16:13 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1313.eqiad.wmnet with OS bookworm
16:13 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2138.codfw.wmnet with OS bookworm
16:09 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2137.codfw.wmnet with OS bookworm
16:07 dreamyjazz@deploy2002: Finished scap sync-world: Backport for ExperimentUserDefaultsManager: Decrease log severity to debug (T380271) (duration: 13m 16s)
16:04 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2142.codfw.wmnet with reason: host reimage
16:03 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2136.codfw.wmnet with OS bookworm
16:00 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2139.codfw.wmnet with reason: host reimage
15:59 dreamyjazz@deploy2002: dreamyjazz: Continuing with sync
15:59 dreamyjazz@deploy2002: dreamyjazz: Backport for ExperimentUserDefaultsManager: Decrease log severity to debug (T380271) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
15:57 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2141.codfw.wmnet with reason: host reimage
15:55 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2140.codfw.wmnet with OS bookworm
15:54 cgoubert@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wikikube-worker2140.codfw.wmnet with OS bookworm
15:53 dreamyjazz@deploy2002: Started scap sync-world: Backport for ExperimentUserDefaultsManager: Decrease log severity to debug (T380271)
15:53 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2138.codfw.wmnet with reason: host reimage
15:50 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2137.codfw.wmnet with reason: host reimage
15:48 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2142.codfw.wmnet with reason: host reimage
15:47 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2141.codfw.wmnet with reason: host reimage
15:47 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2139.codfw.wmnet with reason: host reimage
15:46 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2138.codfw.wmnet with reason: host reimage
15:46 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2137.codfw.wmnet with reason: host reimage
15:45 moritzm: installing libheif security updates
15:44 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage
15:40 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage
15:29 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2142.codfw.wmnet with OS bookworm
15:29 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2141.codfw.wmnet with OS bookworm
15:29 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2139.codfw.wmnet with OS bookworm
15:28 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2138.codfw.wmnet with OS bookworm
15:28 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2137.codfw.wmnet with OS bookworm
15:25 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2139.codfw.wmnet with OS bookworm
15:25 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2138.codfw.wmnet with OS bookworm
15:22 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2136.codfw.wmnet with OS bookworm
15:21 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2142.codfw.wmnet with OS bookworm
15:21 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2141.codfw.wmnet with OS bookworm
15:21 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2137.codfw.wmnet with OS bookworm
15:21 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2136.codfw.wmnet with OS bookworm
15:15 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp7007.magru.wmnet with OS bullseye
15:14 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_eqiad
15:11 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_eqiad
15:07 arnaudb@cumin1002: END (PASS) - Cookbook sre.switchdc.databases.finalize (exit_code=0) for the switch from codfw to eqiad
15:06 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.finalize for the switch from codfw to eqiad
15:06 arnaudb@cumin1002: END (PASS) - Cookbook sre.switchdc.databases.prepare (exit_code=0) for the switch from codfw to eqiad
15:05 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.prepare for the switch from codfw to eqiad
away: UTC afternoon deploys done
14:59 tgr@deploy2002: Finished scap sync-world: Backport for Use 'auth' rather than 'sso' as cookie prefix on the auth domain (T379811) (duration: 14m 16s)
14:52 tgr@deploy2002: tgr: Continuing with sync
14:50 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp7007.magru.wmnet with reason: host reimage
14:50 arnaudb@cumin1002: END (PASS) - Cookbook sre.switchdc.databases.finalize (exit_code=0) for the switch from eqiad to codfw
14:50 tgr@deploy2002: tgr: Backport for Use 'auth' rather than 'sso' as cookie prefix on the auth domain (T379811) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
14:49 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.finalize for the switch from eqiad to codfw
14:49 arnaudb@cumin1002: END (PASS) - Cookbook sre.switchdc.databases.prepare (exit_code=0) for the switch from eqiad to codfw
14:48 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.prepare for the switch from eqiad to codfw
14:46 fabfur@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp7007.magru.wmnet with reason: host reimage
14:45 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2140.codfw.wmnet with OS bookworm
14:44 tgr@deploy2002: Started scap sync-world: Backport for Use 'auth' rather than 'sso' as cookie prefix on the auth domain (T379811)
14:44 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2142.codfw.wmnet with OS bookworm
14:44 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2141.codfw.wmnet with OS bookworm
14:43 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2139.codfw.wmnet with OS bookworm
14:42 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2138.codfw.wmnet with OS bookworm
14:41 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2137.codfw.wmnet with OS bookworm
14:40 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2136.codfw.wmnet with OS bookworm
14:39 elukey: limit /v2/_catalog to internal IPs only for all Docker Registry nodes - T378618
14:38 kartik@deploy2002: Finished scap sync-world: Backport for Enable message group subscription feature for MediaWiki.org (T372386) (duration: 16m 21s)
14:35 arnaudb@cumin1002: END (PASS) - Cookbook sre.switchdc.databases.finalize (exit_code=0) for the switch from codfw to eqiad
14:34 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.finalize for the switch from codfw to eqiad
14:34 arnaudb@cumin1002: END (PASS) - Cookbook sre.switchdc.databases.prepare (exit_code=0) for the switch from codfw to eqiad
14:33 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.prepare for the switch from codfw to eqiad
14:31 kartik@deploy2002: kartik, abi: Continuing with sync
14:31 arnaudb@cumin1002: END (PASS) - Cookbook sre.switchdc.databases.finalize (exit_code=0) for the switch from eqiad to codfw
14:30 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.finalize for the switch from eqiad to codfw
14:29 arnaudb@cumin1002: END (PASS) - Cookbook sre.switchdc.databases.prepare (exit_code=0) for the switch from eqiad to codfw
14:28 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.prepare for the switch from eqiad to codfw
14:28 kartik@deploy2002: kartik, abi: Backport for Enable message group subscription feature for MediaWiki.org (T372386) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
14:26 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_eqiad
14:26 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_eqiad
14:25 arnaudb@cumin1002: END (PASS) - Cookbook sre.switchdc.databases.finalize (exit_code=0) for the switch from codfw to eqiad
14:24 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.finalize for the switch from codfw to eqiad
14:23 arnaudb@cumin1002: END (PASS) - Cookbook sre.switchdc.databases.prepare (exit_code=0) for the switch from codfw to eqiad
14:23 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.prepare for the switch from codfw to eqiad
14:22 kartik@deploy2002: Started scap sync-world: Backport for Enable message group subscription feature for MediaWiki.org (T372386)
14:22 arnaudb@cumin1002: END (PASS) - Cookbook sre.switchdc.databases.finalize (exit_code=0) for the switch from codfw to eqiad
14:21 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.finalize for the switch from codfw to eqiad
14:21 fabfur@cumin1002: START - Cookbook sre.hosts.reimage for host cp7007.magru.wmnet with OS bullseye
14:21 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_drmrs
14:18 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_drmrs
14:17 kartik@deploy2002: Finished scap sync-world: Backport for Enable the Contribute menu in 3rd group of Wikis (T375301) (duration: 15m 07s)
14:15 joal@deploy2002: Finished deploy [analytics/refinery@295d5a4]: Regular analytics weekly train [analytics/refinery@295d5a44] (duration: 08m 56s)
14:11 kartik@deploy2002: kartik: Continuing with sync
14:10 akosiaris@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker1290.eqiad.wmnet
14:10 kartik@deploy2002: kartik: Backport for Enable the Contribute menu in 3rd group of Wikis (T375301) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
14:10 akosiaris@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker1290.eqiad.wmnet
14:07 ihurbain@deploy2002: helmfile [codfw] DONE helmfile.d/services/proton: apply
14:06 joal@deploy2002: Started deploy [analytics/refinery@295d5a4]: Regular analytics weekly train [analytics/refinery@295d5a44]
14:06 ihurbain@deploy2002: helmfile [codfw] START helmfile.d/services/proton: apply
14:05 ihurbain@deploy2002: helmfile [eqiad] DONE helmfile.d/services/proton: apply
14:04 ihurbain@deploy2002: helmfile [eqiad] START helmfile.d/services/proton: apply
14:03 ihurbain@deploy2002: helmfile [staging] DONE helmfile.d/services/proton: apply
14:02 kartik@deploy2002: Started scap sync-world: Backport for Enable the Contribute menu in 3rd group of Wikis (T375301)
14:02 ihurbain@deploy2002: helmfile [staging] START helmfile.d/services/proton: apply
14:01 ihurbain@deploy2002: helmfile [staging] DONE helmfile.d/services/proton: apply
14:01 ihurbain@deploy2002: helmfile [staging] START helmfile.d/services/proton: apply
13:27 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_drmrs
13:27 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_drmrs
13:08 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 266098
13:08 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 266098
13:08 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 267521
13:07 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 267521
13:07 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 201838
13:06 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 201838
13:06 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 262979
13:06 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 262979
13:06 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 266631
13:06 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 266631
13:05 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 53180
13:05 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 53180
13:05 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 21574
13:05 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 21574
12:57 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
12:55 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
12:43 arnaudb@cumin1002: END (PASS) - Cookbook sre.switchdc.databases.finalize (exit_code=0) for the switch from eqiad to codfw
12:42 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.finalize for the switch from eqiad to codfw
12:41 arnaudb@cumin1002: END (PASS) - Cookbook sre.switchdc.databases.prepare (exit_code=0) for the switch from eqiad to codfw
12:40 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.prepare for the switch from eqiad to codfw
12:38 arnaudb@cumin1002: END (FAIL) - Cookbook sre.switchdc.databases.prepare (exit_code=99) for the switch from eqiad to codfw
12:36 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.prepare for the switch from eqiad to codfw
12:35 moritzm: removing ganeti1016 from active Ganeti nodes T378921
12:30 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_codfw
12:27 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_codfw
12:23 arnaudb@cumin1002: END (PASS) - Cookbook sre.switchdc.databases.finalize (exit_code=0) for the switch from codfw to eqiad
12:22 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.finalize for the switch from codfw to eqiad
12:20 arnaudb@cumin1002: END (PASS) - Cookbook sre.switchdc.databases.prepare (exit_code=0) for the switch from codfw to eqiad
12:18 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.prepare for the switch from codfw to eqiad
11:59 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1016.eqiad.wmnet
11:44 arnaudb@cumin1002: dbctl commit (dc=all): 'db2216 (re)pooling @ 100%: repool', diff saved to https://phabricator.wikimedia.org/P71095 and previous config saved to /var/cache/conftool/dbconfig/20241119-114422-arnaudb.json
11:40 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_codfw
11:40 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_codfw
11:29 arnaudb@cumin1002: dbctl commit (dc=all): 'db2216 (re)pooling @ 75%: repool', diff saved to https://phabricator.wikimedia.org/P71094 and previous config saved to /var/cache/conftool/dbconfig/20241119-112917-arnaudb.json
11:14 arnaudb@cumin1002: dbctl commit (dc=all): 'db2216 (re)pooling @ 50%: repool', diff saved to https://phabricator.wikimedia.org/P71093 and previous config saved to /var/cache/conftool/dbconfig/20241119-111411-arnaudb.json
11:05 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp2004.codfw.wmnet
11:03 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 207947
11:03 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'configure' for AS: 207947
10:59 arnaudb@cumin1002: dbctl commit (dc=all): 'db2216 (re)pooling @ 25%: repool', diff saved to https://phabricator.wikimedia.org/P71092 and previous config saved to /var/cache/conftool/dbconfig/20241119-105906-arnaudb.json
10:58 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc-gp2004.codfw.wmnet
10:44 arnaudb@cumin1002: dbctl commit (dc=all): 'db2216 (re)pooling @ 15%: repool', diff saved to https://phabricator.wikimedia.org/P71091 and previous config saved to /var/cache/conftool/dbconfig/20241119-104401-arnaudb.json
10:41 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_eqsin
10:37 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_eqsin
10:28 arnaudb@cumin1002: dbctl commit (dc=all): 'db2216 (re)pooling @ 10%: repool', diff saved to https://phabricator.wikimedia.org/P71090 and previous config saved to /var/cache/conftool/dbconfig/20241119-102855-arnaudb.json
10:27 jmm@cumin2002: END (PASS) - Cookbook sre.misc-clusters.roll-restart-reboot-docker-registry (exit_code=0) rolling restart_daemons on A:docker-registry
10:25 jmm@cumin2002: START - Cookbook sre.misc-clusters.roll-restart-reboot-docker-registry rolling restart_daemons on A:docker-registry
10:16 moritzm: restart spamd on vrts to pick up openssl updates
10:13 arnaudb@cumin1002: dbctl commit (dc=all): 'db2216 (re)pooling @ 5%: repool', diff saved to https://phabricator.wikimedia.org/P71089 and previous config saved to /var/cache/conftool/dbconfig/20241119-101350-arnaudb.json
10:02 moritzm: installing openssl security updates
10:00 arnaudb@cumin1002: END (PASS) - Cookbook sre.switchdc.databases.finalize (exit_code=0) for the switch from eqiad to codfw
10:00 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.finalize for the switch from eqiad to codfw
09:59 arnaudb@cumin1002: END (PASS) - Cookbook sre.switchdc.databases.finalize (exit_code=0) for the switch from eqiad to codfw
09:59 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.finalize for the switch from eqiad to codfw
09:58 arnaudb@cumin1002: END (PASS) - Cookbook sre.switchdc.databases.finalize (exit_code=0) for the switch from eqiad to codfw
09:58 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.finalize for the switch from eqiad to codfw
09:55 arnaudb@cumin1002: END (PASS) - Cookbook sre.switchdc.databases.prepare (exit_code=0) for the switch from eqiad to codfw
09:52 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.prepare for the switch from eqiad to codfw
09:51 dcausse@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
09:51 dcausse@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
09:49 arnaudb@cumin1002: END (PASS) - Cookbook sre.switchdc.databases.prepare (exit_code=0) for the switch from eqiad to codfw
09:49 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.prepare for the switch from eqiad to codfw
09:42 fabfur: upgrade haproxy on cp-text|upload_eqsin (T379891)
09:42 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_eqsin
09:41 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_eqsin
09:39 arnaudb@cumin1002: END (PASS) - Cookbook sre.switchdc.databases.finalize (exit_code=0) for the switch from codfw to eqiad
09:39 dcausse@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
09:39 dcausse@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
09:39 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.finalize for the switch from codfw to eqiad
09:39 dcausse@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
09:38 arnaudb@cumin1002: END (PASS) - Cookbook sre.switchdc.databases.prepare (exit_code=0) for the switch from codfw to eqiad
09:38 dcausse@deploy2002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
09:35 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.prepare for the switch from codfw to eqiad
09:33 dcausse@deploy2002: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
09:32 dcausse@deploy2002: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
09:19 aklapper@deploy2002: rebuilt and synchronized wikiversions files: group0 to 1.44.0-wmf.4 refs T375663
09:18 arnaudb@cumin1002: END (PASS) - Cookbook sre.switchdc.databases.prepare (exit_code=0) for the switch from codfw to eqiad
09:18 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.prepare for the switch from codfw to eqiad
08:59 urbanecm@deploy2002: Finished scap sync-world: Backport for Add + to nowiki in core-Permissions.php (T380252) (duration: 10m 17s)
08:54 urbanecm@deploy2002: urbanecm, jhsoby: Continuing with sync
08:54 urbanecm@deploy2002: urbanecm, jhsoby: Backport for Add + to nowiki in core-Permissions.php (T380252) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
08:49 urbanecm@deploy2002: Started scap sync-world: Backport for Add + to nowiki in core-Permissions.php (T380252)
08:48 urbanecm@deploy2002: Finished scap sync-world: Backport for fix tours by finishing partial variable rename (T380071), affcom contactpages: Fix Letter of intent and logo field labels (T375392), Add nowiki to commonsuploads dblist (T380252) (duration: 14m 29s)
08:43 urbanecm@deploy2002: ammarpad, migr, jhsoby, urbanecm: Continuing with sync
08:39 urbanecm@deploy2002: ammarpad, migr, jhsoby, urbanecm: Backport for fix tours by finishing partial variable rename (T380071), affcom contactpages: Fix Letter of intent and logo field labels (T375392), Add nowiki to commonsuploads dblist (T380252) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
08:34 urbanecm@deploy2002: Started scap sync-world: Backport for fix tours by finishing partial variable rename (T380071), affcom contactpages: Fix Letter of intent and logo field labels (T375392), Add nowiki to commonsuploads dblist (T380252)
08:29 urbanecm@deploy2002: Finished scap sync-world: Backport for Translate Event Logging: Enable using $wgTranslateEnableEventLogging (T364460), CirrusSearch: enable offloading weighted tags via EventBus (T378983 T377150), [GrowthExperiments] Add virtual domain config (T354939) (duration: 24m 42s)
08:22 urbanecm@deploy2002: urbanecm, wangombe, pfischer: Continuing with sync
08:12 urbanecm@deploy2002: urbanecm, wangombe, pfischer: Backport for Translate Event Logging: Enable using $wgTranslateEnableEventLogging (T364460), CirrusSearch: enable offloading weighted tags via EventBus (T378983 T377150), [GrowthExperiments] Add virtual domain config (T354939) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
08:04 urbanecm@deploy2002: Started scap sync-world: Backport for Translate Event Logging: Enable using $wgTranslateEnableEventLogging (T364460), CirrusSearch: enable offloading weighted tags via EventBus (T378983 T377150), [GrowthExperiments] Add virtual domain config (T354939)
07:45 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2202.codfw.wmnet with reason: sad
07:45 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2202.codfw.wmnet with reason: sad
07:41 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db1246.eqiad.wmnet with reason: T374215 - hw maintenance
07:40 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on db1246.eqiad.wmnet with reason: T374215 - hw maintenance
07:32 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1016.eqiad.wmnet
07:31 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1016.eqiad.wmnet
07:24 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1016.eqiad.wmnet
05:01 mwpresync@deploy2002: Pruned MediaWiki: 1.44.0-wmf.1 (duration: 01m 18s)
04:52 mwpresync@deploy2002: Finished scap sync-world: testwikis to 1.44.0-wmf.4 refs T375663 (duration: 49m 01s)
04:16 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1062.eqiad.wmnet with OS bookworm
04:03 mwpresync@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.4 refs T375663
04:00 ejegg: fundraising civicrm upgraded from 463a12c5 to e29243f0
03:51 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1062.eqiad.wmnet with reason: host reimage
03:48 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1062.eqiad.wmnet with reason: host reimage
03:33 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1062.eqiad.wmnet with OS bookworm
03:09 ejegg: payments-wiki upgraded from 459f259b to c4463536
02:31 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-jumbo1018.eqiad.wmnet with OS bullseye
02:30 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
02:30 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
02:23 ejegg: standalone (IPN listener) SmashPig upgraded from 601405dc to 131e92a5
02:12 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-jumbo1018.eqiad.wmnet with reason: host reimage
02:08 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-jumbo1018.eqiad.wmnet with reason: host reimage
01:54 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host kafka-jumbo1018.eqiad.wmnet with OS bullseye
01:54 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kafka-jumbo1018.eqiad.wmnet with OS bullseye
01:51 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-jumbo1016.eqiad.wmnet with OS bullseye
01:51 jclark@cumin1002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
01:50 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-jumbo1017.eqiad.wmnet with OS bullseye
01:50 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
01:40 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
01:24 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
01:24 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-jumbo1017.eqiad.wmnet with reason: host reimage
01:21 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-jumbo1017.eqiad.wmnet with reason: host reimage
01:12 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host maps-test2006.codfw.wmnet with OS bookworm
01:12 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
01:07 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host kafka-jumbo1018.eqiad.wmnet with OS bullseye
01:07 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host kafka-jumbo1017.eqiad.wmnet with OS bullseye
01:06 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kafka-jumbo1017.eqiad.wmnet with OS bullseye
01:03 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
01:02 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-jumbo1016.eqiad.wmnet with reason: host reimage
00:58 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-jumbo1016.eqiad.wmnet with reason: host reimage
00:54 tzatziki: removing 1 file for legal compliance
00:53 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host thanos-be2005.codfw.wmnet with OS bookworm
00:51 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host maps-test2005.codfw.wmnet with OS bookworm
00:51 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
00:44 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host kafka-jumbo1016.eqiad.wmnet with OS bullseye
00:42 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on maps-test2006.codfw.wmnet with reason: host reimage
00:41 tzatziki: removing 1 file for legal compliance
00:39 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kafka-jumbo1016.eqiad.wmnet with OS bullseye
00:39 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on maps-test2006.codfw.wmnet with reason: host reimage
00:34 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
00:18 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host kafka-jumbo1017.eqiad.wmnet with OS bullseye
00:18 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kafka-jumbo1017.eqiad.wmnet with OS bullseye
00:14 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host maps-test2006.codfw.wmnet with OS bookworm
00:14 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on maps-test2005.codfw.wmnet with reason: host reimage
00:14 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host maps-test2004.codfw.wmnet with OS bookworm
00:14 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
00:10 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
00:10 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on maps-test2005.codfw.wmnet with reason: host reimage
00:03 tzatziki: removing 1 file for legal compliance
00:00 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host maps-test2003.codfw.wmnet with OS bookworm
00:00 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"

2024-11-18

23:51 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
23:50 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on maps-test2004.codfw.wmnet with reason: host reimage
23:48 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on maps-test2004.codfw.wmnet with reason: host reimage
23:46 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host maps-test2005.codfw.wmnet with OS bookworm
23:32 tzatziki: removing 1 file for legal compliance
23:31 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on maps-test2003.codfw.wmnet with reason: host reimage
23:28 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host maps-test2002.codfw.wmnet with OS bookworm
23:28 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
23:27 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
23:26 tzatziki: removing 1 file for legal compliance
23:26 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on maps-test2003.codfw.wmnet with reason: host reimage
23:25 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host maps-test2004.codfw.wmnet with OS bookworm
23:19 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-be2005.codfw.wmnet with reason: host reimage
23:15 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-be2005.codfw.wmnet with reason: host reimage
23:12 tzatziki: removing 2 files for legal compliance
23:09 eevans@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
23:09 eevans@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Additional IPs for Cassandra — restbase2036 - eevans@cumin1002"
23:09 eevans@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Additional IPs for Cassandra — restbase2036 - eevans@cumin1002"
23:08 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on maps-test2002.codfw.wmnet with reason: host reimage
23:06 eevans@cumin1002: START - Cookbook sre.dns.netbox
23:05 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on maps-test2002.codfw.wmnet with reason: host reimage
23:04 eevans@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
23:04 eevans@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Additional IPs for Cassandra — restbase2036 - eevans@cumin1002"
23:04 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host maps-test2003.codfw.wmnet with OS bookworm
23:04 eevans@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Additional IPs for Cassandra — restbase2036 - eevans@cumin1002"
23:03 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host thanos-be2005.codfw.wmnet with OS bookworm
23:01 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host thanos-be2005.codfw.wmnet with OS bookworm
23:00 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host kafka-jumbo1018.eqiad.wmnet with OS bullseye
23:00 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host kafka-jumbo1017.eqiad.wmnet with OS bullseye
23:00 eevans@cumin1002: START - Cookbook sre.dns.netbox
22:59 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host kafka-jumbo1016.eqiad.wmnet with OS bullseye
22:57 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host thanos-be2005.codfw.wmnet with OS bookworm
22:55 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host es2045.codfw.wmnet with OS bookworm
22:55 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host thanos-be2005.codfw.wmnet with OS bookworm
22:55 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host es2044.codfw.wmnet with OS bookworm
22:54 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host es2046.codfw.wmnet with OS bookworm
22:54 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host es2043.codfw.wmnet with OS bookworm
22:54 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host es2041.codfw.wmnet with OS bookworm
22:52 tzatziki: removing 10 files for legal compliance
22:50 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host maps-test2001.codfw.wmnet with OS bookworm
22:50 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
22:49 bking@deploy2002: Finished deploy [wdqs/wdqs@9927a5a]: 0.3.150 (duration: 11m 59s)
22:47 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host thanos-be2005.codfw.wmnet with OS bookworm
22:37 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host es2042.codfw.wmnet with OS bookworm
22:37 bking@deploy2002: Started deploy [wdqs/wdqs@9927a5a]: 0.3.150
22:22 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host thanos-be2005.codfw.wmnet with OS bookworm
22:18 urbanecm@deploy2002: Finished scap sync-world: Backport for [GrowthExperiments] testwiki: Only enable Add Link for new accounts (T380204) (duration: 09m 14s)
22:13 urbanecm@deploy2002: urbanecm: Continuing with sync
22:13 urbanecm@deploy2002: urbanecm: Backport for [GrowthExperiments] testwiki: Only enable Add Link for new accounts (T380204) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
22:09 urbanecm@deploy2002: Started scap sync-world: Backport for [GrowthExperiments] testwiki: Only enable Add Link for new accounts (T380204)
21:58 urbanecm@deploy2002: Finished scap sync-world: Backport for Use WAN cache for JsonConfig remote fetch cache (T374746), Create no-link-recommendation variant (T377787 T380204), [GrowthExperiments] testwiki: Enable no-link-recommendation experiment (T380204) (duration: 12m 10s)
21:54 urbanecm@deploy2002: urbanecm, bvibber: Continuing with sync
21:52 urbanecm@deploy2002: urbanecm, bvibber: Backport for Use WAN cache for JsonConfig remote fetch cache (T374746), Create no-link-recommendation variant (T377787 T380204), [GrowthExperiments] testwiki: Enable no-link-recommendation experiment (T380204) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
21:48 effie: upload prometheus-mcrouter-exporter_0.4.0+git20241118-1~wmf1 - T380212
21:46 urbanecm@deploy2002: Started scap sync-world: Backport for Use WAN cache for JsonConfig remote fetch cache (T374746), Create no-link-recommendation variant (T377787 T380204), [GrowthExperiments] testwiki: Enable no-link-recommendation experiment (T380204)
21:42 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
21:36 urbanecm@deploy2002: Finished scap sync-world: Backport for Rename everything referring to "SSO domain" to use "shared domain" (T379811), Rename shared domain sso.wikimedia.org to auth.wikimedia.org (T379811), Use DB name rather than server name in shared domain path prefix (T379811) (duration: 10m 54s)
21:31 urbanecm@deploy2002: matmarex, urbanecm: Continuing with sync
21:30 urbanecm@deploy2002: matmarex, urbanecm: Backport for Rename everything referring to "SSO domain" to use "shared domain" (T379811), Rename shared domain sso.wikimedia.org to auth.wikimedia.org (T379811), Use DB name rather than server name in shared domain path prefix (T379811) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
21:29 urbanecm: Add bvibber to wmf-deployment Gerrit group (existing deployer)
21:26 urbanecm@deploy2002: Started scap sync-world: Backport for Rename everything referring to "SSO domain" to use "shared domain" (T379811), Rename shared domain sso.wikimedia.org to auth.wikimedia.org (T379811), Use DB name rather than server name in shared domain path prefix (T379811)
21:21 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on maps-test2001.codfw.wmnet with reason: host reimage
21:18 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on maps-test2001.codfw.wmnet with reason: host reimage
21:17 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host es2046.codfw.wmnet with OS bookworm
21:17 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host es2045.codfw.wmnet with OS bookworm
21:17 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host es2044.codfw.wmnet with OS bookworm
21:17 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host es2043.codfw.wmnet with OS bookworm
21:17 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host es2042.codfw.wmnet with OS bookworm
21:17 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host es2041.codfw.wmnet with OS bookworm
21:16 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host maps-test2002.codfw.wmnet with OS bookworm
21:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['es2042']
21:15 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['es2042']
21:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['es2041']
21:15 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['es2041']
21:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host es2042.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
21:11 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host es2044.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
21:11 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host es2045.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
21:10 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host es2043.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
21:10 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host es2041.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
21:03 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host thanos-be2005.codfw.wmnet with OS bookworm
21:01 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host thanos-be2005.codfw.wmnet with OS bookworm
21:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host es2046.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
20:52 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host thanos-be2005.codfw.wmnet with OS bookworm
20:51 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host thanos-be2005.codfw.wmnet with OS bullseye
20:51 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host thanos-be2005.codfw.wmnet with OS bullseye
20:49 jhathaway: disabling auto-reboot on re-imaging for debugging
20:49 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host maps-test2001.codfw.wmnet with OS bookworm
20:39 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host es2046.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
20:39 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host es2045.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
20:39 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host es2044.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
20:39 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host es2043.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
20:39 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host es2042.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
20:39 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host es2041.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
20:39 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host thanos-be2005.codfw.wmnet with OS bullseye
20:37 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
20:37 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding es2041 to codfw - jhancock@cumin2002"
20:37 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding es2041 to codfw - jhancock@cumin2002"
20:33 jhancock@cumin2002: START - Cookbook sre.dns.netbox
20:29 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host thanos-be2005.codfw.wmnet with OS bullseye
20:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase2037.codfw.wmnet with OS bullseye
20:23 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
20:19 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
20:19 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2112.codfw.wmnet with OS bullseye
20:19 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
20:14 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
20:12 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2113.codfw.wmnet with OS bullseye
20:12 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
20:11 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
20:00 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on restbase2037.codfw.wmnet with reason: host reimage
19:57 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on restbase2037.codfw.wmnet with reason: host reimage
19:57 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2112.codfw.wmnet with reason: host reimage
19:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2163.codfw.wmnet with OS bookworm
19:56 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
19:55 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
19:55 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@594d3b5]: T377153 Release glent 0.3.5 (duration: 00m 27s)
19:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2113.codfw.wmnet with reason: host reimage
19:54 ebernhardson@deploy2002: Started deploy [airflow-dags/search@594d3b5]: T377153 Release glent 0.3.5
19:52 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2112.codfw.wmnet with reason: host reimage
19:51 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2113.codfw.wmnet with reason: host reimage
19:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2163.codfw.wmnet with reason: host reimage
19:36 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2112.codfw.wmnet with OS bullseye
19:35 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2113.codfw.wmnet with OS bullseye
19:35 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host restbase2037.codfw.wmnet with OS bullseye
19:34 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2163.codfw.wmnet with reason: host reimage
19:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['elastic2113']
19:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['restbase2037']
19:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic2113']
19:32 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['restbase2037']
19:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic2113.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
19:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host restbase2037.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
19:22 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host elastic2110.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
19:18 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host elastic2113.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
19:18 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host elastic2110.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
19:18 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host restbase2037.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
19:17 swfrench@deploy2002: Finished scap sync-world: Test deployment after adding mwdebug-next check command - T372604 (duration: 01m 31s)
19:15 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2163.codfw.wmnet with OS bookworm
19:15 swfrench@deploy2002: Started scap sync-world: Test deployment after adding mwdebug-next check command - T372604
19:08 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/blunderbuss: apply
18:58 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply
18:57 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
18:56 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
18:46 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
18:45 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
18:43 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
18:41 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
18:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1183.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
18:27 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/blunderbuss: apply
18:17 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply
18:15 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/blunderbuss: apply
18:15 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply
18:14 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/blunderbuss: apply
18:13 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply
18:12 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host thanos-be2005.codfw.wmnet with OS bullseye
18:09 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/blunderbuss: apply
18:08 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply
18:04 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/blunderbuss: apply
18:03 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply
18:03 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/blunderbuss: apply
18:01 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply
17:53 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host thanos-be2005.codfw.wmnet with OS bullseye
17:34 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/blunderbuss: apply
17:28 xcollazo@deploy2002: Finished deploy [airflow-dags/analytics@16a5867]: Deploy latest DAGs to analytics Airflow instance. T368755. (duration: 02m 10s)
17:25 xcollazo@deploy2002: Started deploy [airflow-dags/analytics@16a5867]: Deploy latest DAGs to analytics Airflow instance. T368755.
17:24 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply
16:55 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:55 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: set DNS for new maps-test nodes - pt1979@cumin2002"
16:55 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: set DNS for new maps-test nodes - pt1979@cumin2002"
16:50 volans: installing spicerack v8.16.2 on cumin1002
16:50 pt1979@cumin2002: START - Cookbook sre.dns.netbox
16:38 volans: installing spicerack v8.16.2 on cumin2002
16:34 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker[1305-1312].eqiad.wmnet
16:34 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker[1305-1312].eqiad.wmnet
16:34 volans: uploaded spicerack_8.16.2 to apt.wikimedia.org bullseye-wikimedia
16:30 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1311.eqiad.wmnet with OS bookworm
16:25 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1310.eqiad.wmnet with OS bookworm
16:22 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1312.eqiad.wmnet with OS bookworm
16:19 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1306.eqiad.wmnet with OS bookworm
16:16 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1308.eqiad.wmnet with OS bookworm
16:14 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1309.eqiad.wmnet with OS bookworm
16:13 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp1005.eqiad.wmnet
16:11 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1311.eqiad.wmnet with reason: host reimage
16:10 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1307.eqiad.wmnet with OS bookworm
16:08 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1305.eqiad.wmnet with OS bookworm
16:07 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1310.eqiad.wmnet with reason: host reimage
16:06 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc-gp1005.eqiad.wmnet
16:04 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1312.eqiad.wmnet with reason: host reimage
16:01 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1306.eqiad.wmnet with reason: host reimage
15:58 Lucas_WMDE: UTC afternoon backport+config window done
15:58 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for Unified dashboard: Add UI for page collection recommendations (T368718) (duration: 27m 17s)
15:58 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1308.eqiad.wmnet with reason: host reimage
15:56 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1312.eqiad.wmnet with reason: host reimage
15:55 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1311.eqiad.wmnet with reason: host reimage
15:54 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1309.eqiad.wmnet with reason: host reimage
15:51 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1307.eqiad.wmnet with reason: host reimage
15:51 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1310.eqiad.wmnet with reason: host reimage
15:50 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1309.eqiad.wmnet with reason: host reimage
15:49 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1308.eqiad.wmnet with reason: host reimage
15:49 lucaswerkmeister-wmde@deploy2002: sbisson, lucaswerkmeister-wmde: Continuing with sync
15:48 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1305.eqiad.wmnet with reason: host reimage
15:48 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1307.eqiad.wmnet with reason: host reimage
15:46 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1306.eqiad.wmnet with reason: host reimage
15:45 lucaswerkmeister-wmde@deploy2002: sbisson, lucaswerkmeister-wmde: Backport for Unified dashboard: Add UI for page collection recommendations (T368718) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
15:45 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1305.eqiad.wmnet with reason: host reimage
15:36 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1312.eqiad.wmnet with OS bookworm
15:36 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1311.eqiad.wmnet with OS bookworm
15:31 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1310.eqiad.wmnet with OS bookworm
15:31 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1309.eqiad.wmnet with OS bookworm
15:31 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for Unified dashboard: Add UI for page collection recommendations (T368718)
15:30 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1308.eqiad.wmnet with OS bookworm
15:29 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1307.eqiad.wmnet with OS bookworm
15:27 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1306.eqiad.wmnet with OS bookworm
15:26 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1305.eqiad.wmnet with OS bookworm
15:11 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for Revert "Allow other input and changes to trigger searchsuggestions to update" (T379983) (duration: 08m 14s)
15:07 lucaswerkmeister-wmde@deploy2002: samtar, lucaswerkmeister-wmde: Continuing with sync
15:06 lucaswerkmeister-wmde@deploy2002: samtar, lucaswerkmeister-wmde: Backport for Revert "Allow other input and changes to trigger searchsuggestions to update" (T379983) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
15:03 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for Revert "Allow other input and changes to trigger searchsuggestions to update" (T379983)
15:00 arnaudb@cumin1002: dbctl commit (dc=all): 'manual depool commit', diff saved to https://phabricator.wikimedia.org/P71077 and previous config saved to /var/cache/conftool/dbconfig/20241118-150020-arnaudb.json
14:59 arnaudb@cumin1002: dbctl commit (dc=all): 'manual repool commit', diff saved to https://phabricator.wikimedia.org/P71076 and previous config saved to /var/cache/conftool/dbconfig/20241118-145946-arnaudb.json
14:56 arnaudb@cumin1002: END (FAIL) - Cookbook sre.mysql.pool (exit_code=99) db2216 slowly with 10 steps - slow motion repool T380131
14:56 arnaudb@cumin1002: START - Cookbook sre.mysql.pool db2216 slowly with 10 steps - slow motion repool T380131
14:52 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db2150 slowly with 10 steps - slow repool db2150 T380117
14:32 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker[1305-1312].eqiad.wmnet
14:28 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker[1305-1312].eqiad.wmnet
14:16 claime: running homer 'cr*-eqiad' 'T379454'
14:11 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mc-gp1004.eqiad.wmnet
14:09 btullis@cumin1002: END (PASS) - Cookbook sre.presto.roll-restart-workers (exit_code=0) for Presto an-presto cluster: Roll restart of all Presto's jvm daemons.
14:04 jiji@cumin1002: START - Cookbook sre.hosts.reboot-single for host mc-gp1004.eqiad.wmnet
13:50 jelto@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikidata-query-gui: apply
13:49 jelto@deploy2002: helmfile [codfw] START helmfile.d/services/wikidata-query-gui: apply
13:49 jelto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikidata-query-gui: apply
13:48 jelto@deploy2002: helmfile [eqiad] START helmfile.d/services/wikidata-query-gui: apply
13:47 jelto@deploy2002: helmfile [staging] DONE helmfile.d/services/wikidata-query-gui: apply
13:46 jelto@deploy2002: helmfile [staging] START helmfile.d/services/wikidata-query-gui: apply
13:37 jelto@deploy2002: helmfile [staging] DONE helmfile.d/services/wikidata-query-gui: apply
13:37 jelto@deploy2002: helmfile [staging] START helmfile.d/services/wikidata-query-gui: apply
13:35 jelto@deploy2002: helmfile [staging] DONE helmfile.d/services/wikidata-query-gui: apply
13:35 jelto@deploy2002: helmfile [staging] START helmfile.d/services/wikidata-query-gui: apply
13:35 mvolz@deploy2002: helmfile [codfw] DONE helmfile.d/services/citoid: apply
13:34 mvolz@deploy2002: helmfile [codfw] START helmfile.d/services/citoid: apply
13:34 mvolz@deploy2002: helmfile [eqiad] DONE helmfile.d/services/citoid: apply
13:33 mvolz@deploy2002: helmfile [eqiad] START helmfile.d/services/citoid: apply
13:31 jelto@deploy2002: helmfile [staging] DONE helmfile.d/services/wikidata-query-gui: apply
13:31 jelto@deploy2002: helmfile [staging] START helmfile.d/services/wikidata-query-gui: apply
13:31 mvolz@deploy2002: helmfile [staging] DONE helmfile.d/services/citoid: apply
13:30 mvolz@deploy2002: helmfile [staging] START helmfile.d/services/citoid: apply
13:28 jelto@deploy2002: helmfile [staging] DONE helmfile.d/services/wikidata-query-gui: apply
13:28 jelto@deploy2002: helmfile [staging] START helmfile.d/services/wikidata-query-gui: apply
13:27 btullis@cumin1002: START - Cookbook sre.presto.roll-restart-workers for Presto an-presto cluster: Roll restart of all Presto's jvm daemons.
13:26 topranks: stopping netbox service on netbox-next test server to restore new database backup from production
13:25 jelto@deploy2002: helmfile [staging] DONE helmfile.d/services/wikidata-query-gui: apply
13:25 jelto@deploy2002: helmfile [staging] START helmfile.d/services/wikidata-query-gui: apply
13:20 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-presto1018.eqiad.wmnet with OS bullseye
13:16 urbanecm: mwmaint2002: Run `extensions/GrowthExperiments/maintenance/refreshLinkRecommendations.php` at `testwiki` for a bunch of pages (P71064 is list of commands executed; T378983)
13:04 jelto@deploy2002: helmfile [staging] DONE helmfile.d/services/wikidata-query-gui: apply
13:03 jelto@deploy2002: helmfile [staging] START helmfile.d/services/wikidata-query-gui: apply
13:01 moritzm: removing ganeti1021 from active Ganeti nodes T378921
12:56 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1018.eqiad.wmnet with reason: host reimage
12:54 btullis@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1018.eqiad.wmnet with reason: host reimage
12:39 btullis@cumin1002: START - Cookbook sre.hosts.reimage for host an-presto1018.eqiad.wmnet with OS bullseye
12:38 btullis@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-presto1018.eqiad.wmnet with OS bullseye
12:38 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
12:37 kart_: Updated recommendation api to 2024-11-13-183159-production (T379592, T379037)
12:36 arnaudb@cumin1002: START - Cookbook sre.mysql.pool db2150 slowly with 10 steps - slow repool db2150 T380117
12:36 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
12:24 kartik@deploy2002: helmfile [ml-serve-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
12:22 btullis@cumin1002: START - Cookbook sre.hosts.reimage for host an-presto1018.eqiad.wmnet with OS bullseye
12:22 kartik@deploy2002: helmfile [ml-serve-eqiad] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
12:21 btullis@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-presto1018.eqiad.wmnet with OS bullseye
12:19 btullis@cumin1002: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid analytics cluster: Roll restart of Druid jvm daemons.
12:15 stevemunene@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-product: apply
12:14 stevemunene@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-product: apply
12:13 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-ulsfo
12:13 kartik@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
12:10 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
12:09 stevemunene@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-product: apply
12:08 btullis@cumin1002: START - Cookbook sre.hosts.reimage for host an-presto1018.eqiad.wmnet with OS bullseye
12:02 stevemunene@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-product: apply
12:00 stevemunene@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
11:59 elukey@cumin2002: START - Cookbook sre.hosts.provision for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
11:59 stevemunene@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
11:58 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
11:58 elukey@cumin2002: START - Cookbook sre.hosts.provision for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
11:47 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1021.eqiad.wmnet
11:45 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
11:45 elukey@cumin1002: START - Cookbook sre.hosts.provision for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
11:41 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
11:41 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2216.codfw.wmnet with reason: T380131 - table corruption
11:41 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2216.codfw.wmnet with reason: T380131 - table corruption
11:41 elukey@cumin2002: START - Cookbook sre.hosts.provision for host thanos-be1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
11:41 urbanecm: mwmaint2002: Run `extensions/GrowthExperiments/maintenance/refreshLinkRecommendations.php` at `testwiki` for a bunch of pages (P71064 is list of commands executed; T378983)
11:33 btullis@cumin1002: START - Cookbook sre.druid.roll-restart-workers for Druid analytics cluster: Roll restart of Druid jvm daemons.
11:25 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
11:25 elukey@cumin2002: START - Cookbook sre.hosts.provision for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
11:21 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
11:16 elukey@cumin2002: START - Cookbook sre.hosts.provision for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
10:50 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
10:50 elukey@cumin2002: START - Cookbook sre.hosts.provision for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
10:50 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
10:49 elukey@cumin2002: START - Cookbook sre.hosts.provision for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
10:46 dcausse@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rdf-streaming-updater: apply
10:46 dcausse@deploy2002: helmfile [eqiad] START helmfile.d/services/rdf-streaming-updater: apply
10:45 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
10:45 elukey@cumin2002: START - Cookbook sre.hosts.provision for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
10:43 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
10:43 elukey@cumin2002: START - Cookbook sre.hosts.provision for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
10:41 dcausse@deploy2002: helmfile [codfw] DONE helmfile.d/services/rdf-streaming-updater: apply
10:41 dcausse@deploy2002: helmfile [codfw] START helmfile.d/services/rdf-streaming-updater: apply
10:39 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
10:37 elukey@cumin2002: START - Cookbook sre.hosts.provision for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
10:27 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
10:27 elukey@cumin2002: START - Cookbook sre.hosts.provision for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
10:15 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
10:14 fabfur: upgrade haproxy on cp-ulsfo (T379891)
10:14 elukey@cumin2002: START - Cookbook sre.hosts.provision for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
10:14 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-ulsfo
10:13 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
10:13 elukey@cumin2002: START - Cookbook sre.hosts.provision for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
09:47 dcausse@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
09:47 dcausse@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
09:42 moritzm: restarting nginx on acmechief hosts to pick up openssl updates
09:24 moritzm: installing openssl security updates
09:18 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
09:17 elukey@cumin1002: START - Cookbook sre.hosts.provision for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
08:57 kartik@deploy2002: Finished scap sync-world: Backport for Enable the Contribute menu in 2nd group of Wikis (T375300) (duration: 11m 45s)
08:55 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 40850
08:55 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 40850
08:53 kartik@deploy2002: kartik: Continuing with sync
08:49 kartik@deploy2002: kartik: Backport for Enable the Contribute menu in 2nd group of Wikis (T375300) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
08:45 kartik@deploy2002: Started scap sync-world: Backport for Enable the Contribute menu in 2nd group of Wikis (T375300)
08:44 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on registry1004.eqiad.wmnet with reason: testing
08:44 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on registry1004.eqiad.wmnet with reason: testing
08:43 kartik@deploy2002: Finished scap sync-world: Backport for bjnwikiquote: Add local logo (T375054) (duration: 22m 55s)
08:31 kartik@deploy2002: kartik, hamishz: Continuing with sync
08:30 kartik@deploy2002: kartik, hamishz: Backport for bjnwikiquote: Add local logo (T375054) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
08:20 kartik@deploy2002: Started scap sync-world: Backport for bjnwikiquote: Add local logo (T375054)
08:07 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1021.eqiad.wmnet
08:07 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1021.eqiad.wmnet
08:05 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1021.eqiad.wmnet
08:03 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1021.eqiad.wmnet
08:01 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1021.eqiad.wmnet
08:01 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1021.eqiad.wmnet
07:56 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1021.eqiad.wmnet
07:54 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1020.eqiad.wmnet
07:52 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1020.eqiad.wmnet
07:51 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1020.eqiad.wmnet
07:47 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1020.eqiad.wmnet
07:46 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on pc1017.eqiad.wmnet with reason: T378068, host is not pooled
07:46 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 5 days, 0:00:00 on pc1017.eqiad.wmnet with reason: T378068, host is not pooled
07:46 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on pc1013.eqiad.wmnet with reason: T373037, host is not pooled
07:46 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on pc1013.eqiad.wmnet with reason: T373037, host is not pooled
06:31 kart_: Updated MinT to 2024-10-16-065051-production on eqiad
06:28 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply
06:19 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply

2024-11-17

16:41 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db2216.codfw.wmnet with reason: Sad
16:40 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db2216.codfw.wmnet with reason: Sad
16:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'db2216 sad', diff saved to https://phabricator.wikimedia.org/P71059 and previous config saved to /var/cache/conftool/dbconfig/20241117-163522-ladsgroup.json

2024-11-16

20:30 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kafka-jumbo1017.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
20:29 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kafka-jumbo1016.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
20:29 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kafka-jumbo1018.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
18:09 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
18:09 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for wikikube-worker - jclark@cumin1002"
18:08 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for wikikube-worker - jclark@cumin1002"
18:06 jclark@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1183.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
18:05 jclark@cumin1002: START - Cookbook sre.dns.netbox
18:01 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
17:59 jclark@cumin1002: START - Cookbook sre.dns.netbox
17:59 jclark@cumin1002: START - Cookbook sre.hosts.provision for host kafka-jumbo1018.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
17:56 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host kafka-jumbo1018.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
17:56 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
17:56 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for wikikube-worker - jclark@cumin1002"
17:56 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for wikikube-worker - jclark@cumin1002"
17:55 jclark@cumin1002: START - Cookbook sre.hosts.provision for host kafka-jumbo1016.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
17:55 jclark@cumin1002: START - Cookbook sre.hosts.provision for host kafka-jumbo1017.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
17:53 jclark@cumin1002: START - Cookbook sre.hosts.provision for host kafka-jumbo1018.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
17:52 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1313.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
17:52 jclark@cumin1002: START - Cookbook sre.dns.netbox
17:50 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
17:50 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for wikikube-worker - jclark@cumin1002"
17:50 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for wikikube-worker - jclark@cumin1002"
17:45 jclark@cumin1002: START - Cookbook sre.dns.netbox
17:14 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1323.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
17:11 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker1327.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
17:11 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1327.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
17:09 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
17:09 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for wikikube-worker - jclark@cumin1002"
17:09 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for wikikube-worker - jclark@cumin1002"
17:08 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1313.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
17:05 jclark@cumin1002: START - Cookbook sre.dns.netbox
17:05 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1327.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
17:01 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1326.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
16:57 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1321.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
16:55 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1324.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
16:54 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1322.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
16:54 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1320.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
16:53 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1325.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
16:52 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1319.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
16:52 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1316.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
16:51 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1318.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
16:50 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1315.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
16:49 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1317.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
16:49 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1314.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
16:42 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1326.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
16:42 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1327.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
16:36 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1323.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
16:36 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1324.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
16:36 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1322.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
16:36 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1321.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
16:36 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1320.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
16:35 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1325.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
16:32 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1318.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
16:32 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1317.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
16:32 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1316.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
16:31 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1315.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
16:31 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1314.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
16:31 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1319.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
16:30 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:30 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for wikikube-worker - jclark@cumin1002"
16:30 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for wikikube-worker - jclark@cumin1002"
16:27 jclark@cumin1002: START - Cookbook sre.dns.netbox
00:44 tzatziki: removing 103 files for legal compliance

2024-11-15

23:42 tzatziki: removing 1 file for legal compliance
23:19 tzatziki: removing 3 files for legal compliance
22:34 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host elastic2112.codfw.wmnet with OS bullseye
21:59 Dreamy_Jazz: Started MediaModeration scan on all wikis other than commonswiki, retrying all images that previously failed to be scanned - https://wikitech.wikimedia.org/wiki/MediaModeration
21:59 Dreamy_Jazz: Started MediaModeration scan on commonswiki, retrying all images that previously failed to be scanned - https://wikitech.wikimedia.org/wiki/MediaModeration
21:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2115.codfw.wmnet with OS bullseye
21:56 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
21:56 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
21:53 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2114.codfw.wmnet with OS bullseye
21:53 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
21:53 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
21:51 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic2111.codfw.wmnet with OS bullseye
21:50 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
21:50 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
21:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2115.codfw.wmnet with reason: host reimage
21:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase2038.codfw.wmnet with OS bullseye
21:35 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
21:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2114.codfw.wmnet with reason: host reimage
21:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase2036.codfw.wmnet with OS bullseye
21:35 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
21:33 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
21:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic2111.codfw.wmnet with reason: host reimage
21:30 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2115.codfw.wmnet with reason: host reimage
21:30 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2114.codfw.wmnet with reason: host reimage
21:30 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic2111.codfw.wmnet with reason: host reimage
21:28 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
21:14 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2115.codfw.wmnet with OS bullseye
21:14 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2114.codfw.wmnet with OS bullseye
21:14 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2112.codfw.wmnet with OS bullseye
21:14 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host elastic2111.codfw.wmnet with OS bullseye
21:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on restbase2038.codfw.wmnet with reason: host reimage
21:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['elastic2115']
21:13 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic2115']
21:12 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['elastic2114']
21:12 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic2114']
21:12 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['elastic2112']
21:12 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic2112']
21:12 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['elastic2111']
21:12 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic2111']
21:11 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic2110']
21:11 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host elastic2113.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
21:10 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on restbase2036.codfw.wmnet with reason: host reimage
21:08 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic2114.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
21:08 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic2111.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
21:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on restbase2038.codfw.wmnet with reason: host reimage
21:07 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic2115.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
21:07 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic2112.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
21:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on restbase2036.codfw.wmnet with reason: host reimage
21:04 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host elastic2110.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
20:56 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host elastic2115.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
20:56 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host elastic2114.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
20:56 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host elastic2113.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
20:56 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host elastic2112.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
20:56 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host elastic2111.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
20:56 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host elastic2110.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
20:54 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
20:54 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding elastic2110 to codfw - jhancock@cumin2002"
20:54 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding elastic2110 to codfw - jhancock@cumin2002"
20:50 jhancock@cumin2002: START - Cookbook sre.dns.netbox
20:45 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host restbase2038.codfw.wmnet with OS bullseye
20:45 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host restbase2036.codfw.wmnet with OS bullseye
20:44 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['restbase2036']
20:44 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['restbase2038']
20:43 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['restbase2038']
20:43 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['restbase2036']
20:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host restbase2038.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
20:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host restbase2036.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
20:41 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host restbase2037
20:40 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host restbase2037
20:40 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host restbase2037.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
20:32 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host restbase2038.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
20:32 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host restbase2037.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
20:32 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host restbase2036.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
20:31 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
20:31 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding restbase2036 to codfw - jhancock@cumin2002"
20:31 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding restbase2036 to codfw - jhancock@cumin2002"
20:27 jhancock@cumin2002: START - Cookbook sre.dns.netbox
19:54 dancy@deploy2002: Finished scap sync-world: Testing T377883 (duration: 03m 06s)
19:51 dancy@deploy2002: Started scap sync-world: Testing T377883
19:50 dancy@deploy2002: Installation of scap version "4.124.0" completed for 206 hosts
19:46 dancy@deploy2002: Installing scap version "4.124.0" for 206 hosts
18:53 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
18:52 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
18:35 cjming@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic: apply
18:34 cjming@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic: apply
18:32 cjming@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply
18:31 cjming@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply
18:15 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
18:15 elukey@cumin1002: START - Cookbook sre.hosts.provision for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
18:09 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
18:08 elukey@cumin1002: START - Cookbook sre.hosts.provision for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
16:58 mfossati@deploy2002: Finished deploy [airflow-dags/platform_eng@82083c4]: image suggestions hotfix - section titles denylist dependency (duration: 01m 58s)
16:57 taavi: copy python3-flask-{keystone,oslolog} from bullseye-wikimedia to bookworm-wikimedia
16:56 mfossati@deploy2002: Started deploy [airflow-dags/platform_eng@82083c4]: image suggestions hotfix - section titles denylist dependency
16:27 herron@cumin2002: conftool action : set/pooled=yes; selector: name=aux-k8s-worker1005.eqiad.wmnet,cluster=aux-k8s,service=kubesvc
16:27 herron@cumin2002: conftool action : set/weight=10; selector: name=aux-k8s-worker1005.eqiad.wmnet,cluster=aux-k8s,service=kubesvc
16:22 herron@cumin2002: conftool action : set/pooled=yes; selector: name=aux-k8s-worker1004.eqiad.wmnet,cluster=aux-k8s,service=kubesvc
16:22 herron@cumin2002: conftool action : set/weight=10; selector: name=aux-k8s-worker1004.eqiad.wmnet,cluster=aux-k8s,service=kubesvc
16:09 sukhe@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp4043.ulsfo.wmnet [reason: ATS fixed]
16:08 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cp4043.ulsfo.wmnet
16:08 sukhe@cumin1002: START - Cookbook sre.hosts.remove-downtime for cp4043.ulsfo.wmnet
16:06 sukhe@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-ats (exit_code=0) Rolling upgrade/restart of Apache Traffic Server on P{cp4051*} and A:cp for 9.2.6-1wm2
16:03 sukhe@cumin1002: START - Cookbook sre.cdn.roll-upgrade-ats Rolling upgrade/restart of Apache Traffic Server on P{cp4051*} and A:cp for 9.2.6-1wm2
16:00 sukhe: reprepro -C main include bullseye-wikimedia trafficserver_9.2.6-1wm2_amd64.changes: T379797
15:47 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on db2230.codfw.wmnet,db1125.eqiad.wmnet with reason: testing stuff on test-s4
15:47 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on db2230.codfw.wmnet,db1125.eqiad.wmnet with reason: testing stuff on test-s4
15:42 arnaudb@cumin1002: END (PASS) - Cookbook sre.switchdc.databases.finalize (exit_code=0) for the switch from eqiad to codfw
15:41 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.finalize for the switch from eqiad to codfw
15:40 arnaudb@cumin1002: END (PASS) - Cookbook sre.switchdc.databases.finalize (exit_code=0) for the switch from codfw to eqiad
15:39 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.finalize for the switch from codfw to eqiad
15:39 arnaudb@cumin1002: END (PASS) - Cookbook sre.switchdc.databases.prepare (exit_code=0) for the switch from codfw to eqiad
15:38 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-platform-eng: apply
15:38 arnaudb@cumin1002: START - Cookbook sre.switchdc.databases.prepare for the switch from codfw to eqiad
15:37 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-platform-eng: apply
15:35 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
15:34 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
13:59 ayounsi@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
13:59 ayounsi@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove e8 lo0 IP - ayounsi@cumin1002"
13:59 ayounsi@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove e8 lo0 IP - ayounsi@cumin1002"
13:55 ayounsi@cumin1002: START - Cookbook sre.dns.netbox
13:55 ayounsi@cumin1002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
13:52 ayounsi@cumin1002: START - Cookbook sre.dns.netbox
13:41 XioNoX: test no-passwords on mr1-eqsin - T379464
13:31 ayounsi@cumin1002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts sretest1004.eqiad.wmnet
13:31 ayounsi@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
13:31 ayounsi@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: sretest1004.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - ayounsi@cumin1002"
13:31 ayounsi@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: sretest1004.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - ayounsi@cumin1002"
13:27 ayounsi@cumin1002: START - Cookbook sre.dns.netbox
13:24 cmooney@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1002.eqiad.wmnet with reason: Update homer wmf-plugin to export Netbox ipsec data - cmooney@cumin1002
13:23 ayounsi@cumin1002: START - Cookbook sre.hosts.decommission for hosts sretest1004.eqiad.wmnet
13:21 cmooney@cumin1002: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1002.eqiad.wmnet with reason: Update homer wmf-plugin to export Netbox ipsec data - cmooney@cumin1002
13:19 cmooney@cumin1002: END (FAIL) - Cookbook sre.deploy.python-code (exit_code=99) homer to cumin2002.codfw.wmnet,cumin1002.eqiad.wmnet with reason: Update homer wmf-plugin to export Netbox ipsec data - cmooney@cumin1002
13:17 cmooney@cumin1002: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1002.eqiad.wmnet with reason: Update homer wmf-plugin to export Netbox ipsec data - cmooney@cumin1002
13:01 moritzm: imported 8u432-b06-2~deb12u1 to component/jdk8 for bookworm (forward port of the latest Java 8 security fixes for Bookworm)
12:54 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host build2002.codfw.wmnet
12:54 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host build2002.codfw.wmnet with OS bookworm
12:36 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on build2002.codfw.wmnet with reason: host reimage
12:32 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on build2002.codfw.wmnet with reason: host reimage
12:27 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics: apply
12:26 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics: apply
12:19 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics: apply
12:18 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
12:17 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host build2002.codfw.wmnet with OS bookworm
12:17 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
12:16 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM build2002.codfw.wmnet - jmm@cumin2002"
12:15 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM build2002.codfw.wmnet - jmm@cumin2002"
12:15 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) build2002.codfw.wmnet on all recursors
12:15 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache build2002.codfw.wmnet on all recursors
12:15 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
12:15 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM build2002.codfw.wmnet - jmm@cumin2002"
12:11 cmooney@cumin1002: END (FAIL) - Cookbook sre.netbox.update-extras (exit_code=1) rolling restart_daemons on A:netbox
12:11 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM build2002.codfw.wmnet - jmm@cumin2002"
12:08 aokoth@cumin1002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab2002.wikimedia.org with reason: Security Update
12:03 jmm@cumin2002: START - Cookbook sre.dns.netbox
12:03 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host build2002.codfw.wmnet
12:01 cmooney@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
12:01 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.resource-report (exit_code=0)
12:01 jmm@cumin2002: START - Cookbook sre.ganeti.resource-report
12:00 cmooney@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary
11:58 cmooney@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary
11:38 mfossati@deploy2002: Finished deploy [airflow-dags/platform_eng@2c533d6]: hotfix image suggestions weekly snapshots (duration: 00m 57s)
11:37 mfossati@deploy2002: Started deploy [airflow-dags/platform_eng@2c533d6]: hotfix image suggestions weekly snapshots
11:27 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
11:24 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker[1305-1312].eqiad.wmnet
11:24 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker[1305-1312].eqiad.wmnet
11:22 claime: homer 'lsw1-f5-eqiad*' commit 'T377022'
11:22 claime: homer 'lsw1-f6-eqiad*' commit 'T377022'
11:22 elukey@cumin1002: START - Cookbook sre.hosts.provision for host thanos-be1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
11:21 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
11:21 claime: homer 'lsw1-f7-eqiad*' commit 'T377022'
11:21 elukey@cumin1002: START - Cookbook sre.hosts.provision for host thanos-be1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
11:20 claime: homer 'lsw1-e7-eqiad*' commit 'T377022'
11:20 claime: homer 'lsw1-e6-eqiad*' commit 'T377022'
11:19 claime: homer 'lsw1-e5-eqiad*' commit 'T377022'
11:15 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
11:14 elukey@cumin2002: START - Cookbook sre.hosts.provision for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
11:12 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
11:12 elukey@cumin2002: START - Cookbook sre.hosts.provision for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
11:06 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
11:06 elukey@cumin1002: START - Cookbook sre.hosts.provision for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
11:05 claime: homer 'cr*eqiad*' commit 'T377022'
10:36 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
10:36 elukey@cumin2002: START - Cookbook sre.hosts.provision for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
09:36 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
09:34 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on pc1013.eqiad.wmnet with reason: T373037, host is not pooled
09:34 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on pc1013.eqiad.wmnet with reason: T373037, host is not pooled
09:31 elukey@cumin2002: START - Cookbook sre.hosts.provision for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
09:28 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
09:28 elukey@cumin2002: START - Cookbook sre.hosts.provision for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
09:28 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.provision (exit_code=97) for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
09:27 elukey@cumin2002: START - Cookbook sre.hosts.provision for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
09:23 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
09:23 elukey@cumin1002: START - Cookbook sre.hosts.provision for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
09:22 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
09:21 elukey@cumin1002: START - Cookbook sre.hosts.provision for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
09:15 aokoth@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: Security Update
08:48 moritzm: installing Linux 6.1.115 kernel updates from Bookworm point release
04:54 rzl@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 12:00:00 on db1246.eqiad.wmnet with reason: depooled
04:54 rzl@cumin2002: START - Cookbook sre.hosts.downtime for 3 days, 12:00:00 on db1246.eqiad.wmnet with reason: depooled
04:51 rzl@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 12:00:00 on db1246.eqiad.wmnet with reason: depooled
04:50 rzl@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 12:00:00 on db1246.eqiad.wmnet with reason: depooled
04:47 rzl@cumin2002: dbctl commit (dc=all): 'db1246 depooled', diff saved to https://phabricator.wikimedia.org/P71052 and previous config saved to /var/cache/conftool/dbconfig/20241115-044705-rzl.json
03:44 ejegg: fundraising python tools upgraded from c6e2dbcc to b230f718

2024-11-14

23:17 eileen: civicrm upgraded from 2a53f697 to d49a064d
22:59 eileen: civicrm upgraded from 2ab8334a to 2a53f697
22:37 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on cp4043.ulsfo.wmnet with reason: ATS upgrade 9.2.6
22:37 brett@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on cp4043.ulsfo.wmnet with reason: ATS upgrade 9.2.6
22:30 ryankemper: T376150 Depooled `wdqs20[18-20]` in preparation of merging https://gerrit.wikimedia.org/r/c/operations/puppet/+/1088185
21:49 aqu@deploy2002: Finished deploy [airflow-dags/analytics@7a66849]: Stage Refine: fix Airflow skip (duration: 00m 59s)
21:48 aqu@deploy2002: Started deploy [airflow-dags/analytics@7a66849]: Stage Refine: fix Airflow skip
21:47 aqu@deploy2002: Finished deploy [airflow-dags/analytics_test@7a66849]: Stage Refine: fix Airflow skip (duration: 00m 14s)
21:47 aqu@deploy2002: Started deploy [airflow-dags/analytics_test@7a66849]: Stage Refine: fix Airflow skip
21:26 aqu@deploy2002: Finished deploy [airflow-dags/analytics_test@2220747]: Stage Refine test fix (duration: 00m 16s)
21:26 aqu@deploy2002: Started deploy [airflow-dags/analytics_test@2220747]: Stage Refine test fix
21:20 cjming: end of UTC late backport window
21:17 cjming@deploy2002: Finished scap sync-world: Backport for Redirect to wikis using subpages rather than namespaces too (T376923) (duration: 13m 44s)
21:13 cjming@deploy2002: cjming, pppery: Continuing with sync
21:08 cjming@deploy2002: cjming, pppery: Backport for Redirect to wikis using subpages rather than namespaces too (T376923) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
21:04 cjming@deploy2002: Started scap sync-world: Backport for Redirect to wikis using subpages rather than namespaces too (T376923)
20:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2139.codfw.wmnet with OS bookworm
20:47 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
20:38 bvibber@deploy2002: helmfile [codfw] DONE helmfile.d/services/chart-renderer: apply
20:37 bvibber@deploy2002: helmfile [codfw] START helmfile.d/services/chart-renderer: apply
20:37 bvibber@deploy2002: helmfile [eqiad] DONE helmfile.d/services/chart-renderer: apply
20:36 bvibber@deploy2002: helmfile [eqiad] START helmfile.d/services/chart-renderer: apply
20:35 bvibber@deploy2002: helmfile [staging] DONE helmfile.d/services/chart-renderer: apply
20:35 bvibber@deploy2002: helmfile [staging] START helmfile.d/services/chart-renderer: apply
20:29 swfrench@cumin2002: END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0)
20:28 swfrench@cumin2002: START - Cookbook sre.discovery.datacenter
20:24 bvibber@deploy2002: helmfile [codfw] DONE helmfile.d/services/chart-renderer: apply
20:24 bvibber@deploy2002: helmfile [codfw] START helmfile.d/services/chart-renderer: apply
20:24 bvibber@deploy2002: helmfile [eqiad] DONE helmfile.d/services/chart-renderer: apply
20:24 bvibber@deploy2002: helmfile [eqiad] START helmfile.d/services/chart-renderer: apply
20:23 bvibber@deploy2002: helmfile [staging] DONE helmfile.d/services/chart-renderer: apply
20:23 bvibber@deploy2002: helmfile [staging] START helmfile.d/services/chart-renderer: apply
20:23 swfrench@cumin2002: END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0) pool all active/active services in eqiad: Network maintenance complete - None
20:01 swfrench@cumin2002: START - Cookbook sre.discovery.datacenter pool all active/active services in eqiad: Network maintenance complete - None
19:55 brennen@deploy2002: rebuilt and synchronized wikiversions files: group2 to 1.44.0-wmf.3 refs T375662
19:40 eileen: tools upgraded from 68f64e43 to c6e2dbcc
19:37 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool site eqiad [reason: junos upgrade done, T364092]
19:37 sukhe@cumin1002: START - Cookbook sre.dns.admin DNS admin: pool site eqiad [reason: junos upgrade done, T364092]
19:20 James_F: Running `mwscript-k8s -f -- extensions/WikiLambda/maintenance/updateSecondaryTables.php --wiki=wikifunctionswiki --zType Z8 --report --verbose` for T375972, T367005, T373038, T358737
19:19 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.roll-restart-ntp (exit_code=0) rolling restart_daemons on A:dnsbox
19:14 swfrench@cumin2002: END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0)
19:14 swfrench@cumin2002: START - Cookbook sre.discovery.datacenter
19:14 swfrench-wmf: running sre.discovery.datacenter status all to test deployed fix
19:00 brennen: 1.44.0-wmf.3 train status (T375662): no current blockers, but holding for network maintenance.
18:20 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1312.eqiad.wmnet with OS bullseye
18:19 swfrench@cumin2002: END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0)
18:18 swfrench@cumin2002: START - Cookbook sre.discovery.datacenter
18:16 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1310.eqiad.wmnet with OS bullseye
18:13 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on cp4043.ulsfo.wmnet with reason: depooled, debugging
18:13 sukhe@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on cp4043.ulsfo.wmnet with reason: depooled, debugging
18:11 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
18:09 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1311.eqiad.wmnet with OS bullseye
18:05 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1308.eqiad.wmnet with OS bullseye
18:04 ladsgroup@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db1190 gradually with 4 steps - Maint over
18:02 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1309.eqiad.wmnet with OS bullseye
18:01 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1312.eqiad.wmnet with reason: host reimage
17:59 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1307.eqiad.wmnet with OS bullseye
17:57 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1310.eqiad.wmnet with reason: host reimage
17:53 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2139.codfw.wmnet with reason: host reimage
17:52 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1306.eqiad.wmnet with OS bullseye
17:49 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1311.eqiad.wmnet with reason: host reimage
17:46 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1308.eqiad.wmnet with reason: host reimage
17:45 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1312.eqiad.wmnet with reason: host reimage
17:45 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2139.codfw.wmnet with reason: host reimage
17:44 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1311.eqiad.wmnet with reason: host reimage
17:43 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1310.eqiad.wmnet with reason: host reimage
17:42 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1309.eqiad.wmnet with reason: host reimage
17:39 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1309.eqiad.wmnet with reason: host reimage
17:39 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1307.eqiad.wmnet with reason: host reimage
17:37 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1308.eqiad.wmnet with reason: host reimage
17:37 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1307.eqiad.wmnet with reason: host reimage
17:32 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1306.eqiad.wmnet with reason: host reimage
17:29 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1306.eqiad.wmnet with reason: host reimage
17:27 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2139.codfw.wmnet with OS bookworm
17:26 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1312.eqiad.wmnet with OS bullseye
17:25 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1311.eqiad.wmnet with OS bullseye
17:25 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1310.eqiad.wmnet with OS bullseye
17:24 swfrench@cumin2002: END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0) status all services in all: None - None
17:24 swfrench@cumin2002: START - Cookbook sre.discovery.datacenter status all services in all: None - None
17:21 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1309.eqiad.wmnet with OS bullseye
17:19 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1308.eqiad.wmnet with OS bullseye
17:19 ladsgroup@cumin1002: START - Cookbook sre.mysql.pool db1190 gradually with 4 steps - Maint over
17:18 swfrench@cumin2002: END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0) depool all active/active services in eqiad: Network maintenance - None
17:18 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1307.eqiad.wmnet with OS bullseye
17:15 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=4043.ulsfo.wmnet
17:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2139.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
17:13 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
17:13 elukey@cumin1002: START - Cookbook sre.hosts.provision for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
17:10 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1306.eqiad.wmnet with OS bullseye
16:59 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1305.eqiad.wmnet with OS bullseye
16:57 swfrench@cumin2002: START - Cookbook sre.discovery.datacenter depool all active/active services in eqiad: Network maintenance - None
16:52 mfossati@deploy2002: Finished deploy [airflow-dags/platform_eng@7c4873e]: decouple article-level image suggestions from section-level ones (duration: 00m 53s)
16:51 mfossati@deploy2002: Started deploy [airflow-dags/platform_eng@7c4873e]: decouple article-level image suggestions from section-level ones
16:45 swfrench@cumin2002: END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0) status all services in all: None - None
16:45 swfrench@cumin2002: START - Cookbook sre.discovery.datacenter status all services in all: None - None
16:40 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1305.eqiad.wmnet with reason: host reimage
16:38 swfrench@cumin2002: END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0)
16:37 swfrench@cumin2002: START - Cookbook sre.discovery.datacenter
16:36 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1305.eqiad.wmnet with reason: host reimage
16:36 swfrench@cumin2002: END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0)
16:36 swfrench@cumin2002: START - Cookbook sre.discovery.datacenter
16:33 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db1190.eqiad.wmnet with reason: Sad
16:33 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db1190.eqiad.wmnet with reason: Sad
16:33 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1190 sad', diff saved to https://phabricator.wikimedia.org/P71044 and previous config saved to /var/cache/conftool/dbconfig/20241114-163317-ladsgroup.json
16:31 klausman@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
16:31 klausman@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
16:18 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1305.eqiad.wmnet with OS bullseye
16:04 cmooney@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 151575
16:03 cmooney@cumin1002: START - Cookbook sre.network.peering with action 'configure' for AS: 151575
16:01 papaul: ongoing maintenance on cr1-eqiad
16:00 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2139.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
15:57 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cr1-eqiad,cr1-eqiad IPV6,re0.cr1-eqiad.mgmt with reason: router upgrade
15:57 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cr1-eqiad,cr1-eqiad IPV6,re0.cr1-eqiad.mgmt with reason: router upgrade
15:56 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on cp4043.ulsfo.wmnet with reason: depooled, debugging
15:56 sukhe@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on cp4043.ulsfo.wmnet with reason: depooled, debugging
15:55 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cr1-eqiad,cr1-eqiad IPV6,cr1-eqiad.mgmt with reason: router upgrade
15:55 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cr1-eqiad,cr1-eqiad IPV6,cr1-eqiad.mgmt with reason: router upgrade
15:49 moritzm: installing nss security updates
15:48 reedy@deploy2002: Synchronized wmf-config/CommonSettings.php: T379834 (duration: 08m 02s)
15:47 sukhe@puppetserver1001: conftool action : set/pooled=no; selector: name=cp4043.ulsfo.wmnet
15:47 sukhe@cumin1002: END (ERROR) - Cookbook sre.cdn.roll-upgrade-ats (exit_code=97) Rolling upgrade/restart of Apache Traffic Server on P{cp4043*,cp4051*} and A:cp for 9.2.6-1wm1
15:45 jayme@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for wikikube-ctrl2002.codfw.wmnet
15:45 jayme@cumin2002: START - Cookbook sre.hosts.remove-downtime for wikikube-ctrl2002.codfw.wmnet
15:45 jayme@cumin2002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-ctrl2002.codfw.wmnet
15:45 jayme@cumin2002: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-ctrl2002.codfw.wmnet
15:43 pt1979@cumin2002: END (PASS) - Cookbook sre.network.cf (exit_code=0)
15:43 pt1979@cumin2002: START - Cookbook sre.network.cf
15:42 sukhe@cumin1002: START - Cookbook sre.cdn.roll-upgrade-ats Rolling upgrade/restart of Apache Traffic Server on P{cp4043*,cp4051*} and A:cp for 9.2.6-1wm1
15:40 stevemunene@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-presto1016.eqiad.wmnet with OS bullseye
15:39 stevemunene@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-presto1020.eqiad.wmnet with OS bullseye
15:37 volans: installed spicerack v8.16.1 to cumin hosts
15:36 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: depool site eqiad [reason: junos upgrade, T364092]
15:36 sukhe@cumin1002: START - Cookbook sre.dns.admin DNS admin: depool site eqiad [reason: junos upgrade, T364092]
15:35 ladsgroup@deploy2002: Finished scap sync-world: Backport for Revert "mmv.js: Store comingFromHashChange as a class property" (T379835) (duration: 12m 10s)
15:33 sukhe: reprepro -C main include bullseye-wikimedia trafficserver_9.2.6-1wm1_amd64.changes: T379797
15:30 sukhe@cumin1002: START - Cookbook sre.dns.roll-restart-ntp rolling restart_daemons on A:dnsbox
15:29 jayme@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-ctrl2002.codfw.wmnet with reason: T379719
15:29 jayme@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-ctrl2002.codfw.wmnet with reason: T379719
15:28 jayme@cumin2002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-ctrl2002.codfw.wmnet
15:28 jayme@cumin2002: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-ctrl2002.codfw.wmnet
15:27 ladsgroup@deploy2002: ladsgroup: Continuing with sync
15:27 ladsgroup@deploy2002: ladsgroup: Backport for Revert "mmv.js: Store comingFromHashChange as a class property" (T379835) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
15:24 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
15:24 elukey@cumin1002: START - Cookbook sre.hosts.provision for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
15:24 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.roll-restart (exit_code=0) rolling restart_daemons on A:dnsbox and not A:magru and A:dnsbox
15:23 ladsgroup@deploy2002: Started scap sync-world: Backport for Revert "mmv.js: Store comingFromHashChange as a class property" (T379835)
15:16 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-search: apply
15:15 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-search: apply
15:07 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
15:07 sergi0: UTC afternoon deploys done
15:06 sgimeno@deploy2002: Finished scap sync-world: Backport for HomepageHooks: run metrics increment in deferred update (T379682) (duration: 11m 15s)
15:02 elukey@cumin1002: START - Cookbook sre.hosts.provision for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
15:02 sgimeno@deploy2002: sgimeno: Continuing with sync
14:59 sgimeno@deploy2002: sgimeno: Backport for HomepageHooks: run metrics increment in deferred update (T379682) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
14:55 sgimeno@deploy2002: Started scap sync-world: Backport for HomepageHooks: run metrics increment in deferred update (T379682)
14:53 volans: uploaded spicerack_8.16.1 to apt.wikimedia.org bullseye-wikimedia
14:50 sgimeno@deploy2002: Finished scap sync-world: Backport for GrowthExperiments: set experiment config only in pilot wikis (T379681) (duration: 13m 02s)
14:45 sgimeno@deploy2002: sgimeno: Continuing with sync
14:41 sgimeno@deploy2002: sgimeno: Backport for GrowthExperiments: set experiment config only in pilot wikis (T379681) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
14:37 sgimeno@deploy2002: Started scap sync-world: Backport for GrowthExperiments: set experiment config only in pilot wikis (T379681)
14:33 sukhe@cumin1002: START - Cookbook sre.dns.roll-restart rolling restart_daemons on A:dnsbox and not A:magru and A:dnsbox
14:30 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.roll-restart (exit_code=0) rolling restart_daemons on A:dnsbox and A:magru and A:dnsbox
14:27 kartik@deploy2002: Finished scap sync-world: Backport for CX3 Build 0.2.0+20241114 (duration: 13m 23s)
14:25 sukhe@cumin1002: START - Cookbook sre.dns.roll-restart rolling restart_daemons on A:dnsbox and A:magru and A:dnsbox
14:22 kartik@deploy2002: kartik: Continuing with sync
14:18 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.roll-restart-reboot-wikimedia-dns (exit_code=0) rolling restart_daemons on A:wikidough and A:wikidough
14:17 kartik@deploy2002: kartik: Backport for CX3 Build 0.2.0+20241114 synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
14:13 kartik@deploy2002: Started scap sync-world: Backport for CX3 Build 0.2.0+20241114
14:05 sukhe@cumin1002: START - Cookbook sre.dns.roll-restart-reboot-wikimedia-dns rolling restart_daemons on A:wikidough and A:wikidough
13:50 aqu@deploy2002: Finished deploy [airflow-dags/analytics@2220747]: Stage Refine parallelization improvement [airflow-dags@2220747d] (duration: 01m 08s)
13:49 aqu@deploy2002: Started deploy [airflow-dags/analytics@2220747]: Stage Refine parallelization improvement [airflow-dags@2220747d]
13:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti7004.magru.wmnet
13:36 aqu@deploy2002: Finished deploy [airflow-dags/analytics_test@2220747]: Stage Refine parallelization improvement [airflow-dags@2220747d] (duration: 00m 15s)
13:36 aqu@deploy2002: Started deploy [airflow-dags/analytics_test@2220747]: Stage Refine parallelization improvement [airflow-dags@2220747d]
13:30 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti7004.magru.wmnet
13:21 kcvelaga@deploy2002: Finished deploy [airflow-dags/analytics_product@c5ab766]: T379546 (duration: 00m 54s)
13:21 kcvelaga@deploy2002: Started deploy [airflow-dags/analytics_product@c5ab766]: T379546
13:19 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Fix search button height - oblivian@cumin1002"
13:18 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Fix search button height - oblivian@cumin1002
13:18 oblivian@cumin1002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Fix search button height - oblivian@cumin1002
13:18 oblivian@cumin1002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Fix search button height - oblivian@cumin1002"
13:05 jayme@cumin2002: END (PASS) - Cookbook sre.k8s.reimage-stacked-control-plane (exit_code=0) Reimaging k8s control planes of cluster wikikube-codfw: containerd migration
13:04 jayme@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-ctrl2003.codfw.wmnet with OS bookworm
12:54 jmm@cumin2002: END (PASS) - Cookbook sre.misc-clusters.roll-restart-reboot-eventschemas (exit_code=0) rolling restart_daemons on A:schema-eqiad
12:53 jmm@cumin2002: START - Cookbook sre.misc-clusters.roll-restart-reboot-eventschemas rolling restart_daemons on A:schema-eqiad
12:53 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti7004.magru.wmnet
12:52 moritzm: installing apache2 security updates
12:51 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti7004.magru.wmnet
12:51 dreamyjazz@deploy2002: Finished scap sync-world: Backport for Hide IP reveal tools on Special:AbuseLog and Special:GlobalBlockList (T379583) (duration: 09m 08s)
12:49 moritzm: failover ganeti master of magru02 to ganeti7002
12:46 dreamyjazz@deploy2002: dreamyjazz: Continuing with sync
12:45 dreamyjazz@deploy2002: dreamyjazz: Backport for Hide IP reveal tools on Special:AbuseLog and Special:GlobalBlockList (T379583) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
12:43 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti7002.magru.wmnet
12:42 jayme@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-ctrl2003.codfw.wmnet with reason: host reimage
12:41 dreamyjazz@deploy2002: Started scap sync-world: Backport for Hide IP reveal tools on Special:AbuseLog and Special:GlobalBlockList (T379583)
12:38 jayme@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-ctrl2003.codfw.wmnet with reason: host reimage
12:35 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti7002.magru.wmnet
12:29 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti7002.magru.wmnet
12:25 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti7002.magru.wmnet
12:22 jayme@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl2003.codfw.wmnet with OS bookworm
12:19 jmm@cumin2002: END (PASS) - Cookbook sre.misc-clusters.roll-restart-reboot-eventschemas (exit_code=0) rolling restart_daemons on A:schema-codfw
12:18 jmm@cumin2002: START - Cookbook sre.misc-clusters.roll-restart-reboot-eventschemas rolling restart_daemons on A:schema-codfw
12:17 jayme@cumin2002: START - Cookbook sre.k8s.reimage-stacked-control-plane Reimaging k8s control planes of cluster wikikube-codfw: containerd migration
12:10 jmm@cumin2002: END (PASS) - Cookbook sre.cdn.roll-restart-reboot-ncredir (exit_code=0) rolling restart_daemons on A:ncredir
12:00 jmm@cumin2002: START - Cookbook sre.cdn.roll-restart-reboot-ncredir rolling restart_daemons on A:ncredir
11:57 moritzm: restarting postfix on inbound/outbound servers to pick up openssl updates
11:17 moritzm: installing openssl security updates
11:08 jayme@cumin2002: END (PASS) - Cookbook sre.k8s.reimage-stacked-control-plane (exit_code=0) Reimaging k8s control planes of cluster wikikube-codfw: containerd migration
11:08 jayme@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-ctrl2001.codfw.wmnet with OS bookworm
10:47 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub: sync on production
10:45 jayme@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-ctrl2001.codfw.wmnet with reason: host reimage
10:44 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub: apply on production
10:42 jayme@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-ctrl2001.codfw.wmnet with reason: host reimage
10:16 moritzm: remove ganeti2017 from active ganeti nodes T376594
10:15 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2017.codfw.wmnet
10:11 jayme@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl2001.codfw.wmnet with OS bookworm
10:07 jnuche@deploy2002: Finished deploy [releng/jenkins-deploy@34b35a5] (releasing): (no justification provided) (duration: 00m 47s)
10:06 jayme@cumin2002: START - Cookbook sre.k8s.reimage-stacked-control-plane Reimaging k8s control planes of cluster wikikube-codfw: containerd migration
10:06 jnuche@deploy2002: Started deploy [releng/jenkins-deploy@34b35a5] (releasing): (no justification provided)
10:03 jnuche@deploy2002: Finished deploy [releng/jenkins-deploy@34b35a5] (releasing): (no justification provided) (duration: 00m 21s)
10:03 jnuche@deploy2002: Started deploy [releng/jenkins-deploy@34b35a5] (releasing): (no justification provided)
09:43 kart_: Done: UTC morning backport window
09:37 kartik@deploy2002: Finished scap sync-world: Backport for Correction to virtual-globaljsonlinks mapping (T374746) (duration: 10m 03s)
09:37 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
09:36 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
09:35 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
09:34 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
09:32 kartik@deploy2002: bvibber, kartik: Continuing with sync
09:31 kartik@deploy2002: bvibber, kartik: Backport for Correction to virtual-globaljsonlinks mapping (T374746) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
09:27 kartik@deploy2002: Started scap sync-world: Backport for Correction to virtual-globaljsonlinks mapping (T374746)
09:25 kartik@deploy2002: Finished scap sync-world: Backport for CX3 Build 0.2.0+20241113 (T368718 T374567) (duration: 29m 40s)
09:21 kartik@deploy2002: kartik: Continuing with sync
09:17 volans: installed spicerack v8.16.0 on cumin2002
09:08 vgutierrez@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on P{cp4044.ulsfo.wmnet,cp4052.ulsfo.wmnet} and A:cp
09:04 vgutierrez@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on P{cp4044.ulsfo.wmnet,cp4052.ulsfo.wmnet} and A:cp
09:00 kartik@deploy2002: kartik: Backport for CX3 Build 0.2.0+20241113 (T368718 T374567) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
08:56 kartik@deploy2002: Started scap sync-world: Backport for CX3 Build 0.2.0+20241113 (T368718 T374567)
08:55 vgutierrez: import haproxy 2.8.12 to thirdparty/haproxy28 component for bullseye-wikimedia (apt.wm.o) - T379891
08:54 kartik@deploy2002: Finished scap sync-world: Backport for Allow Wikidata bureaucrats to remove admin rights (T379635) (duration: 11m 49s)
08:49 kartik@deploy2002: dreamrimmer, kartik: Continuing with sync
08:47 kartik@deploy2002: dreamrimmer, kartik: Backport for Allow Wikidata bureaucrats to remove admin rights (T379635) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
08:42 kartik@deploy2002: Started scap sync-world: Backport for Allow Wikidata bureaucrats to remove admin rights (T379635)
08:38 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 26744
08:37 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 26744
08:35 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 141082
08:35 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 141082
08:34 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 9299
08:33 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 9299
08:33 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 140407
08:33 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 140407
08:28 kartik@deploy2002: Finished scap sync-world: Backport for Update stream registration and config for MinT for Readers (T378565) (duration: 24m 50s)
08:23 kartik@deploy2002: kcvelaga, kartik: Continuing with sync
08:08 kartik@deploy2002: kcvelaga, kartik: Backport for Update stream registration and config for MinT for Readers (T378565) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
08:03 kartik@deploy2002: Started scap sync-world: Backport for Update stream registration and config for MinT for Readers (T378565)
07:42 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2017.codfw.wmnet
07:41 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2017.codfw.wmnet
07:34 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2017.codfw.wmnet
07:34 ayounsi@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
07:34 ayounsi@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove office link dns records - ayounsi@cumin1002"
07:34 ayounsi@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove office link dns records - ayounsi@cumin1002"
07:30 ayounsi@cumin1002: START - Cookbook sre.dns.netbox
07:06 XioNoX: delete office interco IP/prefixes/vlan in ulsfo - T379778
04:34 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye
04:11 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage
04:09 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage
03:56 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye
02:32 eileen: config revision changed from 7af5769b to fbddc1f5
02:29 eileen: civicrm upgraded from 7b300007 to 2ab8334a
00:14 eileen: config revision changed from 2b08b881 to 7af5769b
00:13 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host es1046.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
00:13 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host es1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
00:12 eileen: civicrm upgraded from 23e08fc2 to 7b300007
00:05 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host es1043.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
00:05 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host es1042.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
00:05 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host es1045.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
00:05 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host es1041.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED

2024-11-13

23:45 jclark@cumin1002: START - Cookbook sre.hosts.provision for host es1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
23:43 jclark@cumin1002: START - Cookbook sre.hosts.provision for host es1042.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
23:43 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host es1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
23:43 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host es1042.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
23:42 jclark@cumin1002: START - Cookbook sre.hosts.provision for host es1046.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
23:42 jclark@cumin1002: START - Cookbook sre.hosts.provision for host es1045.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
23:42 jclark@cumin1002: START - Cookbook sre.hosts.provision for host es1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
23:42 jclark@cumin1002: START - Cookbook sre.hosts.provision for host es1043.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
23:42 jclark@cumin1002: START - Cookbook sre.hosts.provision for host es1042.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
23:42 jclark@cumin1002: START - Cookbook sre.hosts.provision for host es1041.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
23:41 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
23:41 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for es104 - jclark@cumin1002"
23:41 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for es104 - jclark@cumin1002"
23:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wdqs1027.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
23:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wdqs1026.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
23:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wdqs1025.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
23:37 jclark@cumin1002: START - Cookbook sre.dns.netbox
23:20 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bookworm
23:04 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
23:04 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for wikikube-worker - jclark@cumin1002"
23:04 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for wikikube-worker - jclark@cumin1002"
22:59 jclark@cumin1002: START - Cookbook sre.dns.netbox
22:58 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wdqs1025.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
22:58 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wdqs1026.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
22:58 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wdqs1027.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
22:57 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage
22:55 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage
22:33 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
22:33 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
22:30 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
22:25 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bookworm
22:21 jforrester@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
22:20 jforrester@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
22:20 jforrester@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
22:19 jforrester@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
22:18 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
22:17 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
22:14 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
22:11 jforrester@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
22:11 jforrester@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
22:10 jforrester@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
22:10 jforrester@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
22:09 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
22:04 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
22:03 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
22:00 tchanders@deploy2002: Finished scap sync-world: Backport for Revert "Disallow AbuseFilter protected variables use on non-temp-user wikis" (T379503) (duration: 09m 03s)
21:55 tchanders@deploy2002: tchanders: Continuing with sync
21:55 tchanders@deploy2002: tchanders: Backport for Revert "Disallow AbuseFilter protected variables use on non-temp-user wikis" (T379503) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
21:51 tchanders@deploy2002: Started scap sync-world: Backport for Revert "Disallow AbuseFilter protected variables use on non-temp-user wikis" (T379503)
21:48 cjming@deploy2002: Finished scap sync-world: Backport for Enable autocreateaccount on testcommonswiki (T378216) (duration: 12m 59s)
21:44 cjming@deploy2002: aude, cjming: Continuing with sync
21:40 cjming@deploy2002: aude, cjming: Backport for Enable autocreateaccount on testcommonswiki (T378216) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
21:36 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bookworm
21:36 cjming@deploy2002: Started scap sync-world: Backport for Enable autocreateaccount on testcommonswiki (T378216)
21:34 cjming@deploy2002: Finished scap sync-world: Backport for GlobalJsonLinksCachePurgeJob to actually invalidate caches (T374746) (duration: 13m 27s)
21:27 cjming@deploy2002: cjming, bvibber: Continuing with sync
21:27 cjming@deploy2002: cjming, bvibber: Backport for GlobalJsonLinksCachePurgeJob to actually invalidate caches (T374746) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
21:21 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
21:21 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
21:21 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
21:20 cjming@deploy2002: Started scap sync-world: Backport for GlobalJsonLinksCachePurgeJob to actually invalidate caches (T374746)
21:19 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage
21:16 jclark@cumin1002: START - Cookbook sre.hosts.provision for host thanos-be1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
21:15 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage
21:09 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
21:09 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
21:09 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
21:09 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
21:07 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
21:07 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
21:07 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host thanos-be2005
21:07 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host thanos-be2005
21:05 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
21:05 jclark@cumin1002: START - Cookbook sre.hosts.provision for host thanos-be1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
21:01 aqu@deploy2002: Finished deploy [airflow-dags/analytics@3487da3]: Stage Refine [airflow-dags@3487da3a] (duration: 01m 22s)
21:00 aqu@deploy2002: Started deploy [airflow-dags/analytics@3487da3]: Stage Refine [airflow-dags@3487da3a]
20:56 aqu@deploy2002: Finished deploy [airflow-dags/analytics@3fc12d6]: Stage Refine [airflow-dags@3fc12d60] (duration: 01m 14s)
20:56 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
20:56 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
20:55 aqu@deploy2002: Started deploy [airflow-dags/analytics@3fc12d6]: Stage Refine [airflow-dags@3fc12d60]
20:49 cdanis@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/aus-k8s-eqiad-services/jaeger: apply
20:49 cdanis@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/aus-k8s-eqiad-services/jaeger: apply
20:48 swfrench-wmf: deployed changeprop to clear no-op chart version diffs from CR 1089313
20:47 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop: apply
20:47 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop: apply
20:46 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bookworm
20:39 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bookworm
20:37 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply
20:37 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop: apply
20:36 cdanis@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/aus-k8s-eqiad-services/jaeger: apply
20:36 cdanis@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/aus-k8s-eqiad-services/jaeger: apply
20:35 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop: apply
20:34 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/changeprop: apply
20:34 aqu@deploy2002: Finished deploy [airflow-dags/analytics_test@3fc12d6]: Stage Refine [airflow-dags@3fc12d60] (duration: 00m 15s)
20:34 aqu@deploy2002: Started deploy [airflow-dags/analytics_test@3fc12d6]: Stage Refine [airflow-dags@3fc12d60]
20:31 cdanis@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/aus-k8s-eqiad-services/jaeger: apply
20:31 cdanis@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/aus-k8s-eqiad-services/jaeger: apply
20:28 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
20:28 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
20:16 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage
20:14 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage
20:02 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
20:02 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
19:59 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
19:59 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
19:59 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host thanos-be2005
19:59 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host thanos-be2005
19:58 cdanis@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/aus-k8s-eqiad-services/jaeger: apply
19:58 cdanis@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/aus-k8s-eqiad-services/jaeger: apply
19:58 brennen@deploy2002: Finished scap sync-world: testwikis to 1.44.0-wmf.3 refs T375662 (duration: 31m 07s)
19:57 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
19:55 cdanis@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/aus-k8s-eqiad-services/jaeger: apply
19:55 cdanis@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/aus-k8s-eqiad-services/jaeger: apply
19:52 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host thanos-be2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
19:51 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
19:51 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding thanos-be2005 to codfw - jhancock@cumin2002"
19:51 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding thanos-be2005 to codfw - jhancock@cumin2002"
19:47 jhancock@cumin2002: START - Cookbook sre.dns.netbox
19:47 cdanis@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/aus-k8s-eqiad-services/jaeger: apply
19:46 cdanis@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/aus-k8s-eqiad-services/jaeger: apply
19:44 aokoth@cumin1002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1003.wikimedia.org with reason: Security Update
19:37 aokoth@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: Security Update
19:36 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bookworm
19:35 aokoth@cumin1002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1004.wikimedia.org with reason: Security Update
19:27 brennen@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.3 refs T375662
19:26 brennen@deploy2002: rebuilt and synchronized wikiversions files: group1 to 1.44.0-wmf.3 refs T375662
19:21 aokoth@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: Security Update
19:13 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host thanos-be1005.eqiad.wmnet with OS bullseye
19:11 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
19:10 jclark@cumin1002: START - Cookbook sre.hosts.provision for host thanos-be1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
19:10 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
19:10 jclark@cumin1002: START - Cookbook sre.hosts.provision for host thanos-be1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
19:09 brennen: 1.44.0-wmf.3 train status (T375662): no current blockers, rolling to group1.
19:08 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/hdfs-synchronizer: apply
19:03 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
19:03 jclark@cumin1002: START - Cookbook sre.hosts.provision for host thanos-be1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
19:02 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
19:02 jclark@cumin1002: START - Cookbook sre.hosts.provision for host thanos-be1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
19:01 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-be1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
19:01 jclark@cumin1002: START - Cookbook sre.hosts.provision for host thanos-be1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
19:00 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
19:00 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for thanos-be1005 - jclark@cumin1002"
19:00 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for thanos-be1005 - jclark@cumin1002"
18:58 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/hdfs-synchronizer: apply
18:56 jclark@cumin1002: START - Cookbook sre.dns.netbox
18:50 swfrench@deploy2002: Finished scap sync-world: Deployment to switch mwdebug-next to publish-81 - T372604 (duration: 01m 53s)
18:48 swfrench@deploy2002: Started scap sync-world: Deployment to switch mwdebug-next to publish-81 - T372604
18:36 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
18:33 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
18:32 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
18:30 cdanis@deploy2002: Finished deploy [docker-pkg/deploy@3499887]: I really hope this works this time (duration: 00m 34s)
18:29 cdanis@deploy2002: Started deploy [docker-pkg/deploy@3499887]: I really hope this works this time
18:29 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
18:26 cdanis@deploy2002: Finished deploy [docker-pkg/deploy@9d71ac3]: (no justification provided) (duration: 00m 18s)
18:26 cdanis@deploy2002: Started deploy [docker-pkg/deploy@9d71ac3]: (no justification provided)
18:22 cdanis@deploy2002: Finished deploy [docker-pkg/deploy@9d71ac3]: (no justification provided) (duration: 00m 40s)
18:21 cdanis@deploy2002: Started deploy [docker-pkg/deploy@9d71ac3]: (no justification provided)
18:21 cdanis@deploy2002: Finished deploy [docker-pkg/deploy@9d71ac3]: deploy 4.0.2 for realsies (duration: 02m 41s)
18:18 cdanis@deploy2002: Started deploy [docker-pkg/deploy@9d71ac3]: deploy 4.0.2 for realsies
18:13 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on ms-be2082.codfw.wmnet with reason: T371400
18:13 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 3:00:00 on ms-be2082.codfw.wmnet with reason: T371400
18:11 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye
17:54 urbanecm: mwmaint2002: foreachwikiindblist growthexperiments extensions/GrowthExperiments/maintenance/fixLinkRecommendationData.php --search-index --verbose --random # T379057
17:49 cdanis@deploy2002: Finished deploy [docker-pkg/deploy@38eb04d]: ship upstream_version helper (duration: 00m 32s)
17:49 cdanis@deploy2002: Started deploy [docker-pkg/deploy@38eb04d]: ship upstream_version helper
17:49 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage
17:47 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
17:46 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage
17:45 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
17:40 jayme@cumin1002: conftool action : set/pooled=yes; selector: name=wikikube-ctrl2002.codfw.wmnet
17:39 jayme@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for wikikube-ctrl2002.codfw.wmnet
17:39 jayme@cumin2002: START - Cookbook sre.hosts.remove-downtime for wikikube-ctrl2002.codfw.wmnet
17:38 jayme@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-ctrl2002.codfw.wmnet with OS bookworm
17:37 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
17:35 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
17:33 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
17:32 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
17:23 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker[2128-2135].codfw.wmnet
17:23 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker[2128-2135].codfw.wmnet
17:20 claime: homer 'lsw1-d2-codfw*' commit 'T377008'
17:18 claime: homer 'lsw1-c2-codfw*' commit 'T377008'
17:18 claime: homer 'lsw1-d4-codfw*' commit 'T377008'
17:17 claime: homer 'lsw1-c4-codfw*' commit 'T377008'
17:15 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye
17:14 jayme@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-ctrl2002.codfw.wmnet with reason: host reimage
17:11 jayme@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-ctrl2002.codfw.wmnet with reason: host reimage
17:03 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2082.codfw.wmnet with OS bullseye
17:02 claime: homer 'cr*codfw*' commit T377008
17:01 claime: homer 'lsw1-b4-codfw*' commit T377008
17:01 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye
16:58 claime: homer 'lsw1-b2-codfw*' commit T377008
16:53 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/growthbook: apply
16:53 jayme@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wikikube-ctrl2002
16:53 jayme@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-ctrl2002
16:53 jayme@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-ctrl2002
16:53 jayme@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-ctrl2002.codfw.wmnet 76.32.192.10.in-addr.arpa 6.7.0.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
16:53 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/growthbook: apply
16:53 jayme@cumin2002: START - Cookbook sre.dns.wipe-cache wikikube-ctrl2002.codfw.wmnet 76.32.192.10.in-addr.arpa 6.7.0.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
16:53 jayme@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:53 jayme@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-ctrl2002 - jayme@cumin2002"
16:53 jayme@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wikikube-ctrl2002 - jayme@cumin2002"
16:50 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2135.codfw.wmnet with OS bookworm
16:49 jayme@cumin2002: START - Cookbook sre.dns.netbox
16:48 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2134.codfw.wmnet with OS bookworm
16:47 jayme@cumin2002: START - Cookbook sre.hosts.move-vlan for host wikikube-ctrl2002
16:47 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/growthbook: apply
16:47 jayme@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl2002.codfw.wmnet with OS bookworm
16:47 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/growthbook: apply
16:41 jayme@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on wikikube-ctrl2002.codfw.wmnet with reason: reimage
16:40 jayme@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on wikikube-ctrl2002.codfw.wmnet with reason: reimage
16:37 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti7003.magru.wmnet
16:31 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2135.codfw.wmnet with reason: host reimage
16:31 jayme@cumin2002: conftool action : set/pooled=inactive; selector: name=wikikube-ctrl2002.codfw.wmnet
16:30 elukey: reload nginx on registry* to pick up logging changes (log of X-Client-IP from the CDN)
16:30 XioNoX: shutdown old office link interface - T379778
16:29 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2133.codfw.wmnet with OS bookworm
16:29 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2134.codfw.wmnet with reason: host reimage
16:29 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti7003.magru.wmnet
16:26 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2135.codfw.wmnet with reason: host reimage
16:25 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2134.codfw.wmnet with reason: host reimage
16:24 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2132.codfw.wmnet with OS bookworm
16:15 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti7003.magru.wmnet
16:14 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti7003.magru.wmnet
16:08 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2133.codfw.wmnet with reason: host reimage
16:08 sukhe: running agent on A:ulsfo and A:lvs
16:07 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2135.codfw.wmnet with OS bookworm
16:06 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2134.codfw.wmnet with OS bookworm
16:05 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2132.codfw.wmnet with reason: host reimage
16:04 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2133.codfw.wmnet with reason: host reimage
16:02 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2132.codfw.wmnet with reason: host reimage
15:56 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2131.codfw.wmnet with OS bookworm
15:53 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2130.codfw.wmnet with OS bookworm
15:47 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on pc1017.eqiad.wmnet with reason: T378068, host is not pooled
15:47 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 5 days, 0:00:00 on pc1017.eqiad.wmnet with reason: T378068, host is not pooled
15:45 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/hdfs-synchronizer: apply
15:45 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2133.codfw.wmnet with OS bookworm
15:42 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2132.codfw.wmnet with OS bookworm
15:37 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2129.codfw.wmnet with OS bookworm
15:37 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2131.codfw.wmnet with reason: host reimage
15:36 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:35 moritzm: failover ganeti master of magru01 to ganeti7001
15:34 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2130.codfw.wmnet with reason: host reimage
15:33 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2131.codfw.wmnet with reason: host reimage
15:33 cmooney@cumin1002: START - Cookbook sre.dns.netbox
15:33 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:30 cmooney@cumin1002: START - Cookbook sre.dns.netbox
15:30 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:30 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns records for IPs moving from old to new fundraising firewalls - cmooney@cumin1002"
15:30 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns records for IPs moving from old to new fundraising firewalls - cmooney@cumin1002"
15:30 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2130.codfw.wmnet with reason: host reimage
15:26 cmooney@cumin1002: START - Cookbook sre.dns.netbox
15:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti7001.magru.wmnet
15:18 moritzm: installing apache2 security updates
15:18 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2129.codfw.wmnet with reason: host reimage
15:15 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2131.codfw.wmnet with OS bookworm
15:15 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2129.codfw.wmnet with reason: host reimage
15:15 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti7001.magru.wmnet
15:14 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2128.codfw.wmnet with OS bookworm
15:12 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2130.codfw.wmnet with OS bookworm
14:59 volans: uploaded spicerack_8.16.0 to apt.wikimedia.org bullseye-wikimedia
14:57 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2129.codfw.wmnet with OS bookworm
14:56 aqu@deploy2002: Finished deploy [airflow-dags/analytics_test@2eb8320]: Stage Refine [airflow-dags@2eb8320d] (duration: 00m 14s)
14:55 aqu@deploy2002: Started deploy [airflow-dags/analytics_test@2eb8320]: Stage Refine [airflow-dags@2eb8320d]
14:55 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2128.codfw.wmnet with reason: host reimage
14:51 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2128.codfw.wmnet with reason: host reimage
14:51 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti7001.magru.wmnet
14:50 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti7001.magru.wmnet
14:37 moritzm: installing openssl security updates
14:36 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2131.codfw.wmnet with OS bookworm
14:36 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2130.codfw.wmnet with OS bookworm
14:35 Lucas_WMDE: UTC afternoon backport+config window done
14:33 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2128.codfw.wmnet with OS bookworm
14:32 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for TimedMediahandler: reenable shellbox-video for commons (T356241) (duration: 07m 28s)
14:30 btullis@cumin1002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-jumbo-eqiad
14:27 lucaswerkmeister-wmde@deploy2002: hnowlan, lucaswerkmeister-wmde: Continuing with sync
14:27 lucaswerkmeister-wmde@deploy2002: hnowlan, lucaswerkmeister-wmde: Backport for TimedMediahandler: reenable shellbox-video for commons (T356241) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
14:26 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-research: apply
14:25 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-research: apply
14:24 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for TimedMediahandler: reenable shellbox-video for commons (T356241)
14:21 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-research: apply
14:21 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-research: apply
14:15 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2128.codfw.wmnet with OS bookworm
14:14 tchanders@deploy2002: Finished scap sync-world: Backport for Disallow AbuseFilter protected variables use on non-temp-user wikis (T379503) (duration: 11m 28s)
14:12 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-research: apply
14:10 tchanders@deploy2002: tchanders: Continuing with sync
14:09 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-research: apply
14:07 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/ipoid: apply
14:07 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1052.eqiad.wmnet to cluster eqiad and group D
14:07 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/ipoid: apply
14:06 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1052.eqiad.wmnet to cluster eqiad and group D
14:06 tchanders@deploy2002: tchanders: Backport for Disallow AbuseFilter protected variables use on non-temp-user wikis (T379503) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
14:03 tchanders@deploy2002: Started scap sync-world: Backport for Disallow AbuseFilter protected variables use on non-temp-user wikis (T379503)
14:03 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/ipoid: apply
14:02 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/ipoid: apply
14:01 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/ipoid: apply
14:01 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/ipoid: apply
14:00 stevemunene@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-wmde: apply
13:59 stevemunene@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-wmde: apply
13:32 btullis@cumin1002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-jumbo-eqiad
13:21 stevemunene@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-wmde: apply
13:20 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
13:18 moritzm: installing python-cryptography security updates
13:18 stevemunene@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-wmde: apply
13:18 btullis@cumin1002: END (PASS) - Cookbook sre.hadoop.roll-restart-masters (exit_code=0) restart masters for Hadoop test cluster: Restart of jvm daemons.
13:17 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/thumbor: apply
13:14 stevemunene@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-wmde: apply
13:13 stevemunene@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-wmde: apply
13:12 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
13:11 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
13:09 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
13:08 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
13:08 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
13:07 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
13:06 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
13:06 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
13:05 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
13:05 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
13:03 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/thumbor: apply
12:59 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2129.codfw.wmnet with OS bookworm
12:56 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/thumbor: apply
12:56 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/thumbor: apply
12:55 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2128.codfw.wmnet with OS bookworm
12:54 cgoubert@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wikikube-worker2128.codfw.wmnet with OS bookworm
12:45 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2128.codfw.wmnet with OS bookworm
12:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1022 (T376905)', diff saved to https://phabricator.wikimedia.org/P71030 and previous config saved to /var/cache/conftool/dbconfig/20241113-124504-ladsgroup.json
12:44 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2128.codfw.wmnet with OS bookworm
12:33 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1051.eqiad.wmnet to cluster eqiad and group D
12:32 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2131.codfw.wmnet with OS bookworm
12:32 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1051.eqiad.wmnet to cluster eqiad and group D
12:31 stevemunene@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-wmde: apply
12:31 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2130.codfw.wmnet with OS bookworm
12:30 stevemunene@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-wmde: apply
12:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1022', diff saved to https://phabricator.wikimedia.org/P71029 and previous config saved to /var/cache/conftool/dbconfig/20241113-122957-ladsgroup.json
12:29 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2129.codfw.wmnet with OS bookworm
12:29 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp5017.eqsin.wmnet
12:28 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2128.codfw.wmnet with OS bookworm
12:28 btullis@cumin1002: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid test cluster: Roll restart of Druid jvm daemons.
12:18 btullis@cumin1002: START - Cookbook sre.druid.roll-restart-workers for Druid test cluster: Roll restart of Druid jvm daemons.
12:15 mvolz@deploy2002: helmfile [eqiad] DONE helmfile.d/services/zotero: apply
12:15 mvolz@deploy2002: helmfile [eqiad] START helmfile.d/services/zotero: apply
12:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1022', diff saved to https://phabricator.wikimedia.org/P71028 and previous config saved to /var/cache/conftool/dbconfig/20241113-121450-ladsgroup.json
12:14 mvolz@deploy2002: helmfile [codfw] DONE helmfile.d/services/zotero: apply
12:14 mvolz@deploy2002: helmfile [codfw] START helmfile.d/services/zotero: apply
12:13 mvolz@deploy2002: helmfile [staging] DONE helmfile.d/services/zotero: apply
12:13 mvolz@deploy2002: helmfile [staging] START helmfile.d/services/zotero: apply
12:11 mvolz@deploy2002: helmfile [staging] DONE helmfile.d/services/zotero: apply
12:11 mvolz@deploy2002: helmfile [staging] START helmfile.d/services/zotero: apply
12:06 mvolz@deploy2002: helmfile [eqiad] DONE helmfile.d/services/citoid: apply
12:06 mvolz@deploy2002: helmfile [eqiad] START helmfile.d/services/citoid: apply
12:06 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1052.eqiad.wmnet
12:03 mvolz@deploy2002: helmfile [codfw] DONE helmfile.d/services/citoid: apply
12:03 mvolz@deploy2002: helmfile [codfw] START helmfile.d/services/citoid: apply
12:02 mvolz@deploy2002: helmfile [staging] DONE helmfile.d/services/citoid: apply
12:01 mvolz@deploy2002: helmfile [staging] START helmfile.d/services/citoid: apply
11:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1022 (T376905)', diff saved to https://phabricator.wikimedia.org/P71027 and previous config saved to /var/cache/conftool/dbconfig/20241113-115943-ladsgroup.json
11:57 jiji@deploy2002: helmfile [codfw] DONE helmfile.d/services/ipoid: apply
11:57 jiji@deploy2002: helmfile [codfw] START helmfile.d/services/ipoid: apply
11:57 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/services/ipoid: apply
11:57 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1052.eqiad.wmnet
11:57 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/ipoid: apply
11:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1051.eqiad.wmnet
11:55 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1052
11:54 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1052
11:52 stevemunene@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-wmde: apply
11:51 stevemunene@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-wmde: apply
11:51 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
11:50 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
11:49 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
11:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1022 (T376905)', diff saved to https://phabricator.wikimedia.org/P71026 and previous config saved to /var/cache/conftool/dbconfig/20241113-114913-ladsgroup.json
11:49 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1051.eqiad.wmnet
11:49 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1022.eqiad.wmnet with reason: Maintenance
11:48 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1022.eqiad.wmnet with reason: Maintenance
11:48 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
11:47 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1051
11:46 stevemunene@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
11:46 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1051
11:45 stevemunene@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
11:41 jayme@cumin2002: END (PASS) - Cookbook sre.k8s.reimage-stacked-control-plane (exit_code=0) Reimaging k8s control planes of cluster wikikube-eqiad: containerd migration
11:41 jayme@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-ctrl1003.eqiad.wmnet with OS bookworm
11:34 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
11:34 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
11:26 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on wikikube-worker1256.eqiad.wmnet with reason: Degraded RAID
11:26 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on wikikube-worker1256.eqiad.wmnet with reason: Degraded RAID
11:25 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker1256.eqiad.wmnet
11:25 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker1256.eqiad.wmnet
11:19 btullis@cumin1002: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid test cluster: Roll restart of Druid jvm daemons.
11:18 btullis@cumin1002: START - Cookbook sre.hadoop.roll-restart-masters restart masters for Hadoop test cluster: Restart of jvm daemons.
11:17 jayme@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-ctrl1003.eqiad.wmnet with reason: host reimage
11:14 jayme@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-ctrl1003.eqiad.wmnet with reason: host reimage
11:10 btullis@cumin1002: START - Cookbook sre.druid.roll-restart-workers for Druid test cluster: Roll restart of Druid jvm daemons.
11:09 btullis@cumin1002: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid public cluster: Roll restart of Druid jvm daemons.
10:42 ladsgroup@deploy2002: Finished scap sync-world: Backport for Set the ratio of the new ParserCache keys to 100 for prod (T373037) (duration: 07m 32s)
10:37 ladsgroup@deploy2002: ladsgroup: Continuing with sync
10:36 ladsgroup@deploy2002: ladsgroup: Backport for Set the ratio of the new ParserCache keys to 100 for prod (T373037) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
10:35 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp4037.ulsfo.wmnet
10:34 ladsgroup@deploy2002: Started scap sync-world: Backport for Set the ratio of the new ParserCache keys to 100 for prod (T373037)
10:32 btullis@cumin1002: END (PASS) - Cookbook sre.hadoop.roll-restart-workers (exit_code=0) restart workers for Hadoop test cluster: Roll restart of jvm daemons for openjdk upgrade.
10:27 jayme@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl1003.eqiad.wmnet with OS bookworm
10:26 ladsgroup@deploy2002: ladsgroup: Continuing with sync
10:26 jayme@cumin2002: START - Cookbook sre.k8s.reimage-stacked-control-plane Reimaging k8s control planes of cluster wikikube-eqiad: containerd migration
10:24 jayme@cumin2002: END (PASS) - Cookbook sre.k8s.reimage-stacked-control-plane (exit_code=0) Reimaging k8s control planes of cluster wikikube-eqiad: containerd migration
10:24 jayme@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-ctrl1002.eqiad.wmnet with OS bookworm
10:21 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp4037.ulsfo.wmnet
10:20 btullis@cumin1002: START - Cookbook sre.hadoop.roll-restart-workers restart workers for Hadoop test cluster: Roll restart of jvm daemons for openjdk upgrade.
10:20 ladsgroup@deploy2002: ladsgroup: Backport for Set the ratio of the new ParserCache keys to 100 for prod (T373037) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
10:18 btullis@cumin1002: START - Cookbook sre.druid.roll-restart-workers for Druid public cluster: Roll restart of Druid jvm daemons.
10:17 ladsgroup@deploy2002: Started scap sync-world: Backport for Set the ratio of the new ParserCache keys to 100 for prod (T373037)
10:09 elukey: disallow calls to /v2/_catalog from the outside internet on Docker Registry hosts - T378618
10:04 claime: Manual restart of dump_cloud_ip_ranges.service on 'A:puppetserver or A:puppetmaster'
10:01 jayme@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-ctrl1002.eqiad.wmnet with reason: host reimage
10:01 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2088.codfw.wmnet with OS bullseye
10:00 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002"
10:00 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002"
09:55 jayme@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-ctrl1002.eqiad.wmnet with reason: host reimage
09:41 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2088.codfw.wmnet with reason: host reimage
09:38 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2088.codfw.wmnet with reason: host reimage
09:25 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2088.codfw.wmnet with OS bullseye
09:20 jayme@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl1002.eqiad.wmnet with OS bookworm
09:20 jayme@cumin2002: START - Cookbook sre.k8s.reimage-stacked-control-plane Reimaging k8s control planes of cluster wikikube-eqiad: containerd migration
09:11 elukey@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2088.codfw.wmnet with OS bullseye
09:01 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2088.codfw.wmnet with OS bullseye
08:54 kart_: Updated recommendation-api to 2024-11-08-142328-production and fix wikidata host header (T379592)
08:49 kartik@deploy2002: helmfile [ml-serve-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
08:49 elukey@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2088.codfw.wmnet with OS bullseye
08:46 kartik@deploy2002: helmfile [ml-serve-eqiad] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
08:33 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2088.codfw.wmnet with reason: host reimage
08:27 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2088.codfw.wmnet with reason: host reimage
08:14 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2088.codfw.wmnet with OS bullseye
08:13 ladsgroup@deploy2002: Finished scap sync-world: Backport for Revert "cswiki: Add celebration logo" (duration: 09m 18s)
08:08 ladsgroup@deploy2002: ladsgroup, hamishz: Continuing with sync
08:07 ladsgroup@deploy2002: ladsgroup, hamishz: Backport for Revert "cswiki: Add celebration logo" synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
08:06 kartik@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
08:04 ladsgroup@deploy2002: Started scap sync-world: Backport for Revert "cswiki: Add celebration logo"
07:47 Amir1: running extensions/Echo/maintenance/removeOrphanedEvents.php --force on all wikis (T308084)
05:17 eileen: civicrm upgraded from ad008134 to 23e08fc2
02:56 tchin@deploy2002: Finished deploy [airflow-dags/analytics@58d7b82]: (no justification provided) (duration: 00m 10s)
02:56 tchin@deploy2002: Started deploy [airflow-dags/analytics@58d7b82]: (no justification provided)
02:55 tchin@deploy2002: deploy aborted: failedpythonlol (duration: 00m 05s)
02:55 tchin@deploy2002: Started deploy [airflow-dags/analytics@58d7b82]: failedpythonlol
00:54 tchin@deploy2002: Started deploy [airflow-dags/analytics@58d7b82]: (no justification provided)
00:35 ejegg: payments-wiki upgraded from 7d24a942 to 459f259b

2024-11-12

23:28 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye
23:11 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage
23:08 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage
22:35 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye
22:11 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye
21:55 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage
21:55 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage
21:28 ebysans@deploy2002: Finished deploy [airflow-dags/analytics@58d7b82]: (no justification provided) (duration: 03m 50s)
21:27 SandraEbele_: deploying airflow as part of weekly deployment train
21:27 urbanecm@deploy2002: Finished scap sync-world: Backport for Fix warning about missing central account for temp users (T378289), Check session provider when autocreating (T378289) (duration: 16m 11s)
21:25 ebysans@deploy2002: Started deploy [airflow-dags/analytics@58d7b82]: (no justification provided)
21:23 SandraEbele_: Deployed refinery using scap, then deployed onto hdfs
21:22 urbanecm@deploy2002: urbanecm, tgr: Continuing with sync
21:22 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye
21:13 urbanecm@deploy2002: urbanecm, tgr: Backport for Fix warning about missing central account for temp users (T378289), Check session provider when autocreating (T378289) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
21:11 urbanecm@deploy2002: Started scap sync-world: Backport for Fix warning about missing central account for temp users (T378289), Check session provider when autocreating (T378289)
21:09 urbanecm@deploy2002: Finished scap sync-world: Backport for Revert^2 "[CirrusSearch] testwiki: enable offloading weighted tags via EventBus" (T378983) (duration: 07m 18s)
21:04 ebysans@deploy2002: Finished deploy [analytics/refinery@113ea5a] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@113ea5ac] (duration: 04m 09s)
21:02 urbanecm@deploy2002: Started scap sync-world: Backport for Revert^2 "[CirrusSearch] testwiki: enable offloading weighted tags via EventBus" (T378983)
20:59 ebysans@deploy2002: Started deploy [analytics/refinery@113ea5a] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@113ea5ac]
20:59 ebysans@deploy2002: Finished deploy [analytics/refinery@113ea5a] (thin): Regular analytics weekly train THIN [analytics/refinery@113ea5ac] (duration: 04m 54s)
20:54 ebysans@deploy2002: Started deploy [analytics/refinery@113ea5a] (thin): Regular analytics weekly train THIN [analytics/refinery@113ea5ac]
20:53 ebysans@deploy2002: Finished deploy [analytics/refinery@113ea5a]: Regular analytics weekly train [analytics/refinery@113ea5ac] (duration: 07m 37s)
20:49 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
20:46 ebysans@deploy2002: Started deploy [analytics/refinery@113ea5a]: Regular analytics weekly train [analytics/refinery@113ea5ac]
19:42 jayme@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for wikikube-ctrl1001.eqiad.wmnet
19:42 jayme@cumin2002: START - Cookbook sre.hosts.remove-downtime for wikikube-ctrl1001.eqiad.wmnet
19:42 jayme@cumin2002: conftool action : set/pooled=yes; selector: name=wikikube-ctrl1001.*
19:40 jayme@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-ctrl1001.eqiad.wmnet with OS bookworm
19:16 jayme@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-ctrl1001.eqiad.wmnet with reason: host reimage
19:14 brennen@deploy2002: rebuilt and synchronized wikiversions files: group0 to 1.44.0-wmf.3 refs T375662
19:13 jayme@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-ctrl1001.eqiad.wmnet with reason: host reimage
19:06 brennen: 1.44.0-wmf.3 train status (T375662): no current blockers, rolling to group0.
18:55 moritzm: installing libarchive security updates
18:55 jayme@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl1001.eqiad.wmnet with OS bookworm
18:31 swfrench@deploy2002: Finished scap sync-world: Backport for Add title-case mapping to support migration to PHP 8.1 (T372603) (duration: 18m 48s)
18:25 swfrench@deploy2002: swfrench: Continuing with sync
18:24 swfrench-wmf: verified consistent 7.4-like title-case behavior in 7.4- and 8.1-based images, verified expected treatment of eszett in mwdebug - T372603
18:19 swfrench@deploy2002: swfrench: Backport for Add title-case mapping to support migration to PHP 8.1 (T372603) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
18:12 swfrench@deploy2002: Started scap sync-world: Backport for Add title-case mapping to support migration to PHP 8.1 (T372603)
18:08 jayme@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-ctrl1001.eqiad.wmnet with OS bookworm
18:01 moritzm: remove ganeti1012 from active ganeti nodes T378921
17:59 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
17:57 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
17:57 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
17:56 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
17:35 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
17:34 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
17:26 brennen@deploy2002: Finished scap sync-world: testwikis to 1.44.0-wmf.3 refs T375662 (duration: 45m 29s)
16:55 jgiannelos@deploy2002: helmfile [codfw] DONE helmfile.d/services/push-notifications: apply
16:54 jgiannelos@deploy2002: helmfile [codfw] START helmfile.d/services/push-notifications: apply
16:54 jgiannelos@deploy2002: helmfile [eqiad] DONE helmfile.d/services/push-notifications: apply
16:53 jgiannelos@deploy2002: helmfile [eqiad] START helmfile.d/services/push-notifications: apply
16:48 jayme@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl1001.eqiad.wmnet with OS bookworm
16:47 jayme@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl1001.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
16:40 brennen@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.3 refs T375662
16:39 jayme@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-ctrl1001.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
16:37 jayme@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-ctrl1001.eqiad.wmnet with OS bookworm
16:34 dancy@deploy2002: Installation of scap version "4.123.0" completed for 209 hosts
16:30 dancy@deploy2002: Installing scap version "4.123.0" for 209 hosts
16:18 jgiannelos@deploy2002: helmfile [eqiad] DONE helmfile.d/services/push-notifications: apply
16:18 jgiannelos@deploy2002: helmfile [eqiad] START helmfile.d/services/push-notifications: apply
16:17 jgiannelos@deploy2002: helmfile [codfw] DONE helmfile.d/services/push-notifications: apply
16:17 jgiannelos@deploy2002: helmfile [codfw] START helmfile.d/services/push-notifications: apply
16:16 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/push-notifications: apply
16:15 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/push-notifications: apply
16:13 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cr[1-2]-eqiad
16:13 cmooney@cumin1002: START - Cookbook sre.hosts.remove-downtime for cr[1-2]-eqiad
16:08 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
16:07 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
15:57 jayme@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl1001.eqiad.wmnet with OS bookworm
15:56 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
15:55 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
15:52 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
15:52 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
15:47 jayme@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-ctrl1001.eqiad.wmnet with OS bookworm
15:42 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:42 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns records for IPs moving from old to new fundraising firewalls - cmooney@cumin1002"
15:35 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns records for IPs moving from old to new fundraising firewalls - cmooney@cumin1002"
15:27 cmooney@cumin1002: START - Cookbook sre.dns.netbox
15:19 jayme@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl1001.eqiad.wmnet with OS bookworm
15:16 jayme@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for wikikube-ctrl1002.eqiad.wmnet
15:16 jayme@cumin2002: START - Cookbook sre.hosts.remove-downtime for wikikube-ctrl1002.eqiad.wmnet
15:16 topranks: moving fundraising links in eqiad from old to new firewall cluster and switches (T377381)
15:14 jayme@cumin2002: START - Cookbook sre.k8s.reimage-stacked-control-plane Reimaging k8s control planes of cluster wikikube-eqiad: containerd migration
15:13 jayme@cumin2002: END (FAIL) - Cookbook sre.k8s.reimage-stacked-control-plane (exit_code=99) Reimaging k8s control planes of cluster wikikube-eqiad: containerd migration
15:10 jayme@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-ctrl1001.eqiad.wmnet with OS bookworm
15:04 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on cr[1-2]-eqiad,pfw3-eqiad with reason: fundraising tech migration to new equipment
15:04 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on cr[1-2]-eqiad,pfw3-eqiad with reason: fundraising tech migration to new equipment
15:02 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1012.eqiad.wmnet
14:30 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on fasw-c-eqiad with reason: fundraising tech migration to new equipment
14:30 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on fasw-c-eqiad with reason: fundraising tech migration to new equipment
14:28 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:28 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns records for IPs moving from old to new fundraising firewalls - cmooney@cumin1002"
14:28 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns records for IPs moving from old to new fundraising firewalls - cmooney@cumin1002"
14:26 moritzm: installing apache2 security updates
14:23 cmooney@cumin1002: START - Cookbook sre.dns.netbox
14:08 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
14:08 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
14:03 urbanecm@deploy2002: Started scap sync-world: Backport for [CirrusSearch] testwiki: enable offloading weighted tags via EventBus (T378983)
13:58 urbanecm@deploy2002: Started scap sync-world: Backport for [CirrusSearch] testwiki: enable offloading weighted tags via EventBus (T378983)
13:48 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
13:47 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
13:43 jnuche@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.3 refs T375662
13:37 jnuche@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.3 refs T375662
13:21 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1012.eqiad.wmnet
13:15 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of dse-k8s-etcd1003.eqiad.wmnet to plain
13:14 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of dse-k8s-etcd1003.eqiad.wmnet to plain
13:11 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1012.eqiad.wmnet
13:11 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1012.eqiad.wmnet
13:10 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
13:10 jayme@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl1001.eqiad.wmnet with OS bookworm
13:09 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of dse-k8s-etcd1003.eqiad.wmnet to drbd
13:09 jayme@cumin2002: START - Cookbook sre.k8s.reimage-stacked-control-plane Reimaging k8s control planes of cluster wikikube-eqiad: containerd migration
13:09 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
12:59 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of dse-k8s-etcd1003.eqiad.wmnet to drbd
12:54 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1002.eqiad.wmnet to plain
12:53 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1002.eqiad.wmnet to plain
12:53 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1012.eqiad.wmnet
12:52 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1012.eqiad.wmnet
12:45 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1002.eqiad.wmnet to drbd
12:35 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1002.eqiad.wmnet to drbd
12:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1012.eqiad.wmnet
12:28 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db2236 slowly with 10 steps - slow repool T373579
12:25 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1012.eqiad.wmnet
12:09 moritzm: remove ganeti1015 from active ganeti nodes T378921
12:08 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti1010.eqiad.wmnet
12:08 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
12:08 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti1010.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
12:04 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti1010.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
11:54 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1015.eqiad.wmnet
11:54 jmm@cumin2002: START - Cookbook sre.dns.netbox
11:52 elukey@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
11:48 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp5017.eqsin.wmnet
11:47 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti1010.eqiad.wmnet
11:42 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti1013.eqiad.wmnet
11:42 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
11:42 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti1013.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
11:40 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti1013.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
11:37 jmm@cumin2002: START - Cookbook sre.dns.netbox
11:27 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti1013.eqiad.wmnet
11:23 btullis@cumin1002: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid analytics cluster: Roll restart of Druid jvm daemons.
11:01 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
11:01 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
10:45 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db2217 gradually with 4 steps - T379491
10:37 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
10:37 btullis@cumin1002: START - Cookbook sre.druid.roll-restart-workers for Druid analytics cluster: Roll restart of Druid jvm daemons.
10:36 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
10:36 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
10:36 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
10:12 arnaudb@cumin1002: START - Cookbook sre.mysql.pool db2236 slowly with 10 steps - slow repool T373579
09:59 arnaudb@cumin1002: START - Cookbook sre.mysql.pool db2217 gradually with 4 steps - T379491
09:48 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T367781)', diff saved to https://phabricator.wikimedia.org/P71006 and previous config saved to /var/cache/conftool/dbconfig/20241112-094851-arnaudb.json
09:41 moritzm: update d-i netboot image for 12.8 point release T379600
09:33 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P71005 and previous config saved to /var/cache/conftool/dbconfig/20241112-093343-arnaudb.json
09:18 urbanecm@deploy2002: Finished scap sync-world: Backport for Revert "CirrusSearch: re-enable offloading weighted tags via EventBus" (duration: 06m 46s)
09:18 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P71004 and previous config saved to /var/cache/conftool/dbconfig/20241112-091836-arnaudb.json
09:17 elukey@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
09:14 urbanecm@deploy2002: trainbranchbot, urbanecm: Continuing with sync
09:14 urbanecm@deploy2002: trainbranchbot, urbanecm: Backport for Revert "CirrusSearch: re-enable offloading weighted tags via EventBus" synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
09:11 urbanecm@deploy2002: Started scap sync-world: Backport for Revert "CirrusSearch: re-enable offloading weighted tags via EventBus"
09:10 urbanecm@deploy2002: Sync cancelled.
09:03 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T367781)', diff saved to https://phabricator.wikimedia.org/P71002 and previous config saved to /var/cache/conftool/dbconfig/20241112-090329-arnaudb.json
08:38 urbanecm@deploy2002: pfischer, urbanecm: Backport for CirrusSearch: re-enable offloading weighted tags via EventBus (T378983) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
08:36 urbanecm@deploy2002: Started scap sync-world: Backport for CirrusSearch: re-enable offloading weighted tags via EventBus (T378983)
08:32 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1015.eqiad.wmnet
08:31 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1015.eqiad.wmnet
08:28 urbanecm@deploy2002: Finished scap sync-world: Backport for Fix WeightedTagsUpdater (T378664 T378983) (duration: 06m 59s)
08:25 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1015.eqiad.wmnet
08:21 urbanecm@deploy2002: Started scap sync-world: Backport for Fix WeightedTagsUpdater (T378664 T378983)
08:19 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1009.eqiad.wmnet
08:17 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1009.eqiad.wmnet
08:04 moritzm: installing apache security updates
08:03 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2155 (T367781)', diff saved to https://phabricator.wikimedia.org/P71001 and previous config saved to /var/cache/conftool/dbconfig/20241112-080303-arnaudb.json
08:02 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2187.codfw.wmnet with reason: Maintenance
08:02 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2187.codfw.wmnet with reason: Maintenance
08:02 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2155.codfw.wmnet with reason: Maintenance
08:02 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2155.codfw.wmnet with reason: Maintenance
07:53 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti-test2003
07:53 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti-test2003
07:52 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on pc1017.eqiad.wmnet with reason: T378068, host is not pooled
07:52 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 5 days, 0:00:00 on pc1017.eqiad.wmnet with reason: T378068, host is not pooled
05:01 mwpresync@deploy2002: Pruned MediaWiki: 1.43.0-wmf.28 (duration: 01m 52s)

2024-11-11

away: UTC late deploys done
23:08 tgr@deploy2002: scap failed: <CalledProcessError> Command '['sudo', '-u', 'mwbuilder', '-n', '--', '/usr/bin/scap', 'mwscript', '--no-local-config', '--directory', '/srv/mediawiki-staging', '--user', 'www-data', '--network', '--', 'purgeMessageBlobStore.php']' returned non-zero exit status 1. (scap version: 4.122.0) (duration: 11m 44s)
23:02 tgr@deploy2002: d3r1ck01, tgr: Continuing with sync
22:59 tgr@deploy2002: d3r1ck01, tgr: Backport for PageUpdater: restore call to RevisionFromEditComplete (T379152) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
22:56 tgr@deploy2002: Started scap sync-world: Backport for PageUpdater: restore call to RevisionFromEditComplete (T379152)
22:30 tgr@deploy2002: Finished scap sync-world: Backport for contactpage: Update AffCom contact form messages (Resubmit) (T375392) (duration: 25m 48s)
22:21 tgr@deploy2002: tgr: Continuing with sync
22:19 tgr@deploy2002: tgr: Backport for contactpage: Update AffCom contact form messages (Resubmit) (T375392) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
22:13 eileen: civicrm upgraded from 4330588d to bcd072a1
22:05 tgr@deploy2002: Started scap sync-world: Backport for contactpage: Update AffCom contact form messages (Resubmit) (T375392)
21:38 tgr@deploy2002: Finished scap sync-world: Backport for contactpages: Update Affcom UserGroup application form (T375392) (duration: 28m 07s)
21:33 tgr@deploy2002: ammarpad, tgr: Continuing with sync
21:12 tgr@deploy2002: ammarpad, tgr: Backport for contactpages: Update Affcom UserGroup application form (T375392) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
21:10 tgr@deploy2002: Started scap sync-world: Backport for contactpages: Update Affcom UserGroup application form (T375392)
20:21 eileen: civicrm upgraded from 65a8de90 to 4330588d
17:55 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Add superset links - oblivian@cumin1002 - T379567"
17:55 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Add superset links - oblivian@cumin1002 - T379567
17:54 oblivian@cumin1002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Add superset links - oblivian@cumin1002 - T379567
17:54 oblivian@cumin1002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Add superset links - oblivian@cumin1002 - T379567"
16:19 elukey: restart pybal on lvs2013 (primary) to pick up new kartotherian-k8s-ssl service
16:17 elukey: restart pybal on lvs2014 (secondary) to pick up new kartotherian-k8s-ssl service
16:10 elukey: restart pybal on lvs1019 (primary) to pick up new kartotherian-k8s-ssl service
16:09 elukey: restart pybal on lvs1020 (secondary) to pick up new kartotherian-k8s-ssl service
16:09 moritzm: installing libarchive security updates
15:55 elukey@puppetserver1001: conftool action : set/pooled=yes:weight=10; selector: dc=codfw,cluster=maps,service=kartotherian-k8s-ssl
15:55 elukey@puppetserver1001: conftool action : set/pooled=yes:weight=10; selector: dc=eqiad,cluster=maps,service=kartotherian-k8s-ssl
15:54 elukey@puppetserver1001: conftool action : set/pooled=yes:weight=1; selector: cluster=codfw,service=kartotherian-k8s-ssl
15:04 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1311.eqiad.wmnet with OS bookworm
15:04 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
15:04 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
15:03 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1309.eqiad.wmnet with OS bookworm
15:03 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
15:00 Lucas_WMDE: UTC afternoon backport+config window done
15:00 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for wikipedias: clear link-recommendations on page save (T379522) (duration: 10m 59s)
14:58 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
14:56 lucaswerkmeister-wmde@deploy2002: migr, lucaswerkmeister-wmde: Continuing with sync
14:51 lucaswerkmeister-wmde@deploy2002: migr, lucaswerkmeister-wmde: Backport for wikipedias: clear link-recommendations on page save (T379522) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
14:49 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for wikipedias: clear link-recommendations on page save (T379522)
14:44 btullis@cumin1002: END (FAIL) - Cookbook sre.presto.roll-restart-workers (exit_code=99) for Presto an-presto cluster: Roll restart of all Presto's jvm daemons.
14:37 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1310.eqiad.wmnet with OS bookworm
14:37 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
14:36 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
14:35 elukey@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2088.codfw.wmnet with OS bullseye
14:33 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1312.eqiad.wmnet with OS bookworm
14:33 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
14:32 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
14:32 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1306.eqiad.wmnet with OS bookworm
14:32 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
14:32 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
14:28 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1308.eqiad.wmnet with OS bookworm
14:28 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
14:28 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
14:27 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2088.codfw.wmnet with OS bullseye
14:27 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1309.eqiad.wmnet with reason: host reimage
14:26 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1307.eqiad.wmnet with OS bookworm
14:26 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
14:25 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
14:22 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1311.eqiad.wmnet with reason: host reimage
14:22 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1305.eqiad.wmnet with OS bookworm
14:22 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
14:21 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
14:20 zabe@deploy2002: Finished scap sync-world: Backport for zhwiki: Allow event-organizer self remove usergroup (T376061) (duration: 10m 40s)
14:20 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2088.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
14:19 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1310.eqiad.wmnet with reason: host reimage
14:16 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1306.eqiad.wmnet with reason: host reimage
14:15 zabe@deploy2002: zabe, zhaofjx: Continuing with sync
14:13 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1312.eqiad.wmnet with reason: host reimage
14:12 zabe@deploy2002: zabe, zhaofjx: Backport for zhwiki: Allow event-organizer self remove usergroup (T376061) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
14:10 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1308.eqiad.wmnet with reason: host reimage
14:09 zabe@deploy2002: Started scap sync-world: Backport for zhwiki: Allow event-organizer self remove usergroup (T376061)
14:07 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2088.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
14:07 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1307.eqiad.wmnet with reason: host reimage
14:06 btullis@cumin1002: START - Cookbook sre.presto.roll-restart-workers for Presto an-presto cluster: Roll restart of all Presto's jvm daemons.
14:05 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts irc2002.wikimedia.org
14:05 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:05 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: irc2002.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
14:05 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: irc2002.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
14:04 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1312.eqiad.wmnet with reason: host reimage
14:04 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1308.eqiad.wmnet with reason: host reimage
14:04 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1309.eqiad.wmnet with reason: host reimage
14:04 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1311.eqiad.wmnet with reason: host reimage
14:04 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1305.eqiad.wmnet with reason: host reimage
14:04 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1310.eqiad.wmnet with reason: host reimage
14:03 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1307.eqiad.wmnet with reason: host reimage
14:03 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1306.eqiad.wmnet with reason: host reimage
14:00 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1305.eqiad.wmnet with reason: host reimage
13:55 moritzm: powercycled ganeti2031
13:44 jmm@cumin2002: START - Cookbook sre.dns.netbox
13:39 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts irc2002.wikimedia.org
13:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts irc1002.wikimedia.org
13:38 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
13:38 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: irc1002.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
13:34 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1312.eqiad.wmnet with OS bookworm
13:34 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1311.eqiad.wmnet with OS bookworm
13:34 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: irc1002.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
13:34 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1311.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
13:33 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1312.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
13:33 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1310.eqiad.wmnet with OS bookworm
13:32 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1309.eqiad.wmnet with OS bookworm
13:32 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1308.eqiad.wmnet with OS bookworm
13:32 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1307.eqiad.wmnet with OS bookworm
13:32 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1306.eqiad.wmnet with OS bookworm
13:31 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1306.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
13:31 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1305.eqiad.wmnet with OS bookworm
13:30 jmm@cumin2002: START - Cookbook sre.dns.netbox
13:29 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1307.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
13:29 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1309.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
13:29 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1310.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
13:29 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1308.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
13:29 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1305.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
13:25 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts irc1002.wikimedia.org
13:22 jynus: reverting deleted rows on db1176 (mailman3) T379519
13:16 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1312.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
13:15 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1311.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
13:12 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1050.eqiad.wmnet to cluster eqiad and group D
13:12 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1306.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
13:11 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1050.eqiad.wmnet to cluster eqiad and group D
13:11 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1310.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
13:11 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker1306.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
13:11 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1309.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
13:11 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1308.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
13:11 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1307.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
13:10 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1306.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
13:10 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1305.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
13:10 dreamyjazz@deploy2002: Finished scap sync-world: Backport for Exclude temp account viewer autopromotions from RC (T377829) (duration: 07m 07s)
13:08 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
13:08 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for wikikube-worker - jclark@cumin1002"
13:08 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for wikikube-worker - jclark@cumin1002"
13:05 dreamyjazz@deploy2002: mszabo, dreamyjazz: Continuing with sync
13:05 dreamyjazz@deploy2002: mszabo, dreamyjazz: Backport for Exclude temp account viewer autopromotions from RC (T377829) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:05 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Fix bug in requestctl commit - oblivian@cumin1002"
13:05 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Fix bug in requestctl commit - oblivian@cumin1002
13:04 oblivian@cumin1002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Fix bug in requestctl commit - oblivian@cumin1002
13:04 oblivian@cumin1002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Fix bug in requestctl commit - oblivian@cumin1002"
13:04 jclark@cumin1002: START - Cookbook sre.dns.netbox
13:03 dreamyjazz@deploy2002: Started scap sync-world: Backport for Exclude temp account viewer autopromotions from RC (T377829)
13:00 btullis@cumin1002: END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0) for Zookeeper A:zookeeper-druid-analytics cluster: Roll restart of jvm daemons.
12:54 btullis@cumin1002: START - Cookbook sre.zookeeper.roll-restart-zookeeper for Zookeeper A:zookeeper-druid-analytics cluster: Roll restart of jvm daemons.
12:48 btullis@cumin1002: END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0) for Zookeeper A:zookeeper-druid-public cluster: Roll restart of jvm daemons.
12:42 btullis@cumin1002: START - Cookbook sre.zookeeper.roll-restart-zookeeper for Zookeeper A:zookeeper-druid-public cluster: Roll restart of jvm daemons.
12:41 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1049.eqiad.wmnet to cluster eqiad and group D
12:40 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1049.eqiad.wmnet to cluster eqiad and group D
12:36 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1050.eqiad.wmnet
12:29 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1050.eqiad.wmnet
12:28 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1049.eqiad.wmnet
12:23 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2083.codfw.wmnet with OS bullseye
12:21 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1049.eqiad.wmnet
12:18 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1050
12:16 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1050
12:16 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1049
12:15 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1049
12:13 btullis@cumin1002: END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0) for Zookeeper A:zookeeper-analytics cluster: Roll restart of jvm daemons.
12:06 btullis@cumin1002: START - Cookbook sre.zookeeper.roll-restart-zookeeper for Zookeeper A:zookeeper-analytics cluster: Roll restart of jvm daemons.
12:01 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage
11:56 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage
11:56 btullis@cumin1002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host an-redacteddb1001.eqiad.wmnet
11:54 btullis@cumin1002: END (PASS) - Cookbook sre.opensearch.roll-restart-reboot (exit_code=0) rolling restart_daemons on A:datahubsearch
11:46 btullis@cumin1002: START - Cookbook sre.opensearch.roll-restart-reboot rolling restart_daemons on A:datahubsearch
11:44 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye
11:43 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-redacteddb1001.eqiad.wmnet
11:43 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye
11:43 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye
11:30 elukey@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
11:06 elukey@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
11:04 btullis@cumin1002: END (PASS) - Cookbook sre.wikireplicas.update-views (exit_code=0)
10:57 btullis@cumin1002: START - Cookbook sre.wikireplicas.update-views
10:55 elukey@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
10:01 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Update to latest - oblivian@cumin1002"
10:01 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Update to latest - oblivian@cumin1002
10:00 oblivian@cumin1002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Update to latest - oblivian@cumin1002
10:00 oblivian@cumin1002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Update to latest - oblivian@cumin1002"
09:10 moritzm: remove ganeti1011 from active ganeti nodes T378921
09:02 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1011.eqiad.wmnet
08:40 urbanecm@deploy2002: Finished scap sync-world: Backport for Update Wikimedia Foundation primary address. (T379417), Update Office Wiki favicon to use wmf.ico and also delete now unused office.ico file. (T378026) (duration: 07m 15s)
08:35 urbanecm@deploy2002: urbanecm, varnent: Continuing with sync
08:35 urbanecm@deploy2002: urbanecm, varnent: Backport for Update Wikimedia Foundation primary address. (T379417), Update Office Wiki favicon to use wmf.ico and also delete now unused office.ico file. (T378026) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
08:32 urbanecm@deploy2002: Started scap sync-world: Backport for Update Wikimedia Foundation primary address. (T379417), Update Office Wiki favicon to use wmf.ico and also delete now unused office.ico file. (T378026)
08:32 urbanecm@deploy2002: Finished scap sync-world: Backport for Allow wgGroupsRemoveFromSelf for templateeditor, confirmed, and abusefilter-helper in zhwiki (T379500) (duration: 20m 59s)
08:24 urbanecm@deploy2002: urbanecm, hamishz: Continuing with sync
08:22 urbanecm@deploy2002: urbanecm, hamishz: Backport for Allow wgGroupsRemoveFromSelf for templateeditor, confirmed, and abusefilter-helper in zhwiki (T379500) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
08:18 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Update to latest - oblivian@cumin1002"
08:18 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Update to latest - oblivian@cumin1002
08:17 oblivian@cumin1002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Update to latest - oblivian@cumin1002
08:17 oblivian@cumin1002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Update to latest - oblivian@cumin1002"
08:11 urbanecm@deploy2002: Started scap sync-world: Backport for Allow wgGroupsRemoveFromSelf for templateeditor, confirmed, and abusefilter-helper in zhwiki (T379500)
07:51 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1011.eqiad.wmnet
07:49 _joe_: installing conftool 4.1.0 on puppetservers
07:15 kartik@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .

2024-11-10

23:43 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye
23:17 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage
23:14 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage
22:51 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye
22:29 jhathaway: re-imaging ms-be2082 to test efi boot order
12:32 elukey: optimize table `archive` on db2217 - frwiki db - corrupt index error (host already depooled)
12:26 slyngshede@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db2217.codfw.wmnet with reason: Corrupt Index
12:26 slyngshede@cumin1002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db2217.codfw.wmnet with reason: Corrupt Index
12:25 slyngshede@cumin1002: dbctl commit (dc=all): 'Depool db2217', diff saved to https://phabricator.wikimedia.org/P70997 and previous config saved to /var/cache/conftool/dbconfig/20241110-122532-slyngshede.json

2024-11-09

14:49 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
14:49 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply
14:48 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
14:48 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
14:48 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
14:48 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply

2024-11-08

23:35 zabe: attach Sotiale's local accounts on newly created wikis
23:16 Reedy: ran `delete from oathauth_devices where oad_id=4506;` on centralauth for T379398 because oad_user=0
23:07 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye
22:54 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
22:54 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply
22:54 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
22:54 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
22:54 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
22:54 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply
22:52 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
22:51 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply
22:51 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
22:51 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
22:51 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
22:51 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply
22:44 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage
22:41 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage
22:39 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
22:39 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply
22:39 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
22:38 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
22:38 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
22:38 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply
22:29 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye
22:28 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bullseye
22:08 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye
21:18 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
21:18 denisse: disabling Puppet on grafana2001 - T379043
21:17 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
21:12 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bullseye
21:08 mutante: cumin2002 [cumin2002:~] $ sudo systemctl reset-failed
21:05 mutante: cumin2002 - sudo systemctl status httpbb_kubernetes_mw-api-int_hourly
20:28 aude@deploy2002: Finished scap sync-world: Backport for Reviving "Update interwiki map" (duration: 10m 19s)
20:24 aude@deploy2002: seddon, aude: Continuing with sync
20:21 aude@deploy2002: seddon, aude: Backport for Reviving "Update interwiki map" synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:20 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye
20:18 aude@deploy2002: Started scap sync-world: Backport for Reviving "Update interwiki map"
20:15 aude@deploy2002: Finished scap sync-world: Backport for Enable Tabular data for test commons (T378127) (duration: 10m 55s)
20:10 aude@deploy2002: aude: Continuing with sync
20:06 aude@deploy2002: aude: Backport for Enable Tabular data for test commons (T378127) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:04 aude@deploy2002: Started scap sync-world: Backport for Enable Tabular data for test commons (T378127)
20:02 aude@deploy2002: Finished scap sync-world: Backport for Reopen testcommonswiki for testing Chart extension (duration: 14m 33s)
19:59 jhathaway@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ms-be2082.codfw.wmnet with reason: T371400
19:59 jhathaway@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on ms-be2082.codfw.wmnet with reason: T371400
19:57 aude@deploy2002: aude: Continuing with sync
19:50 aude@deploy2002: aude: Backport for Reopen testcommonswiki for testing Chart extension synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
19:47 aude@deploy2002: Started scap sync-world: Backport for Reopen testcommonswiki for testing Chart extension
18:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2168.codfw.wmnet with OS bookworm
18:40 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
18:40 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2163.codfw.wmnet with OS bookworm
18:39 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
18:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2167.codfw.wmnet with OS bookworm
18:38 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
18:37 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
18:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2170.codfw.wmnet with OS bookworm
18:33 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
18:32 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
18:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2169.codfw.wmnet with OS bookworm
18:31 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
18:29 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
18:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2166.codfw.wmnet with OS bookworm
18:27 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
18:27 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
18:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2165.codfw.wmnet with OS bookworm
18:26 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
18:23 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
18:21 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
18:21 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002"
18:21 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002"
18:21 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2164.codfw.wmnet with OS bookworm
18:21 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
18:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2168.codfw.wmnet with reason: host reimage
18:19 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
18:17 cmooney@cumin1002: START - Cookbook sre.dns.netbox
18:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2167.codfw.wmnet with reason: host reimage
18:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2170.codfw.wmnet with reason: host reimage
18:10 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2169.codfw.wmnet with reason: host reimage
18:10 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2170.codfw.wmnet with reason: host reimage
18:07 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2166.codfw.wmnet with reason: host reimage
18:06 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2169.codfw.wmnet with reason: host reimage
18:04 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2165.codfw.wmnet with reason: host reimage
18:03 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2168.codfw.wmnet with reason: host reimage
18:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2167.codfw.wmnet with reason: host reimage
18:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2164.codfw.wmnet with reason: host reimage
17:59 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2145.codfw.wmnet with OS bookworm
17:59 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
17:59 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
17:59 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2166.codfw.wmnet with reason: host reimage
17:57 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2165.codfw.wmnet with reason: host reimage
17:57 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
17:57 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002"
17:56 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Create new snippets for frack IPs - cmooney@cumin1002"
17:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2144.codfw.wmnet with OS bookworm
17:56 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
17:56 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2163.codfw.wmnet with OS bookworm
17:56 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye
17:56 herron@cumin1002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host aux-k8s-worker1005.eqiad.wmnet
17:56 herron@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aux-k8s-worker1005.eqiad.wmnet with OS bookworm
17:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2164.codfw.wmnet with reason: host reimage
17:54 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
17:52 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2163.codfw.wmnet with OS bookworm
17:50 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2170.codfw.wmnet with OS bookworm
17:50 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2157.codfw.wmnet with OS bookworm
17:50 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
17:49 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
17:49 cmooney@cumin1002: START - Cookbook sre.dns.netbox
17:47 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2169.codfw.wmnet with OS bookworm
17:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2160.codfw.wmnet with OS bookworm
17:46 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
17:45 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
17:44 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2168.codfw.wmnet with OS bookworm
17:44 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2158.codfw.wmnet with OS bookworm
17:44 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
17:43 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2167.codfw.wmnet with OS bookworm
17:42 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
17:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2162.codfw.wmnet with OS bookworm
17:42 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
17:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2166.codfw.wmnet with OS bookworm
17:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2145.codfw.wmnet with reason: host reimage
17:40 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
17:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2156.codfw.wmnet with OS bookworm
17:39 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
17:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2165.codfw.wmnet with OS bookworm
17:38 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
17:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2161.codfw.wmnet with OS bookworm
17:38 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
17:37 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on wikikube-worker2144.codfw.wmnet with reason: host reimage
17:37 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2164.codfw.wmnet with OS bookworm
17:37 herron@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aux-k8s-worker1005.eqiad.wmnet with reason: host reimage
17:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2159.codfw.wmnet with OS bookworm
17:36 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
17:35 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
17:34 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage
17:32 herron@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on aux-k8s-worker1005.eqiad.wmnet with reason: host reimage
17:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2157.codfw.wmnet with reason: host reimage
17:30 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
17:29 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage
17:27 jynus: rebuild frwiki.geo_tags @ an-redacteddb1001
17:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2160.codfw.wmnet with reason: host reimage
17:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2158.codfw.wmnet with reason: host reimage
17:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2162.codfw.wmnet with reason: host reimage
17:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2156.codfw.wmnet with reason: host reimage
17:17 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye
17:17 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2082.codfw.wmnet with OS bullseye
17:15 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1005.eqiad.wmnet with OS bookworm
17:14 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002"
17:14 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002"
17:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2161.codfw.wmnet with reason: host reimage
17:14 herron@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) aux-k8s-worker1005.eqiad.wmnet on all recursors
17:13 herron@cumin1002: START - Cookbook sre.dns.wipe-cache aux-k8s-worker1005.eqiad.wmnet on all recursors
17:13 herron@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
17:13 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002"
17:13 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1005.eqiad.wmnet - herron@cumin1002"
17:11 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2159.codfw.wmnet with reason: host reimage
17:10 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye
17:09 herron@cumin1002: START - Cookbook sre.dns.netbox
17:09 herron@cumin1002: START - Cookbook sre.ganeti.makevm for new host aux-k8s-worker1005.eqiad.wmnet
17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2158.codfw.wmnet with reason: host reimage
17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2144.codfw.wmnet with reason: host reimage
17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2145.codfw.wmnet with reason: host reimage
17:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2157.codfw.wmnet with reason: host reimage
17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2161.codfw.wmnet with reason: host reimage
17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2160.codfw.wmnet with reason: host reimage
17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2162.codfw.wmnet with reason: host reimage
17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2156.codfw.wmnet with reason: host reimage
17:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2159.codfw.wmnet with reason: host reimage
17:07 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2163.codfw.wmnet with OS bookworm
17:05 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bookworm
17:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2136.codfw.wmnet with OS bookworm
17:05 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
16:58 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2001.codfw.wmnet with OS bookworm
16:58 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
16:55 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bookworm
16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2162.codfw.wmnet with OS bookworm
16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2161.codfw.wmnet with OS bookworm
16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2160.codfw.wmnet with OS bookworm
16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2159.codfw.wmnet with OS bookworm
16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2158.codfw.wmnet with OS bookworm
16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2157.codfw.wmnet with OS bookworm
16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2156.codfw.wmnet with OS bookworm
16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2145.codfw.wmnet with OS bookworm
16:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2144.codfw.wmnet with OS bookworm
16:43 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage
16:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage
16:35 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage
16:35 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage
16:25 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm
16:22 herron@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm
16:16 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2136.codfw.wmnet with OS bookworm
16:10 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
16:05 herron@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aux-k8s-worker1004.eqiad.wmnet with reason: host reimage
16:02 herron@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on aux-k8s-worker1004.eqiad.wmnet with reason: host reimage
16:02 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2139.codfw.wmnet with OS bookworm
15:55 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2001.codfw.wmnet with OS bookworm
15:55 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm
15:48 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm
15:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2142.codfw.wmnet with OS bookworm
15:46 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
15:45 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
15:45 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2143.codfw.wmnet with OS bookworm
15:45 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
15:43 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
15:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2141.codfw.wmnet with OS bookworm
15:40 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
15:39 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
15:32 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2129.codfw.wmnet with OS bookworm
15:32 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
15:31 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2140.codfw.wmnet with OS bookworm
15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2138.codfw.wmnet with OS bookworm
15:28 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
15:27 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
15:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2137.codfw.wmnet with OS bookworm
15:27 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
15:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2142.codfw.wmnet with reason: host reimage
15:25 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2136.codfw.wmnet with OS bookworm
15:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2143.codfw.wmnet with reason: host reimage
15:22 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
15:21 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2128.codfw.wmnet with OS bookworm
15:21 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
15:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2141.codfw.wmnet with reason: host reimage
15:19 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2001.codfw.wmnet with OS bookworm
15:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
15:16 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2087.codfw.wmnet with OS bullseye
15:16 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002"
15:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage
15:15 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002"
15:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2129.codfw.wmnet with reason: host reimage
15:09 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2140.codfw.wmnet with reason: host reimage
15:08 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm
15:06 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2138.codfw.wmnet with reason: host reimage
15:05 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
15:03 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2137.codfw.wmnet with reason: host reimage
15:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2142.codfw.wmnet with reason: host reimage
15:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2143.codfw.wmnet with reason: host reimage
15:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2141.codfw.wmnet with reason: host reimage
15:00 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2140.codfw.wmnet with reason: host reimage
15:00 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2128.codfw.wmnet with reason: host reimage
14:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2138.codfw.wmnet with reason: host reimage
14:57 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2136.codfw.wmnet with reason: host reimage
14:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2137.codfw.wmnet with reason: host reimage
14:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2129.codfw.wmnet with reason: host reimage
14:56 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2128.codfw.wmnet with reason: host reimage
14:56 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2087.codfw.wmnet with reason: host reimage
14:55 elukey@cumin1002: START - Cookbook sre.hosts.provision for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
14:52 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2087.codfw.wmnet with reason: host reimage
14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2143.codfw.wmnet with OS bookworm
14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2142.codfw.wmnet with OS bookworm
14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2141.codfw.wmnet with OS bookworm
14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2140.codfw.wmnet with OS bookworm
14:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2139.codfw.wmnet with OS bookworm
14:41 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2087.codfw.wmnet with OS bullseye
14:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2138.codfw.wmnet with OS bookworm
14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2137.codfw.wmnet with OS bookworm
14:38 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2136.codfw.wmnet with OS bookworm
14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2129.codfw.wmnet with OS bookworm
14:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2128.codfw.wmnet with OS bookworm
14:37 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
14:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2128']
14:34 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2128']
14:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2158']
14:34 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2158']
14:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2157']
14:34 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2157']
14:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2156']
14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2156']
14:33 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['wikikube-worker2156']
14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2156']
14:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2145']
14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2145']
14:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2144']
14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2144']
14:33 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['wikikube-worker2144']
14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2144']
14:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2143']
14:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2143']
14:32 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2142']
14:31 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2142']
14:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2141']
14:30 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2141']
14:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2140']
14:30 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2140']
14:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2139']
14:29 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2139']
14:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2138']
14:29 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2138']
14:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2137']
14:29 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2137']
14:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2136']
14:28 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2136']
14:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2129']
14:28 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2129']
14:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2128']
14:27 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2128']
14:18 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2086.codfw.wmnet with OS bullseye
14:18 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002"
13:31 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
13:30 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
13:29 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
12:32 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
12:30 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
12:30 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
12:30 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
12:29 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
12:28 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
12:07 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002"
12:04 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2087.codfw.wmnet with OS bullseye
11:59 apergos: testing of account creation backfill script on mwmaint2001 complete for the moment
11:53 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2087.codfw.wmnet with OS bullseye
11:51 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2086.codfw.wmnet with reason: host reimage
11:48 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2086.codfw.wmnet with reason: host reimage
11:37 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2087.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
11:37 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye
11:27 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2087.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
11:25 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2016.codfw.wmnet
11:25 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
11:25 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2016.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
11:24 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2016.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
11:17 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
11:16 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
11:13 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
11:13 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
11:13 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye
11:07 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
11:05 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
11:04 jmm@cumin2002: START - Cookbook sre.dns.netbox
11:00 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye
10:58 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye
10:56 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2016.codfw.wmnet
10:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2015.codfw.wmnet
10:56 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
10:56 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2015.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
10:55 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2015.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
10:51 jmm@cumin2002: START - Cookbook sre.dns.netbox
10:45 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2015.codfw.wmnet
10:45 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye
10:39 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
10:34 elukey@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye
10:29 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye
10:19 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1011.eqiad.wmnet
10:18 elukey@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2086.codfw.wmnet with OS bullseye
10:16 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2086.codfw.wmnet with OS bullseye
10:16 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1011.eqiad.wmnet
10:02 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply
10:01 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply
09:57 apergos: testing account creation backfill script on mwmaint2001 in screen session as ariel
09:49 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
09:41 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2085.codfw.wmnet with OS bullseye
09:41 elukey@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002"
09:39 elukey@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002"
09:38 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2086.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
09:29 stevemunene@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10 days, 0:00:00 on an-presto1018.eqiad.wmnet with reason: Downtimed for further troubleshooting possible Hardware failure
09:29 stevemunene@cumin1002: START - Cookbook sre.hosts.downtime for 10 days, 0:00:00 on an-presto1018.eqiad.wmnet with reason: Downtimed for further troubleshooting possible Hardware failure
09:24 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2085.codfw.wmnet with reason: host reimage
09:20 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2085.codfw.wmnet with reason: host reimage
09:09 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2085.codfw.wmnet with OS bullseye
09:09 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2085.codfw.wmnet with OS bullseye
09:03 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-a8-codfw
09:03 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-a8-codfw
09:03 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-a1-codfw
09:03 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-a1-codfw
09:01 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b8-codfw
09:01 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b8-codfw
09:01 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b7-codfw
09:01 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b7-codfw
08:56 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2085.codfw.wmnet with OS bullseye
08:54 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b6-codfw
08:54 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b6-codfw
08:53 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b5-codfw
08:53 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b5-codfw
08:53 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b4-codfw
08:52 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b4-codfw
08:52 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b3-codfw
08:52 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b3-codfw
08:52 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-b2-codfw
08:52 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-b2-codfw
08:44 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a8-codfw
08:43 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a8-codfw
08:43 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a7-codfw
08:43 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a7-codfw
08:43 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1048.eqiad.wmnet to cluster eqiad and group C
08:43 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a6-codfw
08:43 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a6-codfw
08:42 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a5-codfw
08:42 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a5-codfw
08:42 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1048.eqiad.wmnet to cluster eqiad and group C
08:42 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a4-codfw
08:41 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a4-codfw
08:41 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a3-codfw
08:41 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a3-codfw
08:41 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2085.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
08:41 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-a2-codfw
08:40 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device lsw1-a2-codfw
08:39 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-f1-eqiad
08:39 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-f1-eqiad
08:35 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-e1-eqiad
08:35 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device ssw1-e1-eqiad
08:34 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device cloudsw2-d5-eqiad
08:34 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
08:34 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device cloudsw2-d5-eqiad
08:33 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
08:31 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2085.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
08:30 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device cr2-eqsin
08:30 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device cr2-eqsin
08:27 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
08:27 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
08:26 moritzm: upgraded ircstream on irc.wikimedia.org to 1.0.1
08:08 XioNoX: update gnmic to 0.39 on all netflow hosts
08:05 XioNoX: add gnmic 0.39 from official git repo to bookworm reprepro - T347461
07:48 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C
07:48 XioNoX: manually install/test gnmic 0.39 on netflow6001
07:46 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C
07:45 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1048.eqiad.wmnet
07:39 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1048.eqiad.wmnet
07:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1047.eqiad.wmnet
07:33 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1047.eqiad.wmnet
07:33 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C
07:33 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1047.eqiad.wmnet to cluster eqiad and group C

2024-11-07

23:00 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bookworm
22:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2170.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
22:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2169.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
22:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2168.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
22:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2167.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
22:45 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2166.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
22:44 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2165.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
22:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2164.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2163.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
22:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2162.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
22:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2161.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
22:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2160.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
22:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2141.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
22:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2159.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2158.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
22:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2157.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
22:37 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2170.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
22:37 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage
22:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2156.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
22:37 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2169.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
22:36 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2168.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2145.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
22:35 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2167.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
22:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2144.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
22:34 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2166.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
22:34 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage
22:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2143.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2142.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
22:33 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2165.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
22:32 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2164.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2163.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
22:30 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2162.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2140.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2139.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
22:30 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2161.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
22:29 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2160.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
22:28 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2159.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
22:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2138.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
22:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2137.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
22:27 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2158.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
22:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2136.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
22:27 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2157.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
22:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2129.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2156.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2145.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
22:24 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2128.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2144.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
22:23 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2143.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
22:22 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2142.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
22:22 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bookworm
22:21 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2141.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
22:20 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2140.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
22:19 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
22:19 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2139.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2138.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2137.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
22:16 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2136.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
22:15 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2129.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
22:14 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2128.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
22:12 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs2026.codfw.wmnet with OS bullseye
22:12 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
22:10 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
22:08 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
22:07 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs2027.codfw.wmnet with OS bullseye
22:07 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
22:06 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
21:58 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
21:58 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2170 to codfw - jhancock@cumin2002"
21:58 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2170 to codfw - jhancock@cumin2002"
21:53 jhancock@cumin2002: START - Cookbook sre.dns.netbox
21:53 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2026.codfw.wmnet with reason: host reimage
21:52 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
21:51 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2166 to codfw - jhancock@cumin2002"
21:50 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2166 to codfw - jhancock@cumin2002"
21:50 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2027.codfw.wmnet with reason: host reimage
21:47 jhancock@cumin2002: START - Cookbook sre.dns.netbox
21:46 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2026.codfw.wmnet with reason: host reimage
21:46 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2027.codfw.wmnet with reason: host reimage
21:41 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
21:34 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
21:34 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2158 to codfw - jhancock@cumin2002"
21:33 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2158 to codfw - jhancock@cumin2002"
21:30 jhancock@cumin2002: START - Cookbook sre.dns.netbox
21:27 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2143 to codfw - jhancock@cumin2002"
21:26 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2143 to codfw - jhancock@cumin2002"
21:22 jhancock@cumin2002: START - Cookbook sre.dns.netbox
21:21 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2082.codfw.wmnet with OS bookworm
21:18 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2027.codfw.wmnet with OS bullseye
21:18 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2026.codfw.wmnet with OS bullseye
21:18 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs2027']
21:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wdqs2026']
21:17 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs2027']
21:17 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs2026']
21:11 herron@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm
21:11 jsn@deploy2002: Finished scap sync-world: Backport for Enable AutoModerator on viwiki (T378343) (duration: 08m 28s)
21:09 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm
21:06 jsn@deploy2002: suecarmol, jsn: Continuing with sync
21:06 jsn@deploy2002: suecarmol, jsn: Backport for Enable AutoModerator on viwiki (T378343) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
21:03 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
21:03 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2128 to codfw - jhancock@cumin2002"
21:03 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2128 to codfw - jhancock@cumin2002"
21:03 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage
21:02 jsn@deploy2002: Started scap sync-world: Backport for Enable AutoModerator on viwiki (T378343)
21:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wdqs2027.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
21:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wdqs2026.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
20:59 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage
20:59 jhancock@cumin2002: START - Cookbook sre.dns.netbox
20:50 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wdqs2027.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
20:50 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wdqs2026.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
20:49 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
20:49 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wdqs2026 to codfw - jhancock@cumin2002"
20:49 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wdqs2026 to codfw - jhancock@cumin2002"
20:46 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bookworm
20:43 jhancock@cumin2002: START - Cookbook sre.dns.netbox
20:35 cdanis@deploy2002: Finished scap sync-world: Backport for Enable Chart extension on testwiki and testcommonswiki (T378127) (duration: 13m 02s)
20:30 cdanis@deploy2002: cdanis, aude: Continuing with sync
20:25 cdanis@deploy2002: cdanis, aude: Backport for Enable Chart extension on testwiki and testcommonswiki (T378127) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:22 cdanis@deploy2002: Started scap sync-world: Backport for Enable Chart extension on testwiki and testcommonswiki (T378127)
20:21 cdanis@deploy2002: Finished scap sync-world: Backport for DB config for testcommonswiki deployment for Charts (T379199) (duration: 10m 45s)
20:15 cdanis@deploy2002: cdanis, bvibber: Continuing with sync
20:13 cdanis@deploy2002: cdanis, bvibber: Backport for DB config for testcommonswiki deployment for Charts (T379199) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:10 cdanis@deploy2002: Started scap sync-world: Backport for DB config for testcommonswiki deployment for Charts (T379199)
20:02 dduvall@deploy2002: Installing scap version "4.122.0" for 209 hosts
19:42 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
19:42 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dummy record for pfw1-eqiad.wikimedia.org - cmooney@cumin1002"
19:42 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dummy record for pfw1-eqiad.wikimedia.org - cmooney@cumin1002"
19:37 cmooney@cumin1002: START - Cookbook sre.dns.netbox
19:33 cmooney@cumin1002: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
19:33 cmooney@cumin1002: START - Cookbook sre.dns.netbox
19:23 cdanis: T379199 💙cdanis@mwmaint2002.codfw.wmnet ~ 🕝☕ mwscript sql.php --wiki=testcommonswiki /srv/mediawiki/php-1.44.0-wmf.2/extensions/JsonConfig/sql/mysql/tables-generated.sql
19:19 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables
19:19 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables
19:18 aokoth@cumin1002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host vrts1003.eqiad.wmnet
19:11 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables
19:11 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on vrts1003.eqiad.wmnet with reason: nftables
19:10 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on vrts2002.codfw.wmnet with reason: nftables
19:10 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on vrts2002.codfw.wmnet with reason: nftables
19:08 mutante: VRTS - switching firewall provider from iptables to nftables
19:06 aokoth@cumin1002: START - Cookbook sre.hosts.reboot-single for host vrts1003.eqiad.wmnet
19:03 herron@cumin1002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host aux-k8s-worker1004.eqiad.wmnet
19:03 herron@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm
19:00 herron@cumin1002: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1004.eqiad.wmnet with OS bookworm
18:59 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002"
18:59 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002"
18:59 herron@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) aux-k8s-worker1004.eqiad.wmnet on all recursors
18:59 herron@cumin1002: START - Cookbook sre.dns.wipe-cache aux-k8s-worker1004.eqiad.wmnet on all recursors
18:59 herron@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
18:58 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002"
18:58 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM aux-k8s-worker1004.eqiad.wmnet - herron@cumin1002"
18:50 herron@cumin1002: START - Cookbook sre.dns.netbox
18:50 herron@cumin1002: START - Cookbook sre.ganeti.makevm for new host aux-k8s-worker1004.eqiad.wmnet
18:43 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
18:43 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2138 to codfw - jhancock@cumin2002"
18:43 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2138 to codfw - jhancock@cumin2002"
18:14 swfrench-wmf: updated changeprop-jobqueue to 2024-11-05-170900-production - T356241
18:13 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
18:11 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
18:01 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
17:59 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
17:58 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
17:57 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
17:55 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cloudvirt1063.eqiad.wmnet
17:55 fnegri@cumin1002: START - Cookbook sre.hosts.remove-downtime for cloudvirt1063.eqiad.wmnet
17:48 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop: apply
17:48 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop: apply
17:44 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply
17:43 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop: apply
17:42 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop: apply
17:41 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/changeprop: apply
17:29 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1063.eqiad.wmnet with OS bookworm
17:29 fnegri@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fnegri@cumin1002"
17:27 fnegri@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fnegri@cumin1002"
17:18 cmooney@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device fasw2-c1a-eqiad
17:16 cmooney@cumin1002: START - Cookbook sre.network.tls for network device fasw2-c1a-eqiad
17:12 rzl: manually run mediawiki_job_wikimediaevents-UpdatePeriodicMetrics-global # T375508
17:09 arlolra@deploy2002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
17:08 arlolra@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
17:06 rzl: manually run mediawiki_job_wikimediaevents-UpdatePeriodicMetrics-per-wiki # T375508
17:03 arlolra@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
17:02 arlolra@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
17:01 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1063.eqiad.wmnet with reason: host reimage
16:57 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye
16:57 elukey@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002"
16:57 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2084.codfw.wmnet with OS bullseye
16:57 arlolra@deploy2002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
16:56 arlolra@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
16:56 arlolra@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
16:56 fnegri@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1063.eqiad.wmnet with reason: host reimage
16:54 arlolra@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
16:54 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2083.codfw.wmnet with OS bullseye
16:48 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
16:48 elukey@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
16:46 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2084.codfw.wmnet with OS bullseye
16:45 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2084.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
16:41 fnegri@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1063.eqiad.wmnet with OS bookworm
16:34 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2084.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
16:32 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage
16:28 elukey@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002"
16:28 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage
16:24 arlolra@deploy2002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
16:23 arlolra@deploy2002: helmfile [staging] START helmfile.d/services/mobileapps: apply
16:15 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye
16:07 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage
16:04 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage
15:57 herron@cumin1002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-logging-eqiad
15:54 moritzm: remove ganeti1010 from active ganeti nodes T378921
15:53 joelyrookewmde: Finished populateSitesTable for tcywiktionary (T378466) and tcywikisource (T378474)
15:53 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye
15:52 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1010.eqiad.wmnet
15:39 jgiannelos@deploy2002: Finished deploy [restbase/deploy@6d0b97e]: Add new wikis to RESTBase (duration: 21m 33s)
15:33 herron@cumin1002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-logging-eqiad
15:31 taavi: taavi@deploy2002 ~ $ mwscript-k8s migrateUserGroup.php -- --wiki=labswiki contentadmin sysop # T375950
15:31 joelyrookewmde: joelyrookewmde@mwmaint2002:~$ foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https
15:29 herron@cumin1002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-logging-codfw
15:18 jgiannelos@deploy2002: Started deploy [restbase/deploy@6d0b97e]: Add new wikis to RESTBase
15:16 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2082.codfw.wmnet with OS bullseye
15:15 jnuche@deploy2002: Finished deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) (duration: 01m 13s)
15:14 jnuche@deploy2002: Started deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided)
15:11 jnuche@deploy2002: Finished deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided) (duration: 00m 52s)
15:10 jnuche@deploy2002: Started deploy [releng/jenkins-deploy@abc27c0] (releasing): (no justification provided)
15:07 herron@cumin1002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-logging-codfw
14:55 hashar: Restarted CI Jenkins for plugins update
14:41 moritzm: installing python-git security updates
14:29 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye
14:25 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for Deploy EditCheck (references) to hiwiki, bnwiki, idwiki (T366381) (duration: 09m 37s)
14:20 lucaswerkmeister-wmde@deploy2002: esanders, lucaswerkmeister-wmde: Continuing with sync
14:18 lucaswerkmeister-wmde@deploy2002: esanders, lucaswerkmeister-wmde: Backport for Deploy EditCheck (references) to hiwiki, bnwiki, idwiki (T366381) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
14:15 jhancock@cumin2002: START - Cookbook sre.dns.netbox
14:15 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for Deploy EditCheck (references) to hiwiki, bnwiki, idwiki (T366381)
14:13 kartik@deploy2002: Finished scap sync-world: Backport for Enable Section Translation in ann, iba, nr and, tdd Wikipedias (T371420) (duration: 10m 08s)
14:09 kartik@deploy2002: kartik: Continuing with sync
14:06 kartik@deploy2002: kartik: Backport for Enable Section Translation in ann, iba, nr and, tdd Wikipedias (T371420) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
14:04 joal@deploy2002: Finished deploy [airflow-dags/analytics@23bc4ad]: Regular analytics weekly train [airflow-dags/analytics@23bc4ad3] (duration: 01m 44s)
14:03 kartik@deploy2002: Started scap sync-world: Backport for Enable Section Translation in ann, iba, nr and, tdd Wikipedias (T371420)
14:03 joal@deploy2002: Started deploy [airflow-dags/analytics@23bc4ad]: Regular analytics weekly train [airflow-dags/analytics@23bc4ad3]
13:52 cwhite: running thanos bucket cleanup on titan1001 - T351927
13:37 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1048
13:36 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1048
13:35 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1047
13:34 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1047
13:23 joal@deploy2002: Finished deploy [analytics/refinery@4bec064] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@4bec0640] (duration: 03m 44s)
13:20 joal@deploy2002: Started deploy [analytics/refinery@4bec064] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@4bec0640]
13:13 joal@deploy2002: Finished deploy [analytics/refinery@4bec064] (thin): Regular analytics weekly train THIN [analytics/refinery@4bec0640] (duration: 05m 03s)
13:08 joal@deploy2002: Started deploy [analytics/refinery@4bec064] (thin): Regular analytics weekly train THIN [analytics/refinery@4bec0640]
12:53 joal@deploy2002: Finished deploy [analytics/refinery@4bec064]: Regular analytics weekly train [analytics/refinery@4bec0640] (duration: 16m 47s)
12:40 jmm@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host ganeti1047
12:40 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1047
12:39 jmm@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host ganeti1047
12:37 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1047
12:36 joal@deploy2002: Started deploy [analytics/refinery@4bec064]: Regular analytics weekly train [analytics/refinery@4bec0640]
12:16 vgutierrez: repool liberica on lvs1013
11:44 sfaci@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic: apply
11:44 sfaci@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic: apply
11:27 jgiannelos@deploy2002: helmfile [eqiad] DONE helmfile.d/services/proton: sync
11:26 jgiannelos@deploy2002: helmfile [eqiad] START helmfile.d/services/proton: sync
11:26 jgiannelos@deploy2002: helmfile [codfw] DONE helmfile.d/services/proton: sync
11:25 jgiannelos@deploy2002: helmfile [codfw] START helmfile.d/services/proton: sync
11:24 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/proton: sync
11:24 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/proton: sync
11:19 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
11:19 sfaci@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply
11:19 sfaci@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply
11:18 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
11:17 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
11:16 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
11:11 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
11:10 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
11:09 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1010.eqiad.wmnet
11:09 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
11:04 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1010.eqiad.wmnet
11:03 vgutierrez: depool liberica on lvs1013
11:01 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1010.eqiad.wmnet
10:58 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2082.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
10:55 jmm@cumin2002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-test-eqiad
10:48 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2082.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
10:41 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2081.codfw.wmnet with OS bullseye
10:41 elukey@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002"
10:40 elukey@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin2002"
10:40 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply
10:40 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply
10:33 jmm@cumin2002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-test-eqiad
10:21 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2081.codfw.wmnet with reason: host reimage
10:20 gmodena@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply
10:20 gmodena@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-dump-rev-content-reconcile-enrich: apply
10:18 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2081.codfw.wmnet with reason: host reimage
10:07 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2081.codfw.wmnet with OS bullseye
10:02 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1009.eqiad.wmnet
09:58 oblivian@cumin2002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Add rw interface (still disabled), search - oblivian@cumin2002"
09:58 oblivian@cumin2002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Add rw interface (still disabled), search - oblivian@cumin2002
09:57 oblivian@cumin2002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Add rw interface (still disabled), search - oblivian@cumin2002
09:57 oblivian@cumin2002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Add rw interface (still disabled), search - oblivian@cumin2002"
09:52 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T367781)', diff saved to https://phabricator.wikimedia.org/P70981 and previous config saved to /var/cache/conftool/dbconfig/20241107-095205-arnaudb.json
09:51 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1009.eqiad.wmnet
09:41 elukey@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2081.codfw.wmnet with OS bullseye
09:36 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P70980 and previous config saved to /var/cache/conftool/dbconfig/20241107-093657-arnaudb.json
09:29 vgutierrez: upload liberica 0.4 to apt.wm.o (bookworm-wikimedia)
09:21 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P70979 and previous config saved to /var/cache/conftool/dbconfig/20241107-092150-arnaudb.json
09:21 moritzm: installing openjdk-8 security updates
09:21 moritzm: uploaded openjdk-8 8u412-ga-1~deb11u1 to apt.wikimedia.org for bookworm-wikimedia
09:14 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group2 to 1.44.0-wmf.2 refs T375661
09:06 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T367781)', diff saved to https://phabricator.wikimedia.org/P70978 and previous config saved to /var/cache/conftool/dbconfig/20241107-090643-arnaudb.json
08:41 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2081.codfw.wmnet with OS bullseye
08:40 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2081.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
08:27 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2081.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
08:26 kartik@deploy2002: Finished scap sync-world: Backport for Translate: Enable message bundle Scribunto module on testwiki (T359918) (duration: 18m 39s)
08:25 _joe_: running scap pull on mwdebug2001/2002
08:19 kartik@deploy2002: kartik, abi: Continuing with sync
08:13 kartik@deploy2002: kartik, abi: Backport for Translate: Enable message bundle Scribunto module on testwiki (T359918) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
08:07 kartik@deploy2002: Started scap sync-world: Backport for Translate: Enable message bundle Scribunto module on testwiki (T359918)
08:06 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2155 (T367781)', diff saved to https://phabricator.wikimedia.org/P70977 and previous config saved to /var/cache/conftool/dbconfig/20241107-080618-arnaudb.json
08:06 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2187.codfw.wmnet with reason: Maintenance
08:05 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2187.codfw.wmnet with reason: Maintenance
08:05 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2155.codfw.wmnet with reason: Maintenance
08:05 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2155.codfw.wmnet with reason: Maintenance
07:50 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on pc1017.eqiad.wmnet with reason: T378068, host is not pooled
07:50 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 5 days, 0:00:00 on pc1017.eqiad.wmnet with reason: T378068, host is not pooled
07:50 arnaudb@cumin1002: END (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: T378068, host is not pooled
07:50 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: T378068, host is not pooled
07:28 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1046.eqiad.wmnet to cluster eqiad and group C
07:27 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1046.eqiad.wmnet to cluster eqiad and group C
07:27 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1045.eqiad.wmnet to cluster eqiad and group C
07:25 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1045.eqiad.wmnet to cluster eqiad and group C
07:25 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1045.eqiad.wmnet to cluster eqiad and group B
07:25 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1045.eqiad.wmnet to cluster eqiad and group B
07:18 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply
07:03 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply
06:55 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply
06:47 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply
06:44 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
06:39 kartik@deploy2002: helmfile [staging] START helmfile.d/services/machinetranslation: apply

2024-11-06

23:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2152.codfw.wmnet with OS bookworm
23:46 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
23:45 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
23:41 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp1006.eqiad.wmnet with OS bookworm
23:41 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
23:41 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
23:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2151.codfw.wmnet with OS bookworm
23:39 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
23:37 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
23:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2154.codfw.wmnet with OS bookworm
23:36 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
23:34 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
23:31 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp1005.eqiad.wmnet with OS bookworm
23:31 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
23:30 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
23:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2153.codfw.wmnet with OS bookworm
23:28 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
23:28 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
23:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2152.codfw.wmnet with reason: host reimage
23:23 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp1004.eqiad.wmnet with OS bookworm
23:23 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
23:23 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
23:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2155.codfw.wmnet with OS bookworm
23:23 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
23:22 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp1006.eqiad.wmnet with reason: host reimage
23:19 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2151.codfw.wmnet with reason: host reimage
23:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
23:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2154.codfw.wmnet with reason: host reimage
23:12 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp1005.eqiad.wmnet with reason: host reimage
23:08 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2153.codfw.wmnet with reason: host reimage
23:05 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp1004.eqiad.wmnet with reason: host reimage
23:02 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp1005.eqiad.wmnet with reason: host reimage
23:02 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2155.codfw.wmnet with reason: host reimage
23:00 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp1004.eqiad.wmnet with reason: host reimage
23:00 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp1006.eqiad.wmnet with reason: host reimage
22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2153.codfw.wmnet with reason: host reimage
22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2152.codfw.wmnet with reason: host reimage
22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2151.codfw.wmnet with reason: host reimage
22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2154.codfw.wmnet with reason: host reimage
22:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2155.codfw.wmnet with reason: host reimage
22:44 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host mc-gp1004.eqiad.wmnet with OS bookworm
22:44 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host mc-gp1005.eqiad.wmnet with OS bookworm
22:43 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host mc-gp1006.eqiad.wmnet with OS bookworm
22:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2155.codfw.wmnet with OS bookworm
22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2154.codfw.wmnet with OS bookworm
22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2153.codfw.wmnet with OS bookworm
22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2152.codfw.wmnet with OS bookworm
22:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2151.codfw.wmnet with OS bookworm
22:38 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp1004.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
22:38 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2155']
22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2154']
22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2153']
22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2152']
22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2151']
22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2151']
22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2152']
22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2153']
22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2154']
22:37 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2155']
22:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
22:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2152.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2151.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
22:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2154.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
22:24 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
22:24 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2155.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2154.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
22:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2153.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
22:23 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2152.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
22:23 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2151.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
22:22 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
22:22 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2151-55 to codfw - jhancock@cumin2002"
22:22 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2151-55 to codfw - jhancock@cumin2002"
22:18 jhancock@cumin2002: START - Cookbook sre.dns.netbox
22:16 jclark@cumin1002: START - Cookbook sre.hosts.provision for host mc-gp1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
22:16 jclark@cumin1002: START - Cookbook sre.hosts.provision for host mc-gp1004.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
22:16 jclark@cumin1002: START - Cookbook sre.hosts.provision for host mc-gp1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
22:14 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
22:14 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for mc-gp1004 - jclark@cumin1002"
22:14 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for mc-gp1004 - jclark@cumin1002"
22:10 jclark@cumin1002: START - Cookbook sre.dns.netbox
21:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2150.codfw.wmnet with OS bookworm
21:42 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
21:35 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
21:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2148.codfw.wmnet with OS bookworm
21:31 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
21:31 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
21:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2147.codfw.wmnet with OS bookworm
21:27 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
21:27 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2146.codfw.wmnet with OS bookworm
21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2149.codfw.wmnet with OS bookworm
21:26 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
21:25 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
21:20 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
21:20 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
21:18 jclark@cumin1002: START - Cookbook sre.dns.netbox
21:16 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2150.codfw.wmnet with reason: host reimage
21:12 sukhe@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp2031.codfw.wmnet [reason: PSU replaced]
21:12 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2148.codfw.wmnet with reason: host reimage
21:08 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2147.codfw.wmnet with reason: host reimage
21:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2146.codfw.wmnet with reason: host reimage
21:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2149.codfw.wmnet with reason: host reimage
20:59 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2150.codfw.wmnet with reason: host reimage
20:59 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2148.codfw.wmnet with reason: host reimage
20:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2147.codfw.wmnet with reason: host reimage
20:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2146.codfw.wmnet with reason: host reimage
20:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2149.codfw.wmnet with reason: host reimage
20:41 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2148.codfw.wmnet with OS bookworm
20:41 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2150.codfw.wmnet with OS bookworm
20:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2149.codfw.wmnet with OS bookworm
20:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2147.codfw.wmnet with OS bookworm
20:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2146.codfw.wmnet with OS bookworm
20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2150']
20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2149']
20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2148']
20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2147']
20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2146']
20:39 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2150']
20:39 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2149']
20:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2148']
20:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2147']
20:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2146']
20:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
20:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2146.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
20:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2150.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
20:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2148.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
20:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2147.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
20:27 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
20:26 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
20:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2150.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
20:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2149.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
20:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2148.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
20:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2147.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
20:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2146.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
20:25 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
20:25 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2146-50 to codfw - jhancock@cumin2002"
20:24 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2146-50 to codfw - jhancock@cumin2002"
20:19 jhancock@cumin2002: START - Cookbook sre.dns.netbox
19:55 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp2006.codfw.wmnet with OS bookworm
19:55 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
18:41 brett: Remove RSA cert support from P:idp clients (icinga, karma, klaxon, librenms, orchestrator) (T375569)
18:10 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2083.codfw.wmnet with OS bullseye
18:10 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002"
18:06 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
18:03 sukhe: dummy authdns-update to test CR 10857508
17:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp2006.codfw.wmnet with reason: host reimage
17:45 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp2006.codfw.wmnet with reason: host reimage
17:35 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002"
17:27 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2006.codfw.wmnet with OS bookworm
17:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
17:17 hnowlan: importing debs for mercurius-1.0.1
17:15 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
17:14 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage
17:11 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2083.codfw.wmnet with reason: host reimage
17:11 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
17:11 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransw1001 - vriley@cumin1002"
17:11 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransw1001 - vriley@cumin1002"
17:05 vriley@cumin1002: START - Cookbook sre.dns.netbox
16:58 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye
16:37 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
16:36 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
16:35 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
16:32 moritzm: remove ganeti1014 from active ganeti nodes T378921
16:31 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet
16:26 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
16:26 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
16:25 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye
16:24 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
16:23 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
16:21 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:21 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for fransc1001 - jclark@cumin1002"
16:20 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for fransc1001 - jclark@cumin1002"
16:17 jclark@cumin1002: START - Cookbook sre.dns.netbox
16:10 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db2136 gradually with 4 steps - cloned on db2236
16:10 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
16:08 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
16:08 jclark@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
16:01 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs4010.ulsfo.wmnet
15:59 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
15:58 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
15:57 mfossati@deploy2002: Finished deploy [airflow-dags/platform_eng@294093b]: remove section alignment image suggestions, now in section topics v1.0.0 (duration: 01m 23s)
15:57 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:57 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransc1001 - vriley@cumin1002"
15:57 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt fransc1001 - vriley@cumin1002"
15:57 mfossati@deploy2002: Started deploy [airflow-dags/platform_eng@294093b]: remove section alignment image suggestions, now in section topics v1.0.0
15:55 topranks: rebooting lvs4010 to verify new IPv6 sysctls for RA processing work T358260
15:55 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:25:00 on cr[3-4]-ulsfo with reason: prevent bgp alerts firing while lvs4010 is rebooted
15:55 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:25:00 on cr[3-4]-ulsfo with reason: prevent bgp alerts firing while lvs4010 is rebooted
15:55 cmooney@cumin1002: START - Cookbook sre.hosts.reboot-single for host lvs4010.ulsfo.wmnet
15:53 vriley@cumin1002: START - Cookbook sre.dns.netbox
15:51 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
15:50 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
15:48 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
15:48 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
15:43 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
15:42 vriley@cumin1002: START - Cookbook sre.hosts.provision for host fransc1001.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
15:31 moritzm: installing Linux 5.10.226 on bullseye hosts
15:24 arnaudb@cumin1002: START - Cookbook sre.mysql.pool db2136 gradually with 4 steps - cloned on db2236
15:18 mutante: gitlab1004 - systemctl start wmf_auto_restart_ssh-gitlab (because it had failed with "Service ssh-gitlab not present or not running") but now it's just fine and exits with "No restart necessary" T379166
15:13 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye
15:12 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for Document available wbformatvalue options (T323778) (duration: 38m 45s)
15:07 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db2136.codfw.wmnet onto db2236.codfw.wmnet
15:00 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Continuing with sync
14:59 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Backport for Document available wbformatvalue options (T323778) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
14:51 moritzm: installing php7.4 security updates
14:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1046.eqiad.wmnet
14:48 moritzm: installing usb.ids updates from Bookworm point release
14:43 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1046.eqiad.wmnet
14:42 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1046
14:36 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1046
14:33 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for Document available wbformatvalue options (T323778)
14:31 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for Cleanup for logo related file (duration: 15m 01s)
14:31 vgutierrez@cumin1002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool site eqiad for service: ncredir-addrs [reason: no reason specified, T378453]
14:31 vgutierrez@cumin1002: START - Cookbook sre.dns.admin DNS admin: pool site eqiad for service: ncredir-addrs [reason: no reason specified, T378453]
14:27 lucaswerkmeister-wmde@deploy2002: hamishz, lucaswerkmeister-wmde: Continuing with sync
14:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1045.eqiad.wmnet
14:20 sukhe@puppetserver1001: conftool action : set/pooled=no; selector: name=cp2031.codfw.wmnet
14:19 sukhe: depool cp2031
14:19 lucaswerkmeister-wmde@deploy2002: hamishz, lucaswerkmeister-wmde: Backport for Cleanup for logo related file synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
14:19 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1045.eqiad.wmnet
14:16 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for Cleanup for logo related file
14:16 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1045
14:14 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1045
14:02 vgutierrez@cumin1002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: depool site eqiad for service: ncredir-addrs [reason: no reason specified, T378453]
14:02 vgutierrez@cumin1002: START - Cookbook sre.dns.admin DNS admin: depool site eqiad for service: ncredir-addrs [reason: no reason specified, T378453]
13:52 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet
13:52 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B
13:47 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B
13:44 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to plain
13:43 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
13:42 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
13:41 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to plain
13:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet
13:27 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet
13:27 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1041.eqiad.wmnet
13:27 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1041.eqiad.wmnet
13:08 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to drbd
13:02 arnaudb@cumin1002: START - Cookbook sre.mysql.clone of db2136.codfw.wmnet onto db2236.codfw.wmnet
12:58 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to drbd
12:56 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to plain
12:56 arnaudb@cumin1002: dbctl commit (dc=all): 'Cloning db2136 in db2236 for T373579', diff saved to https://phabricator.wikimedia.org/P70964 and previous config saved to /var/cache/conftool/dbconfig/20241106-125648-arnaudb.json
12:55 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to plain
12:55 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) db2136 - depooling db2136 to clone on db2236
12:55 arnaudb@cumin1002: START - Cookbook sre.mysql.depool db2136 - depooling db2136 to clone on db2236
12:55 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2236.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - T373579
12:54 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2236.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - T373579
12:54 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2136.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - T373579
12:54 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2136.codfw.wmnet with reason: provisionning db2236.codfw.wmnet - T373579
12:52 slyngs: IDP/CAS-SSO Enable Redis TGT backend
12:52 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet
12:52 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet
12:50 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to drbd
12:41 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to drbd
12:40 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db1206 quickly with 2 steps - test 1087895
12:25 arnaudb@cumin1002: START - Cookbook sre.mysql.pool db1206 quickly with 2 steps - test 1087895
12:23 arnaudb@cumin1002: dbctl commit (dc=all): 'db1206 depool to test cookbook hotfix on CR 1087895', diff saved to https://phabricator.wikimedia.org/P70960 and previous config saved to /var/cache/conftool/dbconfig/20241106-122348-arnaudb.json
12:23 marostegui: Migrate db1125 to MariaDB 10.6.20 T378940
12:23 arnaudb@cumin1002: dbctl commit (dc=all): '"db1206 pending"', diff saved to https://phabricator.wikimedia.org/P70959 and previous config saved to /var/cache/conftool/dbconfig/20241106-122318-arnaudb.json
12:21 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing
12:21 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing
12:21 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing
12:21 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing
12:09 arnaudb@cumin1002: END (FAIL) - Cookbook sre.mysql.pool (exit_code=99) db1206 quickly with 2 steps - repool
12:09 arnaudb@cumin1002: START - Cookbook sre.mysql.pool db1206 quickly with 2 steps - repool
12:06 mvolz@deploy2002: helmfile [eqiad] DONE helmfile.d/services/citoid: apply
12:06 mvolz@deploy2002: helmfile [eqiad] START helmfile.d/services/citoid: apply
12:05 arnaudb@cumin1002: dbctl commit (dc=all): 'Depool db1206', diff saved to https://phabricator.wikimedia.org/P70957 and previous config saved to /var/cache/conftool/dbconfig/20241106-120536-arnaudb.json
12:03 mvolz@deploy2002: helmfile [codfw] DONE helmfile.d/services/citoid: apply
12:03 mvolz@deploy2002: helmfile [codfw] START helmfile.d/services/citoid: apply
12:02 mvolz@deploy2002: helmfile [staging] DONE helmfile.d/services/citoid: apply
12:02 mvolz@deploy2002: helmfile [staging] START helmfile.d/services/citoid: apply
11:37 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
11:37 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
11:32 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
11:31 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
11:30 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
11:30 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
11:30 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1041.eqiad.wmnet
11:08 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1041.eqiad.wmnet
10:50 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye
10:43 fabfur: rolling out haproxykafka on all ULSFO cp hosts (https://gerrit.wikimedia.org/r/c/operations/puppet/+/1087862) (T378578)
10:43 elukey: depool maps1005 to test an nginx config - T378944
10:41 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group1 to 1.44.0-wmf.2 refs T375661
10:32 XioNoX: push new pfw policies - T379127
10:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to plain
10:27 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to plain
10:16 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet
10:15 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet
10:15 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet
10:12 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet
10:12 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to drbd
09:59 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to drbd
09:59 jnuche@deploy2002: Finished scap sync-world: Backport for Fix automatic category creations by FuzzyBot (T285463) (duration: 08m 03s)
09:55 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B
09:54 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1044.eqiad.wmnet to cluster eqiad and group B
09:54 jnuche@deploy2002: jnuche: Continuing with sync
09:54 jnuche@deploy2002: jnuche: Backport for Fix automatic category creations by FuzzyBot (T285463) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
09:53 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1043.eqiad.wmnet to cluster eqiad and group B
09:52 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1043.eqiad.wmnet to cluster eqiad and group B
09:51 jnuche@deploy2002: Started scap sync-world: Backport for Fix automatic category creations by FuzzyBot (T285463)
09:49 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1044.eqiad.wmnet
09:41 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1044.eqiad.wmnet
09:38 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye
09:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1043.eqiad.wmnet
09:31 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1043.eqiad.wmnet
09:29 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1044
09:28 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1044
09:27 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1043
09:25 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1043
09:20 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2083.codfw.wmnet with OS bullseye
09:10 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2083.codfw.wmnet with OS bullseye
08:56 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2083.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
08:46 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ms-be2083.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
08:12 volans: manually cleared /root/.ssh/known_hosts on the cumin hosts - T336485
05:52 kart_: Updated cxserver to 2024-10-25-044319-production (T377160, T375102, T371420)
05:38 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
05:38 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
05:37 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
05:36 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/cxserver: apply
05:34 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
05:33 kartik@deploy2002: helmfile [staging] START helmfile.d/services/cxserver: apply
01:30 zabe@deploy2002: Finished scap sync-world: T378260 (duration: 07m 34s)
01:23 zabe@deploy2002: Started scap sync-world: T378260
00:44 ladsgroup@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) es1021 gradually with 4 steps - Maint over
00:21 ryankemper: T377594 Merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/1087598; ran puppet on `snapshot101[0-7]*`. These dumps should be re-enabled now
00:02 ebernhardson@deploy2002: Finished scap sync-world: Backport for TextPassDumper: refresh content address on failure (T377594), TextPassDumper: refresh content address on failure (T377594) (duration: 08m 48s)

2024-11-05

23:59 ladsgroup@cumin1002: START - Cookbook sre.mysql.pool es1021 gradually with 4 steps - Maint over
23:58 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2134.codfw.wmnet with OS bookworm
23:58 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
23:57 ebernhardson@deploy2002: ebernhardson: Continuing with sync
23:57 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
23:57 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2135.codfw.wmnet with OS bookworm
23:57 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
23:57 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
23:56 ebernhardson@deploy2002: ebernhardson: Backport for TextPassDumper: refresh content address on failure (T377594), TextPassDumper: refresh content address on failure (T377594) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
23:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2132.codfw.wmnet with OS bookworm
23:56 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
23:55 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2130.codfw.wmnet with OS bookworm
23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2133.codfw.wmnet with OS bookworm
23:54 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2131.codfw.wmnet with OS bookworm
23:54 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
23:53 ebernhardson@deploy2002: Started scap sync-world: Backport for TextPassDumper: refresh content address on failure (T377594), TextPassDumper: refresh content address on failure (T377594)
23:50 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
23:44 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
23:39 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
23:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2134.codfw.wmnet with reason: host reimage
23:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2132.codfw.wmnet with reason: host reimage
23:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2131.codfw.wmnet with reason: host reimage
23:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2135.codfw.wmnet with reason: host reimage
23:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2130.codfw.wmnet with reason: host reimage
23:19 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2133.codfw.wmnet with reason: host reimage
23:18 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2135.codfw.wmnet with reason: host reimage
23:18 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2134.codfw.wmnet with reason: host reimage
23:17 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2132.codfw.wmnet with reason: host reimage
23:16 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2131.codfw.wmnet with reason: host reimage
23:16 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2130.codfw.wmnet with reason: host reimage
23:16 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2133.codfw.wmnet with reason: host reimage
23:00 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2135.codfw.wmnet with OS bookworm
23:00 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2134.codfw.wmnet with OS bookworm
22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2133.codfw.wmnet with OS bookworm
22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2132.codfw.wmnet with OS bookworm
22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2131.codfw.wmnet with OS bookworm
22:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2130.codfw.wmnet with OS bookworm
22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2135']
22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2134']
22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2133']
22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2132']
22:53 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2131']
22:52 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-worker2130']
22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2135']
22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2134']
22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2133']
22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2132']
22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2131']
22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-worker2130']
22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2135.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2134.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2132.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2130.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2133.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2131.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2135.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2134.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2133.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2132.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2131.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2130.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2134
22:30 jhancock@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host wikikube-worker2135
22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2133
22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2132
22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2131
22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2130
22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2135
22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2134
22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2133
22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2132
22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2131
22:30 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2130
22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2130 to codfw - jhancock@cumin2002"
22:29 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2130 to codfw - jhancock@cumin2002"
22:29 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2132
22:26 jhancock@cumin2002: START - Cookbook sre.dns.netbox
21:47 urbanecm@deploy2002: Finished scap sync-world: Backport for AbstractProvider: Normalize top level config correctly (T379094), AbstractProvider: Normalize top level config correctly (T379094) (duration: 12m 39s)
21:34 urbanecm@deploy2002: Started scap sync-world: Backport for AbstractProvider: Normalize top level config correctly (T379094), AbstractProvider: Normalize top level config correctly (T379094)
21:33 urbanecm@deploy2002: Finished scap sync-world: Backport for cswiki: adding throttle rule for Editathon Czechoslovakia (T379060) (duration: 31m 18s)
21:11 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
21:06 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
21:02 urbanecm@deploy2002: Started scap sync-world: Backport for cswiki: adding throttle rule for Editathon Czechoslovakia (T379060)
21:01 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
21:00 cmooney@cumin1002: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device fasw2-c1b-eqiad.mgmt.eqiad.wmnet
20:56 cmooney@cumin1002: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet
20:56 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
20:14 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
20:14 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1b-eqiad - cmooney@cumin1002"
20:14 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1b-eqiad - cmooney@cumin1002"
20:07 cmooney@cumin1002: START - Cookbook sre.dns.netbox
20:07 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1b-eqiad.mgmt.eqiad.wmnet
20:02 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
20:02 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1a-eqiad - cmooney@cumin1002"
20:02 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for fasw2-c1a-eqiad - cmooney@cumin1002"
19:57 cmooney@cumin1002: START - Cookbook sre.dns.netbox
19:57 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet
19:56 cmooney@cumin1002: END (FAIL) - Cookbook sre.network.provision (exit_code=99) for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet
19:56 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet
19:52 cmooney@cumin1002: END (FAIL) - Cookbook sre.network.provision (exit_code=99) for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet
19:52 cmooney@cumin1002: START - Cookbook sre.network.provision for device fasw2-c1a-eqiad.mgmt.eqiad.wmnet
19:20 eileen: civicrm upgraded from 26d8013c to 65a8de90
18:45 cmooney@cumin1002: START - Cookbook sre.dns.netbox
18:10 Amir1: gradual delete of thumbs in fawiki local images in both dcs
18:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1021 (T376905)', diff saved to https://phabricator.wikimedia.org/P70948 and previous config saved to /var/cache/conftool/dbconfig/20241105-180013-ladsgroup.json
18:00 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1021.eqiad.wmnet with reason: Maintenance
17:59 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1021.eqiad.wmnet with reason: Maintenance
17:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028 (T376905)', diff saved to https://phabricator.wikimedia.org/P70947 and previous config saved to /var/cache/conftool/dbconfig/20241105-175851-ladsgroup.json
17:55 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
17:55 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
17:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028', diff saved to https://phabricator.wikimedia.org/P70946 and previous config saved to /var/cache/conftool/dbconfig/20241105-174344-ladsgroup.json
17:42 cdanis@deploy2002: helmfile [codfw] DONE helmfile.d/services/chart-renderer: apply
17:41 cdanis@deploy2002: helmfile [codfw] START helmfile.d/services/chart-renderer: apply
17:41 cdanis@deploy2002: helmfile [eqiad] DONE helmfile.d/services/chart-renderer: apply
17:41 cdanis@deploy2002: helmfile [eqiad] START helmfile.d/services/chart-renderer: apply
17:39 cdanis@deploy2002: helmfile [staging] DONE helmfile.d/services/chart-renderer: apply
17:39 cdanis@deploy2002: helmfile [staging] START helmfile.d/services/chart-renderer: apply
17:36 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
17:36 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
17:34 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
17:34 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
17:33 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
17:33 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
17:32 cdanis@deploy2002: helmfile [staging] DONE helmfile.d/services/chart-renderer: apply
17:32 cdanis@deploy2002: helmfile [staging] START helmfile.d/services/chart-renderer: apply
17:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028', diff saved to https://phabricator.wikimedia.org/P70945 and previous config saved to /var/cache/conftool/dbconfig/20241105-172837-ladsgroup.json
17:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1028 (T376905)', diff saved to https://phabricator.wikimedia.org/P70943 and previous config saved to /var/cache/conftool/dbconfig/20241105-171330-ladsgroup.json
17:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1028 (T376905)', diff saved to https://phabricator.wikimedia.org/P70942 and previous config saved to /var/cache/conftool/dbconfig/20241105-170636-ladsgroup.json
17:06 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1028.eqiad.wmnet with reason: Maintenance
17:06 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1028.eqiad.wmnet with reason: Maintenance
17:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031 (T376905)', diff saved to https://phabricator.wikimedia.org/P70941 and previous config saved to /var/cache/conftool/dbconfig/20241105-170609-ladsgroup.json
16:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031', diff saved to https://phabricator.wikimedia.org/P70940 and previous config saved to /var/cache/conftool/dbconfig/20241105-165103-ladsgroup.json
16:37 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for Fixup paths to moved resources (T379080) (duration: 08m 02s)
16:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031', diff saved to https://phabricator.wikimedia.org/P70939 and previous config saved to /var/cache/conftool/dbconfig/20241105-163556-ladsgroup.json
16:34 cdanis@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:32 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Continuing with sync
16:32 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Backport for Fixup paths to moved resources (T379080) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
16:32 cdanis@cumin1002: START - Cookbook sre.dns.netbox
16:29 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for Fixup paths to moved resources (T379080)
16:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1031 (T376905)', diff saved to https://phabricator.wikimedia.org/P70938 and previous config saved to /var/cache/conftool/dbconfig/20241105-162048-ladsgroup.json
16:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1031 (T376905)', diff saved to https://phabricator.wikimedia.org/P70937 and previous config saved to /var/cache/conftool/dbconfig/20241105-161455-ladsgroup.json
16:14 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1031.eqiad.wmnet with reason: Maintenance
16:14 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1031.eqiad.wmnet with reason: Maintenance
16:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033 (T376905)', diff saved to https://phabricator.wikimedia.org/P70936 and previous config saved to /var/cache/conftool/dbconfig/20241105-161340-ladsgroup.json
16:01 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1017.eqiad.wmnet with OS bookworm
16:00 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet
15:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033', diff saved to https://phabricator.wikimedia.org/P70935 and previous config saved to /var/cache/conftool/dbconfig/20241105-155833-ladsgroup.json
15:54 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet
15:54 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1014.eqiad.wmnet
15:54 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet
15:53 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1042.eqiad.wmnet to cluster eqiad and group B
15:51 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1042.eqiad.wmnet to cluster eqiad and group B
15:51 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1041.eqiad.wmnet to cluster eqiad and group B
15:50 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1041.eqiad.wmnet to cluster eqiad and group B
15:48 moritzm: remove ganeti1013 from active ganeti nodes T378921
15:47 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1013.eqiad.wmnet
15:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033', diff saved to https://phabricator.wikimedia.org/P70934 and previous config saved to /var/cache/conftool/dbconfig/20241105-154326-ladsgroup.json
15:40 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage
15:37 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage
15:32 hashar: Switched PCC workers to Java 17 via https://horizon.wikimedia.org/project/prefixpuppet/?tab=prefix_puppet__puppet-pcc-worker # T359795
15:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1033 (T376905)', diff saved to https://phabricator.wikimedia.org/P70933 and previous config saved to /var/cache/conftool/dbconfig/20241105-152819-ladsgroup.json
15:27 hashar: Switched deployment-deploy04.deployment-prep.eqiad1.wikimedia.cloud to Java 17 # T359795
15:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1033 (T376905)', diff saved to https://phabricator.wikimedia.org/P70932 and previous config saved to /var/cache/conftool/dbconfig/20241105-152139-ladsgroup.json
15:21 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1033.eqiad.wmnet with reason: Maintenance
15:21 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1033.eqiad.wmnet with reason: Maintenance
15:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026 (T376905)', diff saved to https://phabricator.wikimedia.org/P70931 and previous config saved to /var/cache/conftool/dbconfig/20241105-152114-ladsgroup.json
15:20 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host pc1017.eqiad.wmnet with OS bookworm
15:18 hashar: Switched WMCS integration instances from Java 11 to Java 17 via Horizon project-wide config. That was forgotten in T359795 and blocks today's Jenkins upgrade (T379059)
15:15 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1017.eqiad.wmnet with OS bookworm
15:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026', diff saved to https://phabricator.wikimedia.org/P70929 and previous config saved to /var/cache/conftool/dbconfig/20241105-150607-ladsgroup.json
15:02 cdanis@deploy2002: helmfile [eqiad] DONE helmfile.d/services/chart-renderer: apply
15:02 cdanis@deploy2002: helmfile [eqiad] START helmfile.d/services/chart-renderer: apply
15:02 cdanis@deploy2002: helmfile [codfw] DONE helmfile.d/services/chart-renderer: apply
15:01 cdanis@deploy2002: helmfile [codfw] START helmfile.d/services/chart-renderer: apply
15:01 hashar: Upgrading CI Jenkins | T379059
14:53 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage
14:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026', diff saved to https://phabricator.wikimedia.org/P70928 and previous config saved to /var/cache/conftool/dbconfig/20241105-145059-ladsgroup.json
14:50 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage
14:48 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group0 to 1.44.0-wmf.2 refs T375661
14:44 cdanis@deploy2002: helmfile [staging] DONE helmfile.d/services/chart-renderer: apply
14:44 cdanis@deploy2002: helmfile [staging] START helmfile.d/services/chart-renderer: apply
14:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1026 (T376905)', diff saved to https://phabricator.wikimedia.org/P70927 and previous config saved to /var/cache/conftool/dbconfig/20241105-143552-ladsgroup.json
14:34 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host pc1017.eqiad.wmnet with OS bookworm
14:33 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1017.eqiad.wmnet with OS bookworm
away: UTC afternoon deploys done
14:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1026 (T376905)', diff saved to https://phabricator.wikimedia.org/P70926 and previous config saved to /var/cache/conftool/dbconfig/20241105-142959-ladsgroup.json
14:29 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1026.eqiad.wmnet with reason: Maintenance
14:29 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1026.eqiad.wmnet with reason: Maintenance
14:29 vgutierrez: upload liberica 0.3 to apt.wm.o (bookworm-wikimedia)
14:28 tgr@deploy2002: Finished scap sync-world: Backport for JsonConfig: Disable TrackGlobalJsonLinks to avoid missing table errors (T379067) (duration: 17m 24s)
14:24 tgr@deploy2002: tgr: Continuing with sync
14:16 tgr@deploy2002: tgr: Backport for JsonConfig: Disable TrackGlobalJsonLinks to avoid missing table errors (T379067) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
14:12 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage
14:11 tgr@deploy2002: Started scap sync-world: Backport for JsonConfig: Disable TrackGlobalJsonLinks to avoid missing table errors (T379067)
14:10 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
14:10 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
14:09 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage
14:08 moritzm: installing PHP 7.4 security updates on bullseye (as packaged in Debian)
14:08 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
14:07 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
14:07 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
14:07 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
13:57 moritzm: installed libapache2-mod-auth-openidc bugfix updates from Bookworm point release
13:54 arnaudb: reimage pc1017 T378068
13:53 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host pc1017.eqiad.wmnet with OS bookworm
13:52 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
13:52 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
13:44 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
13:44 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
13:42 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
13:42 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
13:41 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
13:39 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
13:34 moritzm: imported jenkins 2.479.1 to thirdparty/ci for bullseye-wikimedia T379059
13:29 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
13:16 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: T378068, host is not pooled
13:16 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on pc1017.eqiad.wmnet with reason: T378068, host is not pooled
13:10 cmooney@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox
13:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1042.eqiad.wmnet
13:10 cmooney@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
13:09 cmooney@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary
13:09 cmooney@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary
13:08 moritzm: installing php7.4 security updates on remaining non-wikikube servers T378173
13:03 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1042.eqiad.wmnet
12:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1041.eqiad.wmnet
12:50 kharlan@deploy2002: Finished scap sync-world: Backport for Revert^2 "temp accounts: Enable temp account creation on second-round pilots" (T378336) (duration: 11m 46s)
12:49 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1041.eqiad.wmnet
12:46 kharlan@deploy2002: kharlan: Continuing with sync
12:42 kharlan@deploy2002: kharlan: Backport for Revert^2 "temp accounts: Enable temp account creation on second-round pilots" (T378336) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
12:40 fnegri@cumin1002: END (PASS) - Cookbook sre.wikireplicas.update-views (exit_code=0)
12:39 kharlan@deploy2002: Started scap sync-world: Backport for Revert^2 "temp accounts: Enable temp account creation on second-round pilots" (T378336)
12:35 fnegri@cumin1002: START - Cookbook sre.wikireplicas.update-views
12:35 fnegri@cumin1002: END (FAIL) - Cookbook sre.wikireplicas.update-views (exit_code=93)
12:35 fnegri@cumin1002: START - Cookbook sre.wikireplicas.update-views
12:34 fnegri@cumin1002: END (FAIL) - Cookbook sre.wikireplicas.update-views (exit_code=93)
12:34 fnegri@cumin1002: START - Cookbook sre.wikireplicas.update-views
12:33 urbanecm: eswiki,x1: `delete from growthexperiments_link_recommendations where gelr_page=10598298;` (to verify updates are flowing in; T378983)
12:33 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1013.eqiad.wmnet
12:33 urbanecm: mwmaint2002: kill all instances of refreshLinkRecommendation (T378983)
12:32 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1013.eqiad.wmnet
12:28 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1013.eqiad.wmnet
12:23 urbanecm@deploy2002: Finished scap sync-world: Backport for CirrusSearch: Disable updating weighted tags via EventBus (T378983 T377150) (duration: 07m 39s)
12:18 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing
12:18 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db1125.eqiad.wmnet with reason: testing
12:18 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing
12:17 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db2230.codfw.wmnet with reason: testing
12:16 urbanecm@deploy2002: Started scap sync-world: Backport for CirrusSearch: Disable updating weighted tags via EventBus (T378983 T377150)
12:10 jnuche@deploy2002: Finished scap sync-world: testwikis to 1.44.0-wmf.2 refs T375661 (duration: 07m 43s)
12:04 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1040.eqiad.wmnet to cluster eqiad and group B
12:02 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1040.eqiad.wmnet to cluster eqiad and group B
12:02 jnuche@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs T375661
12:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1040.eqiad.wmnet
11:57 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1040.eqiad.wmnet
11:53 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1042
11:53 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group0 to 1.44.0-wmf.2 refs T375661
11:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029 (T376905)', diff saved to https://phabricator.wikimedia.org/P70922 and previous config saved to /var/cache/conftool/dbconfig/20241105-115301-ladsgroup.json
11:52 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1042
11:49 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1041
11:47 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1041
11:47 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti1040
11:46 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti1040
11:39 jnuche@deploy2002: Finished scap sync-world: testwikis to 1.44.0-wmf.2 refs T375661 (duration: 36m 28s)
11:37 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029', diff saved to https://phabricator.wikimedia.org/P70921 and previous config saved to /var/cache/conftool/dbconfig/20241105-113754-ladsgroup.json
11:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029', diff saved to https://phabricator.wikimedia.org/P70920 and previous config saved to /var/cache/conftool/dbconfig/20241105-112246-ladsgroup.json
11:07 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1029 (T376905)', diff saved to https://phabricator.wikimedia.org/P70919 and previous config saved to /var/cache/conftool/dbconfig/20241105-110739-ladsgroup.json
11:02 jnuche@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs T375661
11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1029 (T376905)', diff saved to https://phabricator.wikimedia.org/P70918 and previous config saved to /var/cache/conftool/dbconfig/20241105-110139-ladsgroup.json
11:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1029.eqiad.wmnet with reason: Maintenance
11:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1029.eqiad.wmnet with reason: Maintenance
11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032 (T376905)', diff saved to https://phabricator.wikimedia.org/P70917 and previous config saved to /var/cache/conftool/dbconfig/20241105-110115-ladsgroup.json
10:46 jnuche@deploy2002: Installing scap version "4.121.0" for 209 hosts
10:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032', diff saved to https://phabricator.wikimedia.org/P70916 and previous config saved to /var/cache/conftool/dbconfig/20241105-104608-ladsgroup.json
10:44 jnuche@deploy2002: install-world aborted: (no justification provided) (duration: 03m 09s)
10:41 jnuche@deploy2002: Installing scap version "4.121.0" for 209 hosts
10:41 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
10:40 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
10:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032', diff saved to https://phabricator.wikimedia.org/P70915 and previous config saved to /var/cache/conftool/dbconfig/20241105-103101-ladsgroup.json
10:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance es1032 (T376905)', diff saved to https://phabricator.wikimedia.org/P70914 and previous config saved to /var/cache/conftool/dbconfig/20241105-101553-ladsgroup.json
10:11 elukey: set proxy timeouts of docker registry's nginx instances from 300s to 180s - T378618
10:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling es1032 (T376905)', diff saved to https://phabricator.wikimedia.org/P70913 and previous config saved to /var/cache/conftool/dbconfig/20241105-100953-ladsgroup.json
10:09 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1032.eqiad.wmnet with reason: Maintenance
10:09 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1032.eqiad.wmnet with reason: Maintenance
10:07 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs1013.eqiad.wmnet with OS bookworm
10:00 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance
10:00 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance
09:49 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage
09:45 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage
09:33 vgutierrez@cumin1002: START - Cookbook sre.hosts.reimage for host lvs1013.eqiad.wmnet with OS bookworm
09:31 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10 days, 0:00:00 on pc1013.eqiad.wmnet with reason: T373037, host is not pooled
09:31 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 10 days, 0:00:00 on pc1013.eqiad.wmnet with reason: T373037, host is not pooled
09:22 jnuche@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs T375661
09:21 _joe_: restarted rsyslog on deploy2002 T379044
08:57 tchanders@deploy2002: Started scap sync-world: Backport for Revert "temp accounts: Enable temp account creation on second-round pilots"
08:24 vgutierrez: uploaded ipip-multiqueue-optimizer 0.3+deb12u1 to apt.wm.o (bookworm)
08:10 tchanders@deploy2002: Started scap sync-world: Backport for temp accounts: Enable temp account creation on second-round pilots (T378336)
08:06 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 2828
08:03 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 2828
08:03 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 14593
07:55 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'configure' for AS: 14593
07:39 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 11414
07:39 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 11414
05:10 mwpresync@deploy2002: Pruned MediaWiki: 1.43.0-wmf.27 (duration: 10m 37s)
04:03 mwpresync@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.2 refs T375661
00:10 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
00:10 rzl@deploy2002: Finished scap sync-world: 1085506 (duration: 02m 50s)
00:08 rzl@deploy2002: Started scap sync-world: 1085506
00:04 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED

2024-11-04

23:56 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host mc-gp2006
23:56 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host mc-gp2006
23:56 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host mc-gp2006.codfw.wmnet with OS bookworm
23:18 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp2005.codfw.wmnet with OS bookworm
23:18 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
23:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
23:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp2004.codfw.wmnet with OS bookworm
23:17 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
23:15 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
22:59 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp2005.codfw.wmnet with reason: host reimage
22:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp2004.codfw.wmnet with reason: host reimage
22:53 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp2005.codfw.wmnet with reason: host reimage
22:53 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp2004.codfw.wmnet with reason: host reimage
22:35 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2006.codfw.wmnet with OS bookworm
22:35 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2005.codfw.wmnet with OS bookworm
22:35 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2004.codfw.wmnet with OS bookworm
22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mc-gp2006']
22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mc-gp2005']
22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mc-gp2004']
22:33 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mc-gp2006']
22:32 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mc-gp2005']
22:32 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mc-gp2004']
22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mc-gp2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
22:22 damilare: civicrm upgraded from 31f5cbdb to 26d8013c
22:22 damilare: SmashPig upgraded from be47dddd to 601405dc
22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2006.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
22:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mc-gp2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
22:16 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
22:16 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding mc-gp2004 to codfw - jhancock@cumin2002"
22:16 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding mc-gp2004 to codfw - jhancock@cumin2002"
22:12 jhancock@cumin2002: START - Cookbook sre.dns.netbox
22:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestage2003.codfw.wmnet with OS bookworm
22:00 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
22:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226 (T376905)', diff saved to https://phabricator.wikimedia.org/P70912 and previous config saved to /var/cache/conftool/dbconfig/20241104-220026-ladsgroup.json
22:00 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
21:58 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestage2004.codfw.wmnet with OS bookworm
21:58 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
21:57 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
21:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P70911 and previous config saved to /var/cache/conftool/dbconfig/20241104-214519-ladsgroup.json
away: UTC late deploys done
21:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage2003.codfw.wmnet with reason: host reimage
21:41 tgr@deploy2002: Finished scap sync-world: Backport for Set Flow to read-only on remaining phase 0 wikis (T377990) (duration: 08m 40s)
21:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage2004.codfw.wmnet with reason: host reimage
21:36 tgr@deploy2002: tgr, kemayo: Continuing with sync
21:35 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage2003.codfw.wmnet with reason: host reimage
21:35 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage2004.codfw.wmnet with reason: host reimage
21:35 tgr@deploy2002: tgr, kemayo: Backport for Set Flow to read-only on remaining phase 0 wikis (T377990) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
21:32 tgr@deploy2002: Started scap sync-world: Backport for Set Flow to read-only on remaining phase 0 wikis (T377990)
21:31 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching sessionstore2*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002
21:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P70910 and previous config saved to /var/cache/conftool/dbconfig/20241104-213012-ladsgroup.json
21:17 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubestage2004.codfw.wmnet with OS bookworm
21:17 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host kubestage2003.codfw.wmnet with OS bookworm
21:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubestage2004']
21:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['kubestage2003']
21:15 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubestage2004']
21:15 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['kubestage2003']
21:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226 (T376905)', diff saved to https://phabricator.wikimedia.org/P70909 and previous config saved to /var/cache/conftool/dbconfig/20241104-211505-ladsgroup.json
21:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubestage2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
21:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host kubestage2003.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
21:14 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching sessionstore2*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002
21:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1226 (T376905)', diff saved to https://phabricator.wikimedia.org/P70908 and previous config saved to /var/cache/conftool/dbconfig/20241104-210800-ladsgroup.json
21:07 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1226.eqiad.wmnet with reason: Maintenance
21:07 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1226.eqiad.wmnet with reason: Maintenance
21:05 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching sessionstore1*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002
21:03 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host kubestage2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
21:03 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host kubestage2003.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
21:02 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
21:02 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding kubestage2003 to codfw - jhancock@cumin2002"
21:02 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding kubestage2003 to codfw - jhancock@cumin2002"
21:02 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance
21:02 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance
21:02 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214 (T376905)', diff saved to https://phabricator.wikimedia.org/P70907 and previous config saved to /var/cache/conftool/dbconfig/20241104-210224-ladsgroup.json
20:59 jhancock@cumin2002: START - Cookbook sre.dns.netbox
20:47 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching sessionstore1*: Apply openjdk upgrade (11.0.25+9-1~deb11u1) - eevans@cumin1002
20:47 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P70906 and previous config saved to /var/cache/conftool/dbconfig/20241104-204717-ladsgroup.json
20:35 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aqs1013.eqiad.wmnet
20:35 eevans@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
20:35 eevans@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1013.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1002"
20:32 eevans@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: aqs1013.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1002"
20:32 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P70905 and previous config saved to /var/cache/conftool/dbconfig/20241104-203210-ladsgroup.json
20:27 eevans@cumin1002: START - Cookbook sre.dns.netbox
20:26 swfrench-wmf: zero-replica "migration" releases created for all shellbox instances - T375243
20:23 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-video: apply
20:23 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-video: apply
20:22 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-timeline: apply
20:22 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-timeline: apply
20:22 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply
20:21 eevans@cumin1002: START - Cookbook sre.hosts.decommission for hosts aqs1013.eqiad.wmnet
20:21 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-media: apply
20:21 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: apply
20:20 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-constraints: apply
20:20 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox: apply
20:19 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox: apply
20:17 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214 (T376905)', diff saved to https://phabricator.wikimedia.org/P70904 and previous config saved to /var/cache/conftool/dbconfig/20241104-201703-ladsgroup.json
20:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1214 (T376905)', diff saved to https://phabricator.wikimedia.org/P70903 and previous config saved to /var/cache/conftool/dbconfig/20241104-200905-ladsgroup.json
20:08 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1214.eqiad.wmnet with reason: Maintenance
20:08 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1214.eqiad.wmnet with reason: Maintenance
20:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211 (T376905)', diff saved to https://phabricator.wikimedia.org/P70902 and previous config saved to /var/cache/conftool/dbconfig/20241104-200840-ladsgroup.json
20:00 urbanecm@deploy2002: Finished scap sync-world: Backport for Message: Downgrade exception on bool/null param to warning (T378876) (duration: 09m 12s)
19:55 urbanecm@deploy2002: urbanecm: Continuing with sync
19:54 urbanecm@deploy2002: urbanecm: Backport for Message: Downgrade exception on bool/null param to warning (T378876) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
19:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211', diff saved to https://phabricator.wikimedia.org/P70901 and previous config saved to /var/cache/conftool/dbconfig/20241104-195333-ladsgroup.json
19:51 urbanecm@deploy2002: Started scap sync-world: Backport for Message: Downgrade exception on bool/null param to warning (T378876)
19:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211', diff saved to https://phabricator.wikimedia.org/P70900 and previous config saved to /var/cache/conftool/dbconfig/20241104-193826-ladsgroup.json
19:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211 (T376905)', diff saved to https://phabricator.wikimedia.org/P70899 and previous config saved to /var/cache/conftool/dbconfig/20241104-192319-ladsgroup.json
19:23 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply
19:22 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-video: apply
19:22 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-timeline: apply
19:21 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-timeline: apply
19:21 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply
19:20 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply
19:19 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: apply
19:18 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply
19:18 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox: apply
19:17 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox: apply
19:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1211 (T376905)', diff saved to https://phabricator.wikimedia.org/P70898 and previous config saved to /var/cache/conftool/dbconfig/20241104-191519-ladsgroup.json
19:15 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1211.eqiad.wmnet with reason: Maintenance
19:14 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1211.eqiad.wmnet with reason: Maintenance
19:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209 (T376905)', diff saved to https://phabricator.wikimedia.org/P70897 and previous config saved to /var/cache/conftool/dbconfig/20241104-191454-ladsgroup.json
19:09 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
19:09 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply
19:04 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
19:03 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-syntaxhighlight: apply
18:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P70896 and previous config saved to /var/cache/conftool/dbconfig/20241104-185947-ladsgroup.json
18:58 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-video: apply
18:57 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-video: apply
18:57 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-timeline: apply
18:56 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-timeline: apply
18:56 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
18:56 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply
18:56 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply
18:55 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-media: apply
18:55 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-constraints: apply
18:54 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-constraints: apply
18:54 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox: apply
18:53 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox: apply
18:47 vgutierrez@cumin1002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 1 day, 0:00:00 on lvs1013.eqiad.wmnet with reason: known issues with liberica-hcforwarder and ipip-multiqueue-optimizer
18:47 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on lvs1013.eqiad.wmnet with reason: known issues with liberica-hcforwarder and ipip-multiqueue-optimizer
18:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P70895 and previous config saved to /var/cache/conftool/dbconfig/20241104-184440-ladsgroup.json
18:41 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs2013.codfw.wmnet
18:41 sukhe@cumin1002: START - Cookbook sre.hosts.remove-downtime for lvs2013.codfw.wmnet
18:41 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on lvs2013.codfw.wmnet with reason: vgutierrez
18:41 sukhe@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on lvs2013.codfw.wmnet with reason: vgutierrez
18:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209 (T376905)', diff saved to https://phabricator.wikimedia.org/P70894 and previous config saved to /var/cache/conftool/dbconfig/20241104-182933-ladsgroup.json
18:25 vgutierrez@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs1013.eqiad.wmnet with OS bookworm
18:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1209 (T376905)', diff saved to https://phabricator.wikimedia.org/P70893 and previous config saved to /var/cache/conftool/dbconfig/20241104-182140-ladsgroup.json
18:21 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1209.eqiad.wmnet with reason: Maintenance
18:21 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1209.eqiad.wmnet with reason: Maintenance
18:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203 (T376905)', diff saved to https://phabricator.wikimedia.org/P70892 and previous config saved to /var/cache/conftool/dbconfig/20241104-182125-ladsgroup.json
18:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P70891 and previous config saved to /var/cache/conftool/dbconfig/20241104-180618-ladsgroup.json
18:01 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage
17:56 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage
17:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P70890 and previous config saved to /var/cache/conftool/dbconfig/20241104-175111-ladsgroup.json
17:43 vgutierrez@cumin1002: START - Cookbook sre.hosts.reimage for host lvs1013.eqiad.wmnet with OS bookworm
17:43 vgutierrez: upload liberica 0.2 to apt.wm.o (bookworm) - T377127
17:37 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2001.codfw.wmnet with OS bookworm
17:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203 (T376905)', diff saved to https://phabricator.wikimedia.org/P70889 and previous config saved to /var/cache/conftool/dbconfig/20241104-173604-ladsgroup.json
17:35 vgutierrez@cumin1002: END (FAIL) - Cookbook sre.puppet.migrate-host (exit_code=99) for host lvs1013.eqiad.wmnet
17:35 vgutierrez@cumin1002: START - Cookbook sre.puppet.migrate-host for host lvs1013.eqiad.wmnet
17:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1203 (T376905)', diff saved to https://phabricator.wikimedia.org/P70888 and previous config saved to /var/cache/conftool/dbconfig/20241104-172638-ladsgroup.json
17:26 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1203.eqiad.wmnet with reason: Maintenance
17:26 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1203.eqiad.wmnet with reason: Maintenance
17:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192 (T376905)', diff saved to https://phabricator.wikimedia.org/P70887 and previous config saved to /var/cache/conftool/dbconfig/20241104-172612-ladsgroup.json
17:23 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage
17:20 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage
17:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P70886 and previous config saved to /var/cache/conftool/dbconfig/20241104-171105-ladsgroup.json
17:07 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm
17:06 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
17:04 pt1979@cumin2002: START - Cookbook sre.dns.netbox
16:59 vgutierrez@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs1013.eqiad.wmnet with OS bookworm
16:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P70885 and previous config saved to /var/cache/conftool/dbconfig/20241104-165558-ladsgroup.json
16:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192 (T376905)', diff saved to https://phabricator.wikimedia.org/P70883 and previous config saved to /var/cache/conftool/dbconfig/20241104-164051-ladsgroup.json
16:37 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2001.codfw.wmnet with OS bookworm
16:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1192 (T376905)', diff saved to https://phabricator.wikimedia.org/P70882 and previous config saved to /var/cache/conftool/dbconfig/20241104-163129-ladsgroup.json
16:31 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1192.eqiad.wmnet with reason: Maintenance
16:31 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1192.eqiad.wmnet with reason: Maintenance
16:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178 (T376905)', diff saved to https://phabricator.wikimedia.org/P70881 and previous config saved to /var/cache/conftool/dbconfig/20241104-163104-ladsgroup.json
16:23 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage
16:21 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage
16:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P70880 and previous config saved to /var/cache/conftool/dbconfig/20241104-161557-ladsgroup.json
16:15 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
16:14 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
16:14 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
16:12 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db2135.codfw.wmnet onto db2235.codfw.wmnet
16:07 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
16:06 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
16:06 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on db2160.codfw.wmnet with reason: cloning db2135@db2235
16:05 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 3:00:00 on db2160.codfw.wmnet with reason: cloning db2135@db2235
16:05 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
16:05 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm
16:02 arnaudb@cumin1002: START - Cookbook sre.mysql.clone of db2135.codfw.wmnet onto db2235.codfw.wmnet
16:01 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P70879 and previous config saved to /var/cache/conftool/dbconfig/20241104-160050-ladsgroup.json
16:00 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on db[2135,2235].codfw.wmnet with reason: cloning db2135@db2235
16:00 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 3:00:00 on db[2135,2235].codfw.wmnet with reason: cloning db2135@db2235
15:58 pt1979@cumin2002: START - Cookbook sre.dns.netbox
15:54 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage
15:51 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage
15:47 pt1979@cumin2002: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
15:46 pt1979@cumin2002: START - Cookbook sre.dns.netbox
15:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178 (T376905)', diff saved to https://phabricator.wikimedia.org/P70878 and previous config saved to /var/cache/conftool/dbconfig/20241104-154543-ladsgroup.json
15:40 vgutierrez@cumin1002: START - Cookbook sre.hosts.reimage for host lvs1013.eqiad.wmnet with OS bookworm
15:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1178 (T376905)', diff saved to https://phabricator.wikimedia.org/P70877 and previous config saved to /var/cache/conftool/dbconfig/20241104-153613-ladsgroup.json
15:36 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance
15:35 vgutierrez: upload liberica 0.1 to apt.wm.o (bookworm) - T377127
15:35 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance
15:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177 (T376905)', diff saved to https://phabricator.wikimedia.org/P70876 and previous config saved to /var/cache/conftool/dbconfig/20241104-153548-ladsgroup.json
15:29 sukhe: running authdns-update to move CN traffic to eqsin from ulsfo: T378744
15:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P70874 and previous config saved to /var/cache/conftool/dbconfig/20241104-152041-ladsgroup.json
15:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P70873 and previous config saved to /var/cache/conftool/dbconfig/20241104-150534-ladsgroup.json
14:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177 (T376905)', diff saved to https://phabricator.wikimedia.org/P70872 and previous config saved to /var/cache/conftool/dbconfig/20241104-145027-ladsgroup.json
14:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1177 (T376905)', diff saved to https://phabricator.wikimedia.org/P70871 and previous config saved to /var/cache/conftool/dbconfig/20241104-144101-ladsgroup.json
14:40 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance
14:40 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance
14:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172 (T376905)', diff saved to https://phabricator.wikimedia.org/P70870 and previous config saved to /var/cache/conftool/dbconfig/20241104-144037-ladsgroup.json
14:38 Lucas_WMDE: UTC afternoon backport+config window done
14:36 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for Exclude affiliates from P&E dashboard integration for CampaignEvents Extension (T377252) (duration: 23m 39s)
14:28 lucaswerkmeister-wmde@deploy2002: mhorsey, lucaswerkmeister-wmde: Continuing with sync
14:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P70869 and previous config saved to /var/cache/conftool/dbconfig/20241104-142530-ladsgroup.json
14:24 moritzm: uploaded php7.4 7.4.33-1+0~20221108.73+debian10~1.gbpa00350a+wmf10u2+icu67u3 to component/icu67 (backports of latest security fixes to our PHP 7.4 build)
14:23 lucaswerkmeister-wmde@deploy2002: mhorsey, lucaswerkmeister-wmde: Backport for Exclude affiliates from P&E dashboard integration for CampaignEvents Extension (T377252) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
14:12 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for Exclude affiliates from P&E dashboard integration for CampaignEvents Extension (T377252)
14:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P70868 and previous config saved to /var/cache/conftool/dbconfig/20241104-141023-ladsgroup.json
13:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172 (T376905)', diff saved to https://phabricator.wikimedia.org/P70867 and previous config saved to /var/cache/conftool/dbconfig/20241104-135516-ladsgroup.json
13:51 marostegui: Start schema change on redacteddb1001:s8 T367856 (this will make replication in s8 lag for around 2-3 days)
13:50 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet with reason: Schema change T367856
13:50 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet with reason: Schema change T367856
13:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1172 (T376905)', diff saved to https://phabricator.wikimedia.org/P70866 and previous config saved to /var/cache/conftool/dbconfig/20241104-134605-ladsgroup.json
13:45 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance
13:45 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance
13:40 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance
13:40 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance
13:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167 (T376905)', diff saved to https://phabricator.wikimedia.org/P70865 and previous config saved to /var/cache/conftool/dbconfig/20241104-134021-ladsgroup.json
13:25 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B
13:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P70864 and previous config saved to /var/cache/conftool/dbconfig/20241104-132513-ladsgroup.json
13:24 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B
13:11 Dreamy_Jazz: Started slow MediaModeration scan for commonswiki to be scanning as close to upload as possible - https://wikitech.wikimedia.org/wiki/MediaModeration
13:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P70862 and previous config saved to /var/cache/conftool/dbconfig/20241104-131006-ladsgroup.json
13:06 Dreamy_Jazz: Started MediaModeration scan on all wikis other than s4 (commonswiki + testcommonswiki) - https://wikitech.wikimedia.org/wiki/MediaModeration
12:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167 (T376905)', diff saved to https://phabricator.wikimedia.org/P70861 and previous config saved to /var/cache/conftool/dbconfig/20241104-125459-ladsgroup.json
12:49 XioNoX: deploy "Add temporary LVS community for liberica test" - T378453
12:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1167 (T376905)', diff saved to https://phabricator.wikimedia.org/P70860 and previous config saved to /var/cache/conftool/dbconfig/20241104-124533-ladsgroup.json
12:45 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
12:45 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
12:45 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance
12:44 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance
12:35 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
12:34 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
12:24 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
12:22 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
12:22 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
12:20 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
12:19 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
12:19 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
12:11 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B
12:11 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1039.eqiad.wmnet to cluster eqiad and group B
12:10 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
12:08 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1039.eqiad.wmnet
12:08 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
12:01 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1039.eqiad.wmnet
11:58 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
11:56 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
11:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227 (T376905)', diff saved to https://phabricator.wikimedia.org/P70859 and previous config saved to /var/cache/conftool/dbconfig/20241104-115514-ladsgroup.json
11:45 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
11:44 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
11:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P70858 and previous config saved to /var/cache/conftool/dbconfig/20241104-114008-ladsgroup.json
11:34 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
11:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P70857 and previous config saved to /var/cache/conftool/dbconfig/20241104-112501-ladsgroup.json
11:22 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
11:12 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
11:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227 (T376905)', diff saved to https://phabricator.wikimedia.org/P70856 and previous config saved to /var/cache/conftool/dbconfig/20241104-110953-ladsgroup.json
11:05 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2227 (T376905)', diff saved to https://phabricator.wikimedia.org/P70855 and previous config saved to /var/cache/conftool/dbconfig/20241104-110141-ladsgroup.json
11:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2227.codfw.wmnet with reason: Maintenance
11:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2227.codfw.wmnet with reason: Maintenance
11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194 (T376905)', diff saved to https://phabricator.wikimedia.org/P70854 and previous config saved to /var/cache/conftool/dbconfig/20241104-110113-ladsgroup.json
10:54 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
10:52 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
10:48 XioNoX: eqiad: Prefer Lumen to reach ATT - T377844
10:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P70853 and previous config saved to /var/cache/conftool/dbconfig/20241104-104606-ladsgroup.json
10:42 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
10:41 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
10:41 moritzm: installing libtool updates from Bookworm point release
10:31 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
10:31 moritzm: installing libseccomp updates from Bookworm point release
10:31 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
10:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P70852 and previous config saved to /var/cache/conftool/dbconfig/20241104-103059-ladsgroup.json
10:20 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
10:17 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
10:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194 (T376905)', diff saved to https://phabricator.wikimedia.org/P70851 and previous config saved to /var/cache/conftool/dbconfig/20241104-101552-ladsgroup.json
10:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2194 (T376905)', diff saved to https://phabricator.wikimedia.org/P70850 and previous config saved to /var/cache/conftool/dbconfig/20241104-100813-ladsgroup.json
10:08 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2194.codfw.wmnet with reason: Maintenance
10:07 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2194.codfw.wmnet with reason: Maintenance
10:06 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
10:02 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
10:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance
10:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance
09:57 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
09:56 volans: deploying spicerack v8.15.2 to cumin[12]002
09:55 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
09:50 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
09:42 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
09:37 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
09:07 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 13 hosts with reason: reboots for nftables
09:06 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on 13 hosts with reason: reboots for nftables
09:06 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ganeti1045.eqiad.wmnet with reason: reboots for nftables
09:06 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on ganeti1045.eqiad.wmnet with reason: reboots for nftables
09:04 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1039.eqiad.wmnet
08:59 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1039.eqiad.wmnet
08:57 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
08:57 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
08:51 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
08:50 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
08:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2014.codfw.wmnet
08:23 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
08:23 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2014.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
08:22 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2014.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
08:21 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db2239.codfw.wmnet with reason: waiting for productionization T373579
08:21 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on db2239.codfw.wmnet with reason: waiting for productionization T373579
08:16 jmm@cumin2002: START - Cookbook sre.dns.netbox
08:15 XioNoX: push Drop labtestwikitech return traffic term to eqiad routers - CR1083589
08:12 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2014.codfw.wmnet
08:11 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti2013.codfw.wmnet
08:11 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
08:11 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2013.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
08:09 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti2013.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
08:06 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
08:05 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
08:03 jmm@cumin2002: START - Cookbook sre.dns.netbox
07:59 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti2013.codfw.wmnet

2024-11-02

15:48 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for Remove 'mainpage' from $wgForceUIMsgAsContentMsg for Wikidata (T184386) (duration: 12m 09s)
15:44 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, ladsgroup: Continuing with sync
15:38 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, ladsgroup: Backport for Remove 'mainpage' from $wgForceUIMsgAsContentMsg for Wikidata (T184386) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
15:36 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for Remove 'mainpage' from $wgForceUIMsgAsContentMsg for Wikidata (T184386)
15:26 reedy@deploy2002: Finished scap sync-world: use statements (duration: 07m 13s)
15:19 reedy@deploy2002: Started scap sync-world: use statements
15:13 reedy@deploy2002: Synchronized wmf-config/: Comment updates (duration: 07m 31s)

2024-11-01

20:27 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-presto1016.eqiad.wmnet with OS bullseye
19:47 inflatador: bking@an-presto[1016:1020].eqiad.wmnet temporarily install perccli to check disk status without requiring reboot T374924
19:34 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1016.eqiad.wmnet with reason: host reimage
19:31 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1016.eqiad.wmnet with reason: host reimage
19:16 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1016.eqiad.wmnet with OS bullseye
19:12 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-presto1017.eqiad.wmnet']
19:07 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-presto1016.eqiad.wmnet']
19:02 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1017.eqiad.wmnet']
18:56 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1016.eqiad.wmnet']
18:56 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-presto1017.eqiad.wmnet']
18:56 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1017.eqiad.wmnet']
18:51 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
18:51 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
18:51 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
18:47 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
18:46 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
18:46 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1052.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
18:46 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
18:46 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
18:44 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
18:44 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
18:44 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
18:43 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
18:42 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
18:42 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
18:41 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1051.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
18:41 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1050.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
18:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
18:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
18:39 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1049.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
18:39 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
18:39 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
18:38 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
18:38 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1048.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
18:35 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
18:35 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1046.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
18:35 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1047.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
18:35 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
18:34 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
18:34 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
18:34 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
18:33 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
18:33 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1045.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
18:33 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
18:32 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
18:29 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1043.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
18:29 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1042.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
18:29 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1041.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
18:26 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1040.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
18:25 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
18:19 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1039.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
18:11 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-presto1018.eqiad.wmnet']
18:10 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1018.eqiad.wmnet']
18:09 bking@cumin2002: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for an-presto1020.eqiad.wmnet: Renew puppet certificate - bking@cumin2002
18:07 dancy@deploy2002: Installation of scap version "4.120.0" completed for 1 hosts
18:07 bking@cumin2002: START - Cookbook sre.puppet.renew-cert for an-presto1020.eqiad.wmnet: Renew puppet certificate - bking@cumin2002
18:06 dancy@deploy2002: Installing scap version "4.120.0" for 1 hosts
18:04 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-presto1020.eqiad.wmnet with OS bullseye
17:00 Dreamy_Jazz: Ran `/usr/local/bin/foreachwikiindblist /srv/mediawiki/dblists/all.dblist extensions/WikimediaEvents/maintenance/UpdatePeriodicMetrics.php --verbose`
16:36 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1020.eqiad.wmnet with reason: host reimage
16:33 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1020.eqiad.wmnet with reason: host reimage
16:18 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bullseye
16:17 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 16:00:00 on thanos-be2003.codfw.wmnet with reason: give it time for sde1 fs to backfill
16:17 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 16:00:00 on thanos-be2003.codfw.wmnet with reason: give it time for sde1 fs to backfill
16:16 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 16:00:00 on db2239.codfw.wmnet with reason: not yet in production
16:16 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 16:00:00 on db2239.codfw.wmnet with reason: not yet in production
16:05 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-presto1020.eqiad.wmnet']
16:05 thcipriani@deploy2002: Finished scap sync-world: Backport for Revert "Dummy commit for testing" (duration: 07m 46s)
16:00 thcipriani@deploy2002: thcipriani: Continuing with sync
16:00 thcipriani@deploy2002: thcipriani: Backport for Revert "Dummy commit for testing" synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
15:57 thcipriani@deploy2002: Started scap sync-world: Backport for Revert "Dummy commit for testing"
15:55 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1020.eqiad.wmnet']
15:55 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-presto1020.eqiad.wmnet with OS bullseye
15:19 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be2003.codfw.wmnet
15:05 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host thanos-be2003.codfw.wmnet
14:54 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bullseye
14:40 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-presto1020.eqiad.wmnet with OS bullseye
14:29 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bullseye
14:27 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host an-presto1020.eqiad.wmnet with OS bookworm
14:06 ladsgroup@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db2190 gradually with 4 steps - Maint over
13:55 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1020.eqiad.wmnet with OS bookworm
13:43 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
13:43 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
13:38 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
13:33 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
13:20 ladsgroup@cumin1002: START - Cookbook sre.mysql.pool db2190 gradually with 4 steps - Maint over
12:43 cmooney@cumin1002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1025.eqiad.wmnet
12:43 cmooney@cumin1002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1025.eqiad.wmnet
12:43 cmooney@cumin1002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1025.eqiad.wmnet
12:43 cmooney@cumin1002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1025.eqiad.wmnet
12:42 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1025.eqiad.wmnet
12:28 cmooney@cumin1002: START - Cookbook sre.hosts.reboot-single for host ganeti1025.eqiad.wmnet
12:28 topranks: rebooting ganeti1025 as VMs are unresponsive and will not shutdown or move
10:38 kevinbazira@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
off: sudo cumin -b4 "A:cp and A:magru" "run-puppet-agent" to pick up CR 1085569
02:25 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2198.codfw.wmnet with reason: Maintenance
02:24 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2198.codfw.wmnet with reason: Maintenance
02:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195 (T376905)', diff saved to https://phabricator.wikimedia.org/P70840 and previous config saved to /var/cache/conftool/dbconfig/20241101-022447-ladsgroup.json
02:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195', diff saved to https://phabricator.wikimedia.org/P70839 and previous config saved to /var/cache/conftool/dbconfig/20241101-020940-ladsgroup.json
01:59 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-presto1019.eqiad.wmnet with OS bullseye
01:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195', diff saved to https://phabricator.wikimedia.org/P70838 and previous config saved to /var/cache/conftool/dbconfig/20241101-015433-ladsgroup.json
01:42 urandom: Decommissioning Cassandra/aqs1013-{a,b} — T378725
01:41 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on aqs1013.eqiad.wmnet with reason: Decommissioning — T378725
01:40 eevans@cumin1002: START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on aqs1013.eqiad.wmnet with reason: Decommissioning — T378725
01:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195 (T376905)', diff saved to https://phabricator.wikimedia.org/P70837 and previous config saved to /var/cache/conftool/dbconfig/20241101-013926-ladsgroup.json
01:39 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for aqs1022.eqiad.wmnet
01:39 eevans@cumin1002: START - Cookbook sre.hosts.remove-downtime for aqs1022.eqiad.wmnet
01:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2195 (T376905)', diff saved to https://phabricator.wikimedia.org/P70836 and previous config saved to /var/cache/conftool/dbconfig/20241101-013102-ladsgroup.json
01:30 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2195.codfw.wmnet with reason: Maintenance
01:30 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2195.codfw.wmnet with reason: Maintenance
01:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181 (T376905)', diff saved to https://phabricator.wikimedia.org/P70835 and previous config saved to /var/cache/conftool/dbconfig/20241101-013035-ladsgroup.json
01:25 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1019.eqiad.wmnet with reason: host reimage
01:22 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1019.eqiad.wmnet with reason: host reimage
01:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P70834 and previous config saved to /var/cache/conftool/dbconfig/20241101-011528-ladsgroup.json
01:07 bking@cumin2002: START - Cookbook sre.hosts.reimage for host an-presto1019.eqiad.wmnet with OS bullseye
01:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P70833 and previous config saved to /var/cache/conftool/dbconfig/20241101-010021-ladsgroup.json
00:54 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-presto1019.eqiad.wmnet']
00:54 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-presto1019.eqiad.wmnet']
00:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181 (T376905)', diff saved to https://phabricator.wikimedia.org/P70832 and previous config saved to /var/cache/conftool/dbconfig/20241101-004514-ladsgroup.json
00:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2181 (T376905)', diff saved to https://phabricator.wikimedia.org/P70831 and previous config saved to /var/cache/conftool/dbconfig/20241101-003546-ladsgroup.json
00:35 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2181.codfw.wmnet with reason: Maintenance
00:35 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2181.codfw.wmnet with reason: Maintenance
00:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167 (T376905)', diff saved to https://phabricator.wikimedia.org/P70830 and previous config saved to /var/cache/conftool/dbconfig/20241101-003520-ladsgroup.json
00:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167', diff saved to https://phabricator.wikimedia.org/P70829 and previous config saved to /var/cache/conftool/dbconfig/20241101-002013-ladsgroup.json
00:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167', diff saved to https://phabricator.wikimedia.org/P70828 and previous config saved to /var/cache/conftool/dbconfig/20241101-000506-ladsgroup.json

Other archives

2000s

Archive 1: 2004 Jun - 2004 Sep
Archive 2: 2004 Oct - 2004 Nov
Archive 3: 2004 Dec - 2005 Mar
Archive 4: 2005 Apr - 2005 Jul
Archive 5: 2005 Aug - 2005 Oct, with revision history 2004-06-23 to 2005-11-25
Archive 6: 2005 Nov - 2006 Feb
Archive 7: 2006 Mar - 2006 Jun
Archive 8: 2006 Jul - 2006 Sep
Archive 9: 2006 Oct - 2007 Jan, with revision history 2005-11-25 to 2007-02-21
Archive 10: 2007 Feb - 2007 Jun
Archive 11: 2007 Jul - 2007 Dec
Archive 12: 2008 Jan - 2008 Jul
Archive 12a: 2008 Aug
Archive 12b: 2008 Sept
Archive 13: 2008 Oct - 2009 Jun
Archive 14: 2009 Jun - 2009 Dec

2010s

2020s