Server Admin Log
2025-12-07
- 11:51 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/thumbor: sync
- 11:51 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/thumbor: sync
- 02:51 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db1240.eqiad.wmnet with reason: Maintenance
- 02:51 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1212 (T410589)', diff saved to https://phabricator.wikimedia.org/P86442 and previous config saved to /var/cache/conftool/dbconfig/20251207-025120-ladsgroup.json
- 02:36 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1212', diff saved to https://phabricator.wikimedia.org/P86441 and previous config saved to /var/cache/conftool/dbconfig/20251207-023613-ladsgroup.json
- 02:21 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1212', diff saved to https://phabricator.wikimedia.org/P86440 and previous config saved to /var/cache/conftool/dbconfig/20251207-022105-ladsgroup.json
- 02:06 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1212 (T410589)', diff saved to https://phabricator.wikimedia.org/P86439 and previous config saved to /var/cache/conftool/dbconfig/20251207-020558-ladsgroup.json
- 01:18 mwpresync@deploy2002: Finished scap build-images: Publishing wmf/next image (duration: 17m 48s)
- 01:00 mwpresync@deploy2002: Started scap build-images: Publishing wmf/next image
2025-12-06
- 14:47 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db1212 (T410589)', diff saved to https://phabricator.wikimedia.org/P86436 and previous config saved to /var/cache/conftool/dbconfig/20251206-144719-ladsgroup.json
- 14:47 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db1212.eqiad.wmnet with reason: Maintenance
- 03:47 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db1198.eqiad.wmnet with reason: Maintenance
- 03:47 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1189 (T410589)', diff saved to https://phabricator.wikimedia.org/P86435 and previous config saved to /var/cache/conftool/dbconfig/20251206-034700-ladsgroup.json
- 03:31 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P86434 and previous config saved to /var/cache/conftool/dbconfig/20251206-033152-ladsgroup.json
- 03:16 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P86433 and previous config saved to /var/cache/conftool/dbconfig/20251206-031644-ladsgroup.json
- 03:01 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1189 (T410589)', diff saved to https://phabricator.wikimedia.org/P86432 and previous config saved to /var/cache/conftool/dbconfig/20251206-030136-ladsgroup.json
- 01:18 mwpresync@deploy2002: Finished scap build-images: Publishing wmf/next image (duration: 17m 22s)
- 01:00 mwpresync@deploy2002: Started scap build-images: Publishing wmf/next image
2025-12-05
- 22:35 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-main: apply
- 22:34 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-main: apply
- 22:34 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-main: apply
- 22:33 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-main: apply
- 22:32 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-main: apply
- 22:31 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-main: apply
- 22:29 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-main: apply
- 22:29 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-main: apply
- 22:11 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-main: apply
- 22:11 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-main: apply
- 22:11 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-main: apply
- 22:10 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-main: apply
- 21:57 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-main: apply
- 21:56 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-main: apply
- 21:49 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-main: apply
- 21:38 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-main: apply
- 21:19 xcollazo@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-main: apply
- 21:18 xcollazo@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-main: apply
- 21:03 tchin@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics-external: apply
- 21:03 tchin@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-analytics-external: apply
- 20:17 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: apply
- 20:16 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply
- 20:06 ejegg: donorwiki upgraded from 9ab44e85 to bbd96c00
- 19:50 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-main: apply
- 19:49 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-main: apply
- 19:10 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-main: apply
- 19:09 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-main: apply
- 18:28 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: apply
- 18:27 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-constraints: apply
- 18:18 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/growthbook-next: apply
- 18:17 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/growthbook-next: apply
- 18:10 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-main: apply
- 18:10 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-main: apply
- 17:23 topranks: add updated ssh firewall filter config to pfw1-eqiad.wikimedia.org T390939
- 17:11 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ml-lab1001.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 17:10 elukey@cumin1003: START - Cookbook sre.hosts.provision for host ml-lab1001.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 17:10 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ml-lab1001.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 17:10 elukey@cumin1003: START - Cookbook sre.hosts.provision for host ml-lab1001.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 17:07 elukey@cumin1003: END (ERROR) - Cookbook sre.hosts.provision (exit_code=97) for host ml-lab1001.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 17:02 elukey@cumin1003: START - Cookbook sre.hosts.provision for host ml-lab1001.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 17:02 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ml-lab1001.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 16:52 elukey@cumin1003: START - Cookbook sre.hosts.provision for host ml-lab1001.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 16:03 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/analytics-test: apply
- 16:03 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/analytics-test: apply
- 15:30 Amir1: creating ores tables on thwiki (T409438)
- 15:07 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db1189 (T410589)', diff saved to https://phabricator.wikimedia.org/P86429 and previous config saved to /var/cache/conftool/dbconfig/20251205-150737-ladsgroup.json
- 15:07 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db1189.eqiad.wmnet with reason: Maintenance
- 15:07 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T410589)', diff saved to https://phabricator.wikimedia.org/P86428 and previous config saved to /var/cache/conftool/dbconfig/20251205-150713-ladsgroup.json
- 14:56 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/growthbook-next: apply
- 14:55 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/growthbook-next: apply
- 14:52 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/growthbook: apply
- 14:52 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P86427 and previous config saved to /var/cache/conftool/dbconfig/20251205-145206-ladsgroup.json
- 14:51 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/growthbook: apply
- 14:51 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/growthbook-next: apply
- 14:50 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/growthbook-next: apply
- 14:46 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/analytics-test: apply
- 14:45 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/analytics-test: apply
- 14:36 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P86426 and previous config saved to /var/cache/conftool/dbconfig/20251205-143658-ladsgroup.json
- 14:27 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/growthbook: apply
- 14:26 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/growthbook: apply
- 14:25 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/ferretdb-growthbook: apply
- 14:25 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/ferretdb-growthbook: apply
- 14:23 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/postgresql-growthbook: apply
- 14:22 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/postgresql-growthbook: apply
- 14:21 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T410589)', diff saved to https://phabricator.wikimedia.org/P86425 and previous config saved to /var/cache/conftool/dbconfig/20251205-142150-ladsgroup.json
- 14:12 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/growthbook: apply
- 14:11 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/growthbook: apply
- 14:08 jayme: stopped puppet on wikikube-ctrl2* and restarted kube-apiserver to temporarily extend audit logging
- 13:52 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/growthbook-next: apply
- 13:52 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/growthbook-next: apply
- 13:51 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/growthbook-next: apply
- 13:50 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/growthbook-next: apply
- 13:49 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/ferretdb-growthbook-next: apply
- 13:49 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/ferretdb-growthbook-next: apply
- 13:47 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/postgresql-growthbook-next: apply
- 13:46 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/postgresql-growthbook-next: apply
- 13:46 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/postgresql-growthbook-next: apply
- 13:46 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/postgresql-growthbook-next: apply
- 13:43 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/ferretdb-growthbook: apply
- 13:43 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/ferretdb-growthbook: apply
- 13:41 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/postgresql-growthbook: apply
- 13:33 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply
- 13:30 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply
- 13:10 moritzm: upload python3-sshpubkeys to 3.3.1-1~wmf12u1 to apt.wikimedia.org T411816
- 12:42 moritzm: upgrade python3-sshpubkeys on idm-test1001 to 3.3.1-1~wmf12u1 T411816
- 12:30 jayme: removed helm release mw-script/utk6lsuw in k8s@codfw which had been stuck in pending-install state for 9+ days
- 11:53 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/growthbook: apply
- 11:52 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/growthbook: apply
- 11:42 lucaswerkmeister-wmde@deploy2002: kubectl delete job wikidata-resubmit-changes-for-dispatch-29415459 # T411862
- 11:42 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db1233.eqiad.wmnet onto db1229.eqiad.wmnet
- 11:26 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db1233 gradually with 4 steps - Pool db1233.eqiad.wmnet in after cloning
- 10:41 fceratto@cumin1003: START - Cookbook sre.mysql.pool db1233 gradually with 4 steps - Pool db1233.eqiad.wmnet in after cloning
- 10:02 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/growthbook: apply
- 10:02 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/growthbook: apply
- 09:59 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/growthbook: apply
- 09:58 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/growthbook: apply
- 09:16 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) db1233 - Depool db1233.eqiad.wmnet to then clone it to db1229.eqiad.wmnet - fceratto@cumin1003
- 09:16 fceratto@cumin1003: START - Cookbook sre.mysql.depool db1233 - Depool db1233.eqiad.wmnet to then clone it to db1229.eqiad.wmnet - fceratto@cumin1003
- 09:16 fceratto@cumin1003: START - Cookbook sre.mysql.clone of db1233.eqiad.wmnet onto db1229.eqiad.wmnet
- 08:04 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/growthbook-next: apply
- 08:03 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/growthbook-next: apply
- 08:02 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/ferretdb-growthbook-next: apply
- 08:02 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/ferretdb-growthbook-next: apply
- 08:02 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/postgresql-growthbook-next: apply
- 08:02 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/postgresql-growthbook-next: apply
- 07:28 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/postgresql-growthbook-next: apply
- 07:27 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/postgresql-growthbook-next: apply
- 03:24 larssandergreen: Updating civicrm from 7a979750 to 9cc43ebd
- 03:08 larssandergreen: Updating civicrm from 36b09796 to 7a979750
- 02:57 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db1175 (T410589)', diff saved to https://phabricator.wikimedia.org/P86417 and previous config saved to /var/cache/conftool/dbconfig/20251205-025711-ladsgroup.json
- 02:57 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db1175.eqiad.wmnet with reason: Maintenance
- 02:56 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T410589)', diff saved to https://phabricator.wikimedia.org/P86416 and previous config saved to /var/cache/conftool/dbconfig/20251205-025647-ladsgroup.json
- 02:41 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P86415 and previous config saved to /var/cache/conftool/dbconfig/20251205-024139-ladsgroup.json
- 02:40 ejegg: payments-wiki upgraded from 9ab44e85 to 5c381b45
- 02:26 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P86414 and previous config saved to /var/cache/conftool/dbconfig/20251205-022631-ladsgroup.json
- 02:11 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T410589)', diff saved to https://phabricator.wikimedia.org/P86413 and previous config saved to /var/cache/conftool/dbconfig/20251205-021123-ladsgroup.json
- 02:09 wfan: donorwiki upgraded from 053b3f88 to 9ab44e85
- 02:07 wfan: payments-wiki upgraded from d2799b95 to 9ab44e85
- 02:01 rzl: rzl@apt1002:~$ sudo -i reprepro -C component/envoy-future include bullseye-wikimedia /home/rzl/envoyproxy_1.35.7-1_amd64.changes
- 01:44 wfan: SmashPig upgraded from a25fbb28 to 1442d0a0
- 01:41 eileen: civicrm upgraded from d4bd9b1b to 36b09796
- 01:13 mwpresync@deploy2002: Finished scap build-images: Publishing wmf/next image (duration: 13m 06s)
- 01:00 mwpresync@deploy2002: Started scap build-images: Publishing wmf/next image
- 00:27 Amir1: ladsgroup@deploy2002:~$ mwscript-k8s --follow -- findBadBlobs.php --wiki huwikiquote --mark "Corrupted UTF-8 (T351953)" --revisions 3804,3808,3811,3813,3814,3818,3825
- 00:26 Amir1: ladsgroup@deploy2002:~$ mwscript-k8s --follow -- findBadBlobs.php --wiki guwiktionary --mark "Corrupted UTF-8 (T351953)" --revisions 20576
2025-12-04
- 23:47 tzatziki: removing 4 files for legal compliance
- 23:34 tzatziki: removing 2 files for legal compliance
- 23:23 tzatziki: removing 3 files for legal compliance
- 23:16 ryankemper@cumin2002: END (PASS) - Cookbook sre.hadoop.reboot-workers (exit_code=0) for Hadoop test cluster
- 23:16 tzatziki: removing 5 files for legal compliance
- 23:04 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host tcp-proxy7002.magru.wmnet
- 23:02 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host tcp-proxy7001.magru.wmnet
- 23:00 dzahn@cumin2002: START - Cookbook sre.hosts.reboot-single for host tcp-proxy7002.magru.wmnet
- 23:00 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host tcp-proxy6002.drmrs.wmnet
- 22:59 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host tcp-proxy6001.drmrs.wmnet
- 22:59 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host tcp-proxy5002.eqsin.wmnet
- 22:58 dzahn@cumin2002: START - Cookbook sre.hosts.reboot-single for host tcp-proxy7001.magru.wmnet
- 22:56 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host tcp-proxy5001.eqsin.wmnet
- 22:56 dzahn@cumin2002: START - Cookbook sre.hosts.reboot-single for host tcp-proxy6002.drmrs.wmnet
- 22:55 dzahn@cumin2002: START - Cookbook sre.hosts.reboot-single for host tcp-proxy6001.drmrs.wmnet
- 22:55 dzahn@cumin2002: START - Cookbook sre.hosts.reboot-single for host tcp-proxy5002.eqsin.wmnet
- 22:55 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host tcp-proxy3002.esams.wmnet
- 22:55 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host tcp-proxy4002.ulsfo.wmnet
- 22:52 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host tcp-proxy3001.esams.wmnet
- 22:52 dzahn@cumin2002: START - Cookbook sre.hosts.reboot-single for host tcp-proxy5001.eqsin.wmnet
- 22:51 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host tcp-proxy2002.codfw.wmnet
- 22:51 dzahn@cumin2002: START - Cookbook sre.hosts.reboot-single for host tcp-proxy4002.ulsfo.wmnet
- 22:51 dzahn@cumin2002: START - Cookbook sre.hosts.reboot-single for host tcp-proxy3002.esams.wmnet
- 22:51 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host tcp-proxy4001.ulsfo.wmnet
- 22:51 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host tcp-proxy2001.codfw.wmnet
- 22:50 dzahn@cumin2002: START - Cookbook sre.hosts.reboot-single for host tcp-proxy4001.ulsfo.wmnet
- 22:50 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host tcp-proxy1002.eqiad.wmnet
- 22:49 dzahn@cumin2002: START - Cookbook sre.hosts.reboot-single for host tcp-proxy3001.esams.wmnet
- 22:48 dzahn@cumin2002: START - Cookbook sre.hosts.reboot-single for host tcp-proxy2002.codfw.wmnet
- 22:47 dzahn@cumin2002: START - Cookbook sre.hosts.reboot-single for host tcp-proxy2001.codfw.wmnet
- 22:46 dzahn@cumin2002: START - Cookbook sre.hosts.reboot-single for host tcp-proxy1002.eqiad.wmnet
- 22:42 dzahn@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-cluster (exit_code=99)
- 22:42 dzahn@cumin2002: START - Cookbook sre.hosts.reboot-cluster
- 22:37 sbassett: Deployed security fix for T409226
- 22:35 ryankemper@cumin2002: START - Cookbook sre.hadoop.reboot-workers for Hadoop test cluster
- 22:28 sbassett: Deployed security fix for T408135
- 22:22 dzahn@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 14 hosts with reason: T408532
- 22:20 ryankemper: T411568 Rebooting `stat*`
- 22:11 ryankemper@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on stat[1008-1011].eqiad.wmnet with reason: T411568
- 22:06 cscott@deploy2002: Finished scap sync-world: Backport for Activate postprocessing cache on testwiki, test2wiki, officewiki (T348255) (duration: 14m 23s)
- 22:02 cscott@deploy2002: ihurbain, cscott: Continuing with sync
- 21:54 cscott@deploy2002: ihurbain, cscott: Backport for Activate postprocessing cache on testwiki, test2wiki, officewiki (T348255) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 21:52 cscott@deploy2002: Started scap sync-world: Backport for Activate postprocessing cache on testwiki, test2wiki, officewiki (T348255)
- 21:45 jforrester@deploy2002: Finished scap sync-world: Backport for Followup Ie40b9e59a4: Fortify unified metrics method (T411793) (duration: 07m 16s)
- 21:40 jforrester@deploy2002: jforrester: Continuing with sync
- 21:40 jforrester@deploy2002: jforrester: Backport for Followup Ie40b9e59a4: Fortify unified metrics method (T411793) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 21:37 jforrester@deploy2002: Started scap sync-world: Backport for Followup Ie40b9e59a4: Fortify unified metrics method (T411793)
- 21:24 jforrester@deploy2002: Finished scap sync-world: Backport for [tokwiki] Allow sysops to grant/remove confirmed status (T411683), OATHAuth: Remove wmgOATHAuthDisableRight (T399664), Remove /data-parsoid/ endpoint from specs per T393557 (T411517), Shorten 'close' cookie wait period for enwiki banners (T411800) (duration: 10m 04s)
- 21:19 jforrester@deploy2002: mstyles, aaron, superpes, jforrester, ejegg: Continuing with sync
- 21:18 jforrester@deploy2002: mstyles, aaron, superpes, jforrester, ejegg: Backport for [tokwiki] Allow sysops to grant/remove confirmed status (T411683), OATHAuth: Remove wmgOATHAuthDisableRight (T399664), Remove /data-parsoid/ endpoint from specs per T393557 (T411517), Shorten 'close' cookie wait period for enwiki banners (T411800) synced to the t
- 21:14 jforrester@deploy2002: Started scap sync-world: Backport for [tokwiki] Allow sysops to grant/remove confirmed status (T411683), OATHAuth: Remove wmgOATHAuthDisableRight (T399664), Remove /data-parsoid/ endpoint from specs per T393557 (T411517), Shorten 'close' cookie wait period for enwiki banners (T411800)
- 21:11 kharlan@deploy2002: Finished scap sync-world: Backport for Use a separate right for Special:SuggestedInvestigations (T411557) (duration: 57m 45s)
- 21:03 brett: import varnishkafka 1.2.0~deb13+wmf1 into trixie-wikimedia - T401832
- 21:01 taavi@deploy2002: mwscript-k8s job started: initEditCount --wiki=tokwiki
- 20:58 kharlan@deploy2002: kharlan: Continuing with sync
- 20:57 kharlan@deploy2002: kharlan: Backport for Use a separate right for Special:SuggestedInvestigations (T411557) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 20:50 brett: import libvmod-wmfuniq 0.2.0~deb13+wmf1 into trixie-wikimedia - T401832
- 20:28 brett: Delete libvmod-netmapper 1.10-1~deb13+wmf1, import libvmod-netmapper 1.10~deb13+wmf1 into trixie-wikimedia - T401832
- 20:13 kharlan@deploy2002: Started scap sync-world: Backport for Use a separate right for Special:SuggestedInvestigations (T411557)
- 20:13 brett: import libvmod-querysort 0.4~deb13+wmf1 into trixie-wikimedia - T401832
- 20:05 cstone: payments-wiki upgraded from 714ed4cf to d2799b95
- 20:00 brett: import libvmod-netmapper 1.10-1~deb13+wmf1 into trixie-wikimedia - T401832
- 19:30 kharlan@deploy2002: Finished scap sync-world: Backport for hCaptcha: Persist the captcha consequence in the user session (T410657) (duration: 11m 16s)
- 19:24 kharlan@deploy2002: kharlan: Continuing with sync
- 19:21 kharlan@deploy2002: kharlan: Backport for hCaptcha: Persist the captcha consequence in the user session (T410657) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 19:19 kharlan@deploy2002: Started scap sync-world: Backport for hCaptcha: Persist the captcha consequence in the user session (T410657)
- 19:13 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/kartotherian: apply
- 19:12 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/kartotherian: apply
- 18:50 cjming@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic: apply
- 18:50 cjming@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic: apply
- 18:46 cjming@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply
- 18:45 cjming@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply
- 18:22 ejegg: fundraising civicrm rolled back from 510ab862 to d4bd9b1b
- 18:21 brett@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs1019.eqiad.wmnet
- 18:21 brett@cumin2002: START - Cookbook sre.hosts.remove-downtime for lvs1019.eqiad.wmnet
- 18:09 bking@deploy2002: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-ipoid-test: apply
- 18:09 bking@deploy2002: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-ipoid-test: apply
- 18:05 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs1019.eqiad.wmnet with OS bullseye
- 17:48 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1019.eqiad.wmnet with reason: host reimage
- 17:45 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1019.eqiad.wmnet with reason: host reimage
- 17:44 ejegg: fundraising civicrm upgraded from d4bd9b1b to 510ab862
- 17:30 brett@cumin2002: START - Cookbook sre.hosts.reimage for host lvs1019.eqiad.wmnet with OS bullseye
- 17:21 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host franio1004
- 17:21 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host franio1004
- 17:20 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 17:17 vriley@cumin1003: START - Cookbook sre.dns.netbox
- 17:06 topranks: disable BGP to lvs1019 on eqiad core routers ahead of switch migration T405628
- 17:06 brett@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1019.eqiad.wmnet with reason: move primary uplink from asw2-c7-eqiad to lsw1-c7-eqiad and remove link to asw2-d2-eqiad - T405628
- 15:55 hashar@deploy2002: Finished deploy [gerrit/gerrit@121bd1c]: Remove duplicate [DISMISS] button (duration: 00m 11s)
- 15:55 hashar@deploy2002: Started deploy [gerrit/gerrit@121bd1c]: Remove duplicate [DISMISS] button
- 15:51 dpogorzelski@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on ml-lab1001.eqiad.wmnet with reason: decommission
- 15:50 dpogorzelski@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on ml-lab1001.eqiad.wmnet with reason: decommission
- 15:50 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host conf2005.codfw.wmnet
- 15:45 bking@cumin2002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host dse-k8s-worker2003.codfw.wmnet
- 15:45 bking@cumin2002: START - Cookbook sre.k8s.pool-depool-node pool for host dse-k8s-worker2003.codfw.wmnet
- 15:44 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host conf2005.codfw.wmnet
- 15:43 hashar@deploy2002: Finished deploy [gerrit/gerrit@774e2ff]: Ease configuration of the motd banner && Add banner for the 2025 developer survey (duration: 00m 15s)
- 15:43 hashar@deploy2002: Started deploy [gerrit/gerrit@774e2ff]: Ease configuration of the motd banner && Add banner for the 2025 developer survey
- 15:41 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host conf2004.codfw.wmnet
- 15:38 bking@deploy2002: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-ipoid-test: apply
- 15:38 bking@deploy2002: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-ipoid-test: apply
- 15:36 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host conf2004.codfw.wmnet
- 15:35 bking@cumin2002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host dse-k8s-worker2003.codfw.wmnet
- 15:33 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host conf1009.eqiad.wmnet
- 15:30 bking@cumin2002: START - Cookbook sre.k8s.pool-depool-node depool for host dse-k8s-worker2003.codfw.wmnet
- 15:28 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host conf1009.eqiad.wmnet
- 15:26 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host conf1008.eqiad.wmnet
- 15:20 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host conf1008.eqiad.wmnet
- 15:15 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host conf1007.eqiad.wmnet
- 15:09 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host conf1007.eqiad.wmnet
- 15:08 cgoubert@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
- 15:06 bking@deploy2002: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-ipoid-test: apply
- 15:06 bking@deploy2002: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-ipoid-test: apply
- 15:06 cgoubert@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
- 15:05 cgoubert@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
- 15:03 Lucas_WMDE: UTC afternoon backport+config window done
- 15:03 cgoubert@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
- 15:03 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
- 15:02 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/admin 'apply'.
- 15:02 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
- 15:02 ladsgroup@deploy2002: Finished scap sync-world: Backport for RevisionStore: Catch ParameterAssertionException too (T351953) (duration: 09m 26s)
- 15:01 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/admin 'apply'.
- 14:59 cgoubert@deploy2002: helmfile [aux-k8s-codfw] DONE helmfile.d/admin 'apply'.
- 14:59 cgoubert@deploy2002: helmfile [aux-k8s-codfw] START helmfile.d/admin 'apply'.
- 14:59 cgoubert@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 14:58 cgoubert@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
- 14:55 ladsgroup@deploy2002: jforrester, ladsgroup: Continuing with sync
- 14:54 ladsgroup@deploy2002: jforrester, ladsgroup: Backport for RevisionStore: Catch ParameterAssertionException too (T351953) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 14:52 ladsgroup@deploy2002: Started scap sync-world: Backport for RevisionStore: Catch ParameterAssertionException too (T351953)
- 14:50 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-cron: apply
- 14:49 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/mw-cron: apply
- 14:37 derick@deploy2002: Finished scap sync-world: Backport for Revert "User: Log where the data was loaded when CAS update failed" (T410652), Revert "User: Log where the data was loaded when CAS update failed" (T410652), Fetch user object from primary DB (for writes) not replica DB (T410652) (duration: 13m 24s)
- 14:27 derick@deploy2002: d3r1ck01, derick: Continuing with sync
- 14:26 derick@deploy2002: d3r1ck01, derick: Backport for Revert "User: Log where the data was loaded when CAS update failed" (T410652), Revert "User: Log where the data was loaded when CAS update failed" (T410652), Fetch user object from primary DB (for writes) not replica DB (T410652) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes
- 14:23 derick@deploy2002: Started scap sync-world: Backport for Revert "User: Log where the data was loaded when CAS update failed" (T410652), Revert "User: Log where the data was loaded when CAS update failed" (T410652), Fetch user object from primary DB (for writes) not replica DB (T410652)
- 14:17 gehel@cumin2002: conftool action : set/weight=10; selector: service=druid-public-coordinator
- 14:17 gehel@cumin2002: conftool action : set/pooled=yes; selector: service=druid-public-coordinator
- 14:14 tchanders@deploy2002: Finished scap sync-world: Backport for Enable temporary accounts on enwikinews and ptwikibooks (T411618) (duration: 10m 36s)
- 14:11 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db1166 (T410589)', diff saved to https://phabricator.wikimedia.org/P86406 and previous config saved to /var/cache/conftool/dbconfig/20251204-141124-ladsgroup.json
- 14:11 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db1166.eqiad.wmnet with reason: Maintenance
- 14:11 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1157 (T410589)', diff saved to https://phabricator.wikimedia.org/P86405 and previous config saved to /var/cache/conftool/dbconfig/20251204-141101-ladsgroup.json
- 14:08 tchanders@deploy2002: tchanders: Continuing with sync
- 14:06 tchanders@deploy2002: tchanders: Backport for Enable temporary accounts on enwikinews and ptwikibooks (T411618) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 14:03 tchanders@deploy2002: Started scap sync-world: Backport for Enable temporary accounts on enwikinews and ptwikibooks (T411618)
- 13:55 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1157', diff saved to https://phabricator.wikimedia.org/P86404 and previous config saved to /var/cache/conftool/dbconfig/20251204-135554-ladsgroup.json
- 13:40 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1157', diff saved to https://phabricator.wikimedia.org/P86403 and previous config saved to /var/cache/conftool/dbconfig/20251204-134046-ladsgroup.json
- 13:25 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1157 (T410589)', diff saved to https://phabricator.wikimedia.org/P86402 and previous config saved to /var/cache/conftool/dbconfig/20251204-132539-ladsgroup.json
- 13:22 daniel@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
- 13:22 daniel@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
- 13:19 daniel@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
- 13:19 daniel@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
- 13:16 daniel@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
- 13:15 daniel@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
- 13:15 daniel@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
- 13:14 daniel@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
- 13:07 moritzm: installing waitress security updates
- 12:45 moritzm: installing postgresql-15 security updates
- 11:31 moritzm: installing net-snmp security updates
- 11:21 moritzm: rebuild software RAIDs on T410743
- 11:00 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/growthbook-next: apply
- 10:59 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/growthbook-next: apply
- 09:48 moritzm: upgrade Envoy on an-launcher T405808
- 09:43 hashar@deploy2002: rebuilt and synchronized wikiversions files: group2 to 1.46.0-wmf.5 refs T408275
- 09:35 moritzm: cleanup lingering sessions of offboarded user T389324
- 09:30 hashar@deploy2002: Finished scap sync-world: Backport for REST: add explicit cast to sitemapSize calcuation to avoid warning (T411580), Followup I81a2c4de77: Verify stats label values are not empty (T411585) (duration: 09m 59s)
- 09:26 hashar@deploy2002: jforrester, hashar: Continuing with sync
- 09:23 hashar@deploy2002: jforrester, hashar: Backport for REST: add explicit cast to sitemapSize calcuation to avoid warning (T411580), Followup I81a2c4de77: Verify stats label values are not empty (T411585) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 09:22 arnoldokoth: upgrade envoyproxy on lists T405808
- 09:20 hashar@deploy2002: Started scap sync-world: Backport for REST: add explicit cast to sitemapSize calcuation to avoid warning (T411580), Followup I81a2c4de77: Verify stats label values are not empty (T411585)
- 09:20 arnoldokoth: upgrade envoyproxy on vrts T405808
- 09:19 slyngshede@cumin1003: DONE (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Arinaigum out of all services on: 2419 hosts
- 03:50 ejegg: fundraising civicrm upgraded from b1fc5afc to d4bd9b1b
- 01:23 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db1157 (T410589)', diff saved to https://phabricator.wikimedia.org/P86394 and previous config saved to /var/cache/conftool/dbconfig/20251204-012321-ladsgroup.json
- 01:23 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db1157.eqiad.wmnet with reason: Maintenance
- 01:18 mwpresync@deploy2002: Finished scap build-images: Publishing wmf/next image (duration: 17m 47s)
- 01:00 mwpresync@deploy2002: Started scap build-images: Publishing wmf/next image
2025-12-03
- 23:08 Amir1: hard rebooting codesearch9.codesearch.eqiad1.wikimedia.cloud (T411728)
- 22:51 mutante: maintenance on https://codesearch.wmcloud.org/ - trying to fix disk space issue - detaching volume to extend it
- 22:50 mutante: maintenance on https://codesearch.wmcloud.org/ - trying to fix disk space issue
- 22:33 ryankemper@deploy2002: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-ipoid-test: apply
- 22:33 ryankemper@deploy2002: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-ipoid-test: apply
- 22:14 ryankemper@deploy2002: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-ipoid-test: apply
- 22:13 ryankemper@deploy2002: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-ipoid-test: apply
- 22:09 ryankemper@deploy2002: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'.
- 22:08 ryankemper@deploy2002: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'.
- 21:53 aaron@deploy2002: Finished scap sync-world: Backport for Update Math API title and project-specific /math/ endpoint stability policy (T411517) (duration: 08m 25s)
- 21:49 aaron@deploy2002: aaron: Continuing with sync
- 21:47 aaron@deploy2002: aaron: Backport for Update Math API title and project-specific /math/ endpoint stability policy (T411517) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 21:45 aaron@deploy2002: Started scap sync-world: Backport for Update Math API title and project-specific /math/ endpoint stability policy (T411517)
- 21:42 derick@deploy2002: Finished scap sync-world: Backport for User: Log where the data was loaded when CAS update failed (T410652), User: Log where the data was loaded when CAS update failed (T410652) (duration: 07m 33s)
- 21:38 derick@deploy2002: derick, d3r1ck01: Continuing with sync
- 21:37 derick@deploy2002: derick, d3r1ck01: Backport for User: Log where the data was loaded when CAS update failed (T410652), User: Log where the data was loaded when CAS update failed (T410652) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 21:35 derick@deploy2002: Started scap sync-world: Backport for User: Log where the data was loaded when CAS update failed (T410652), User: Log where the data was loaded when CAS update failed (T410652)
- 21:28 dani@deploy2002: Finished scap sync-world: Backport for Increase coverage of 2025 Global Readers Survey (non-enwiki) (T410918), OATHAuth: Expand 2FA to all users (T399664) (duration: 11m 18s)
- 21:24 dani@deploy2002: dani, mstyles: Continuing with sync
- 21:19 dani@deploy2002: dani, mstyles: Backport for Increase coverage of 2025 Global Readers Survey (non-enwiki) (T410918), OATHAuth: Expand 2FA to all users (T399664) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 21:17 dani@deploy2002: Started scap sync-world: Backport for Increase coverage of 2025 Global Readers Survey (non-enwiki) (T410918), OATHAuth: Expand 2FA to all users (T399664)
- 21:14 aude@deploy2002: Finished scap sync-world: Backport for [Legal Footer] Create config for adding legal footer (T410163) (duration: 08m 38s)
- 21:10 aude@deploy2002: aude, lmora: Continuing with sync
- 21:08 aude@deploy2002: aude, lmora: Backport for [Legal Footer] Create config for adding legal footer (T410163) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 21:05 aude@deploy2002: Started scap sync-world: Backport for [Legal Footer] Create config for adding legal footer (T410163)
- 20:53 aqu@deploy2002: Finished deploy [analytics/refinery@6dfb3b8] (thin): Deploy spur hqls THIN [analytics/refinery@6dfb3b8b] (duration: 01m 16s)
- 20:51 aqu@deploy2002: Started deploy [analytics/refinery@6dfb3b8] (thin): Deploy spur hqls THIN [analytics/refinery@6dfb3b8b]
- 20:51 aqu@deploy2002: Finished deploy [analytics/refinery@6dfb3b8]: Deploy spur hqls [analytics/refinery@6dfb3b8b] (duration: 02m 29s)
- 20:49 aqu@deploy2002: Started deploy [analytics/refinery@6dfb3b8]: Deploy spur hqls [analytics/refinery@6dfb3b8b]
- 20:48 aqu@deploy2002: Finished deploy [analytics/refinery@6dfb3b8] (hadoop-test): Deploy spur hqls TEST [analytics/refinery@6dfb3b8b] (duration: 01m 01s)
- 20:47 aqu@deploy2002: Started deploy [analytics/refinery@6dfb3b8] (hadoop-test): Deploy spur hqls TEST [analytics/refinery@6dfb3b8b]
- 20:44 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudservices1005.eqiad.wmnet with reason: host reimage
- 20:43 brett@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs1020.eqiad.wmnet
- 20:43 brett@cumin2002: START - Cookbook sre.hosts.remove-downtime for lvs1020.eqiad.wmnet
- 20:40 andrew@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudservices1005.eqiad.wmnet with reason: host reimage
- 20:25 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudservices1005.eqiad.wmnet with OS trixie
- 20:22 eileen: civicrm upgraded from 45931830 to b1fc5afc
- 20:02 ejegg: payments-wiki upgraded from eeadc2d8 to 714ed4cf
- 20:00 eileen: civicrm upgraded from c6d1f24b to 45931830
- 19:58 sukhe@dns1004: END - running authdns-update
- 19:57 sukhe@dns1004: START - running authdns-update
- 19:52 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229 (T410589)', diff saved to https://phabricator.wikimedia.org/P86392 and previous config saved to /var/cache/conftool/dbconfig/20251203-195207-ladsgroup.json
- 19:51 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs1020.eqiad.wmnet with OS bullseye
- 19:37 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229', diff saved to https://phabricator.wikimedia.org/P86390 and previous config saved to /var/cache/conftool/dbconfig/20251203-193659-ladsgroup.json
- 19:28 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1020.eqiad.wmnet with reason: host reimage
- 19:23 hashar@deploy2002: Finished deploy [gerrit/gerrit@93bde2a]: Ease configuration of the motd banner (duration: 00m 09s)
- 19:22 hashar@deploy2002: Started deploy [gerrit/gerrit@93bde2a]: Ease configuration of the motd banner
- 19:22 cmooney@cumin1003: END (PASS) - Cookbook sre.network.cf (exit_code=0)
- 19:22 cmooney@cumin1003: START - Cookbook sre.network.cf
- 19:22 topranks: disabling remote announcement of bgp prefixes
- 19:21 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229', diff saved to https://phabricator.wikimedia.org/P86388 and previous config saved to /var/cache/conftool/dbconfig/20251203-192152-ladsgroup.json
- 19:21 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1020.eqiad.wmnet with reason: host reimage
- 19:14 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudservices1006.eqiad.wmnet with OS trixie
- 19:06 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2229 (T410589)', diff saved to https://phabricator.wikimedia.org/P86387 and previous config saved to /var/cache/conftool/dbconfig/20251203-190644-ladsgroup.json
- 19:06 brett@cumin2002: START - Cookbook sre.hosts.reimage for host lvs1020.eqiad.wmnet with OS bullseye
- 18:37 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirtlocal1001.eqiad.wmnet with OS trixie
- 18:26 ladsgroup@deploy2002: Finished scap sync-world: Backport for findBadBlobs: Fix the --scan-to option (T351953), findBadBlobs: Fix the --scan-to option (T351953) (duration: 06m 48s)
- 18:25 brett@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1020.eqiad.wmnet with reason: move primary uplink from asw2-d7-eqiad to lsw1-d7-eqiad and remove link to asw2-c2-eqiad
- 18:22 ladsgroup@deploy2002: ladsgroup: Continuing with sync
- 18:22 ladsgroup@deploy2002: ladsgroup: Backport for findBadBlobs: Fix the --scan-to option (T351953), findBadBlobs: Fix the --scan-to option (T351953) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 18:19 ladsgroup@deploy2002: Started scap sync-world: Backport for findBadBlobs: Fix the --scan-to option (T351953), findBadBlobs: Fix the --scan-to option (T351953)
- 18:12 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudservices1006.eqiad.wmnet with reason: host reimage
- 18:08 andrew@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudservices1006.eqiad.wmnet with reason: host reimage
- 18:05 jhancock@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 18:05 jhancock@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: updating for cloudceph to codfw - jhancock@cumin1003"
- 18:04 jhancock@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: updating for cloudceph to codfw - jhancock@cumin1003"
- 18:01 jhancock@cumin1003: START - Cookbook sre.dns.netbox
- 18:01 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirtlocal1001.eqiad.wmnet with reason: host reimage
- 17:57 andrew@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirtlocal1001.eqiad.wmnet with reason: host reimage
- 17:50 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudservices1006.eqiad.wmnet with OS trixie
- 17:46 sukhe@cumin1003: END (PASS) - Cookbook sre.network.cf (exit_code=0)
- 17:46 sukhe@cumin1003: START - Cookbook sre.network.cf
- 17:46 sukhe@cumin1003: END (FAIL) - Cookbook sre.network.cf (exit_code=1)
- 17:46 sukhe@cumin1003: START - Cookbook sre.network.cf
- 17:46 sukhe@cumin1003: END (PASS) - Cookbook sre.network.cf (exit_code=0)
- 17:46 sukhe@cumin1003: START - Cookbook sre.network.cf
- 17:46 sukhe@cumin1003: END (PASS) - Cookbook sre.network.cf (exit_code=0)
- 17:45 sukhe@cumin1003: START - Cookbook sre.network.cf
- 17:40 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudvirtlocal1001.eqiad.wmnet with OS trixie
- 17:40 sbisson@deploy2002: Finished scap sync-world: Backport for CX3 Build 1.0.0+20251126 (T384485) (duration: 09m 07s)
- 17:36 sbisson@deploy2002: sbisson: Continuing with sync
- 17:34 sbisson@deploy2002: sbisson: Backport for CX3 Build 1.0.0+20251126 (T384485) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 17:31 sbisson@deploy2002: Started scap sync-world: Backport for CX3 Build 1.0.0+20251126 (T384485)
- 17:11 jynus@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db1229.eqiad.wmnet with reason: crashed
- 17:07 jynus@cumin1003: dbctl commit (dc=all): 'Depooldb1229', diff saved to https://phabricator.wikimedia.org/P86383 and previous config saved to /var/cache/conftool/dbconfig/20251203-170745-jynus.json
- 17:02 bd808@deploy2002: Finished scap sync-world: Backport for robots.php: Fix undefined index 'enabled' on Wikinews and closed wikis (T411632) (duration: 07m 40s)
- 16:58 bd808@deploy2002: bd808, krinkle: Continuing with sync
- 16:57 bd808@deploy2002: bd808, krinkle: Backport for robots.php: Fix undefined index 'enabled' on Wikinews and closed wikis (T411632) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 16:54 bd808@deploy2002: Started scap sync-world: Backport for robots.php: Fix undefined index 'enabled' on Wikinews and closed wikis (T411632)
- 16:49 bd808@deploy2002: Finished scap sync-world: Backport for officewiki: Put indicators in title with vector-2022, officewiki: Enable page protection indicators (duration: 07m 47s)
- 16:45 bd808@deploy2002: bd808: Continuing with sync
- 16:44 bd808@deploy2002: bd808: Backport for officewiki: Put indicators in title with vector-2022, officewiki: Enable page protection indicators synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 16:41 bd808@deploy2002: Started scap sync-world: Backport for officewiki: Put indicators in title with vector-2022, officewiki: Enable page protection indicators
- 16:15 topranks: disabling unused former cloudcephosd hosts on cloud switches T410989
- 16:13 dancy@deploy2002: Installation of scap version "4.229.0" completed for 164 hosts
- 16:09 dancy@deploy2002: Installing scap version "4.229.0" for 164 host(s)
- 15:32 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host conf2006.codfw.wmnet
- 15:28 jforrester@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
- 15:27 jforrester@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
- 15:27 jforrester@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
- 15:27 ladsgroup@deploy2002: Finished scap sync-world: Backport for Clean up db groups config (T411088) (duration: 07m 48s)
- 15:27 jforrester@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
- 15:26 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host conf2006.codfw.wmnet
- 15:26 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
- 15:26 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
- 15:23 ladsgroup@deploy2002: ladsgroup: Continuing with sync
- 15:23 jforrester@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
- 15:22 jforrester@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
- 15:21 jforrester@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
- 15:21 ladsgroup@deploy2002: ladsgroup: Backport for Clean up db groups config (T411088) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 15:21 jforrester@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
- 15:20 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
- 15:20 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
- 15:19 ladsgroup@deploy2002: Started scap sync-world: Backport for Clean up db groups config (T411088)
- 15:16 jforrester@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: sync
- 15:16 jforrester@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: sync
- 15:15 jforrester@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
- 15:15 jforrester@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
- 15:14 jforrester@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
- 15:13 jforrester@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
- 15:12 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
- 15:12 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
- 15:09 jforrester@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
- 15:08 jforrester@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
- 15:08 jforrester@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
- 15:07 jforrester@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
- 15:06 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
- 15:06 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2205.codfw.wmnet with reason: Maintenance
- 15:06 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
- 15:04 jforrester@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: sync
- 15:03 jforrester@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: sync
- 15:00 robh@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on alert1002.wikimedia.org with reason: C/D Migration
- 15:00 robh: alert1002 port migration now starting
- 14:54 Lucas_WMDE: UTC afternoon backport+config window done
- 14:49 esanders@deploy2002: Finished scap sync-world: Backport for DiscussionTools: cleanup unused config, Remove wgVisualEditorEditCheckSingleCheckMode (duration: 06m 44s)
- 14:45 esanders@deploy2002: esanders: Continuing with sync
- 14:44 esanders@deploy2002: esanders: Backport for DiscussionTools: cleanup unused config, Remove wgVisualEditorEditCheckSingleCheckMode synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 14:42 esanders@deploy2002: Started scap sync-world: Backport for DiscussionTools: cleanup unused config, Remove wgVisualEditorEditCheckSingleCheckMode
- 14:38 esanders@deploy2002: Finished scap sync-world: Backport for Set Flow to read-only everywhere (T402552) (duration: 09m 44s)
- 14:33 esanders@deploy2002: esanders: Continuing with sync
- 14:31 esanders@deploy2002: esanders: Backport for Set Flow to read-only everywhere (T402552) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 14:29 esanders@deploy2002: Started scap sync-world: Backport for Set Flow to read-only everywhere (T402552)
- 14:27 XioNoX: push pfw policies - T411566
- 14:27 sbisson@deploy2002: Finished scap sync-world: Backport for CX3 Build 1.0.0+20251201 (T408842 T408844) (duration: 12m 01s)
- 14:21 sbisson@deploy2002: sbisson: Continuing with sync
- 14:17 sbisson@deploy2002: sbisson: Backport for CX3 Build 1.0.0+20251201 (T408842 T408844) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 14:15 sbisson@deploy2002: Started scap sync-world: Backport for CX3 Build 1.0.0+20251201 (T408842 T408844)
- 13:50 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2239.codfw.wmnet with reason: Maintenance
- 13:50 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227 (T411163 T411164)', diff saved to https://phabricator.wikimedia.org/P86380 and previous config saved to /var/cache/conftool/dbconfig/20251203-135000-marostegui.json
- 13:34 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P86379 and previous config saved to /var/cache/conftool/dbconfig/20251203-133452-marostegui.json
- 13:32 kart_: Updated Recommendation API to 2025-12-02-200719-production (T408845, T408844, T384485)
- 13:30 kartik@deploy2002: helmfile [ml-serve-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
- 13:25 kartik@deploy2002: helmfile [ml-serve-eqiad] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
- 13:22 kartik@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
- 13:19 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P86378 and previous config saved to /var/cache/conftool/dbconfig/20251203-131945-marostegui.json
- 13:14 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db2229 (T410589)', diff saved to https://phabricator.wikimedia.org/P86377 and previous config saved to /var/cache/conftool/dbconfig/20251203-131448-ladsgroup.json
- 13:14 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db2229.codfw.wmnet with reason: Maintenance
- 13:14 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2224 (T410589)', diff saved to https://phabricator.wikimedia.org/P86376 and previous config saved to /var/cache/conftool/dbconfig/20251203-131435-ladsgroup.json
- 13:04 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227 (T411163 T411164)', diff saved to https://phabricator.wikimedia.org/P86375 and previous config saved to /var/cache/conftool/dbconfig/20251203-130437-marostegui.json
- 13:01 daniel@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
- 13:00 daniel@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
- 13:00 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db2227 (T411163 T411164)', diff saved to https://phabricator.wikimedia.org/P86374 and previous config saved to /var/cache/conftool/dbconfig/20251203-130002-marostegui.json
- 12:59 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2227.codfw.wmnet with reason: Maintenance
- 12:59 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2209 (T411163 T411164)', diff saved to https://phabricator.wikimedia.org/P86373 and previous config saved to /var/cache/conftool/dbconfig/20251203-125938-marostegui.json
- 12:59 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2224', diff saved to https://phabricator.wikimedia.org/P86372 and previous config saved to /var/cache/conftool/dbconfig/20251203-125927-ladsgroup.json
- 12:57 daniel@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
- 12:56 daniel@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
- 12:56 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/api-gateway: apply
- 12:55 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/api-gateway: apply
- 12:54 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: apply
- 12:53 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/api-gateway: apply
- 12:52 daniel@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
- 12:52 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
- 12:51 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/api-gateway: apply
- 12:51 daniel@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
- 12:50 daniel@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
- 12:50 daniel@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
- 12:50 daniel@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
- 12:49 daniel@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
- 12:44 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2209', diff saved to https://phabricator.wikimedia.org/P86371 and previous config saved to /var/cache/conftool/dbconfig/20251203-124430-marostegui.json
- 12:44 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2224', diff saved to https://phabricator.wikimedia.org/P86370 and previous config saved to /var/cache/conftool/dbconfig/20251203-124419-ladsgroup.json
- 12:32 claime: Restarting failed timer dump_cloud_ip_ranges on puppetservers
- 12:30 daniel@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
- 12:30 daniel@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
- 12:29 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2209', diff saved to https://phabricator.wikimedia.org/P86369 and previous config saved to /var/cache/conftool/dbconfig/20251203-122923-marostegui.json
- 12:29 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2224 (T410589)', diff saved to https://phabricator.wikimedia.org/P86368 and previous config saved to /var/cache/conftool/dbconfig/20251203-122912-ladsgroup.json
- 12:26 daniel@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
- 12:26 daniel@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
- 12:20 daniel@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
- 12:19 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/analytics-test: apply
- 12:19 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/analytics-test: apply
- 12:19 daniel@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
- 12:18 daniel@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
- 12:17 daniel@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
- 12:14 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2209 (T411163 T411164)', diff saved to https://phabricator.wikimedia.org/P86367 and previous config saved to /var/cache/conftool/dbconfig/20251203-121409-marostegui.json
- 12:09 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db2209 (T411163 T411164)', diff saved to https://phabricator.wikimedia.org/P86366 and previous config saved to /var/cache/conftool/dbconfig/20251203-120933-marostegui.json
- 12:09 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2209.codfw.wmnet with reason: Maintenance
- 12:09 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194 (T411163 T411164)', diff saved to https://phabricator.wikimedia.org/P86365 and previous config saved to /var/cache/conftool/dbconfig/20251203-120909-marostegui.json
- 11:54 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P86364 and previous config saved to /var/cache/conftool/dbconfig/20251203-115401-marostegui.json
- 11:38 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P86363 and previous config saved to /var/cache/conftool/dbconfig/20251203-113853-marostegui.json
- 11:23 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194 (T411163 T411164)', diff saved to https://phabricator.wikimedia.org/P86362 and previous config saved to /var/cache/conftool/dbconfig/20251203-112345-marostegui.json
- 11:19 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db2194 (T411163 T411164)', diff saved to https://phabricator.wikimedia.org/P86361 and previous config saved to /var/cache/conftool/dbconfig/20251203-111910-marostegui.json
- 11:19 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2194.codfw.wmnet with reason: Maintenance
- 11:18 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190 (T411163 T411164)', diff saved to https://phabricator.wikimedia.org/P86360 and previous config saved to /var/cache/conftool/dbconfig/20251203-111846-marostegui.json
- 11:15 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2010.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 11:15 elukey@cumin1003: START - Cookbook sre.hosts.provision for host sretest2010.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 11:12 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.powercycle (exit_code=0) for host ml-serve1013
- 11:07 elukey@cumin1003: START - Cookbook sre.hosts.powercycle for host ml-serve1013
- 11:03 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190', diff saved to https://phabricator.wikimedia.org/P86359 and previous config saved to /var/cache/conftool/dbconfig/20251203-110338-marostegui.json
- 10:58 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.powercycle (exit_code=0) for host sretest2001
- 10:53 elukey@cumin1003: START - Cookbook sre.hosts.powercycle for host sretest2001
- 10:48 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190', diff saved to https://phabricator.wikimedia.org/P86358 and previous config saved to /var/cache/conftool/dbconfig/20251203-104830-marostegui.json
- 10:35 kharlan@deploy2002: Finished scap sync-world: Backport for hCaptchaEditAttempt logging: Normalize line endings (T411578), hCaptchaEditAttempt logging: Normalize line endings (T411578) (duration: 07m 56s)
- 10:33 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190 (T411163 T411164)', diff saved to https://phabricator.wikimedia.org/P86357 and previous config saved to /var/cache/conftool/dbconfig/20251203-103323-marostegui.json
- 10:30 kharlan@deploy2002: kharlan: Continuing with sync
- 10:29 kharlan@deploy2002: kharlan: Backport for hCaptchaEditAttempt logging: Normalize line endings (T411578), hCaptchaEditAttempt logging: Normalize line endings (T411578) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 10:27 kharlan@deploy2002: Started scap sync-world: Backport for hCaptchaEditAttempt logging: Normalize line endings (T411578), hCaptchaEditAttempt logging: Normalize line endings (T411578)
- 09:19 hashar@deploy2002: rebuilt and synchronized wikiversions files: group1 to 1.46.0-wmf.5 refs T408275
- 09:14 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti-test2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
- 09:14 ayounsi@cumin1003: START - Cookbook sre.hosts.provision for host ganeti-test2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
- 09:00 ayounsi@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on ganeti-test2001.codfw.wmnet with reason: test CR1207804
- 08:37 moritzm: upgrade Envoy on schema* T405808
- 08:32 ryankemper@cumin2002: END (FAIL) - Cookbook sre.hadoop.reboot-workers (exit_code=99) for Hadoop analytics cluster
- 08:13 moritzm: installing python-zipp security updates
- 07:47 moritzm: installing libtpms security updates
- 07:26 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db1169 gradually with 4 steps - Repooling db1169
- 07:12 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance
- 07:05 moritzm: installing mako security updates
- 07:01 Amir1: ladsgroup@deploy2002:~$ mwscript-k8s --dblist=all -- purgeUserOptions.php --login-age 11 rememberpassword (T406724)
- 06:56 Amir1: ladsgroup@deploy2002:~$ mwscript-k8s --dblist=all -- purgeUserOptions.php --login-age 11 popups (T406724)
- 06:40 marostegui@cumin1003: START - Cookbook sre.mysql.pool db1169 gradually with 4 steps - Repooling db1169
- 06:39 marostegui@cumin1003: END (ERROR) - Cookbook sre.mysql.pool (exit_code=97) db1169 gradually with 4 steps - Repooling db1169
- 06:38 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db2224 (T410589)', diff saved to https://phabricator.wikimedia.org/P86350 and previous config saved to /var/cache/conftool/dbconfig/20251203-063812-ladsgroup.json
- 06:38 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db2224.codfw.wmnet with reason: Maintenance
- 06:37 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2217 (T410589)', diff saved to https://phabricator.wikimedia.org/P86349 and previous config saved to /var/cache/conftool/dbconfig/20251203-063749-ladsgroup.json
- 06:35 marostegui@cumin1003: START - Cookbook sre.mysql.pool db1169 gradually with 4 steps - Repooling db1169
- 06:29 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) db1169 - Depooling db1169
- 06:29 marostegui@cumin1003: START - Cookbook sre.mysql.depool db1169 - Depooling db1169
- 06:26 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1169.eqiad.wmnet with OS trixie
- 06:22 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2217', diff saved to https://phabricator.wikimedia.org/P86348 and previous config saved to /var/cache/conftool/dbconfig/20251203-062241-ladsgroup.json
- 06:15 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.parsercache (exit_code=0)
- 06:15 marostegui@cumin1003: START - Cookbook sre.mysql.parsercache
- 06:15 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.parsercache (exit_code=0)
- 06:15 marostegui@cumin1003: START - Cookbook sre.mysql.parsercache
- 06:07 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2217', diff saved to https://phabricator.wikimedia.org/P86345 and previous config saved to /var/cache/conftool/dbconfig/20251203-060734-ladsgroup.json
- 06:05 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1169.eqiad.wmnet with reason: host reimage
- 05:58 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db1169.eqiad.wmnet with reason: host reimage
- 05:52 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2217 (T410589)', diff saved to https://phabricator.wikimedia.org/P86344 and previous config saved to /var/cache/conftool/dbconfig/20251203-055226-ladsgroup.json
- 05:44 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db2190 (T411163 T411164)', diff saved to https://phabricator.wikimedia.org/P86343 and previous config saved to /var/cache/conftool/dbconfig/20251203-054438-marostegui.json
- 05:44 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2190.codfw.wmnet with reason: Maintenance
- 05:44 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T411163 T411164)', diff saved to https://phabricator.wikimedia.org/P86342 and previous config saved to /var/cache/conftool/dbconfig/20251203-054414-marostegui.json
- 05:41 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db1169.eqiad.wmnet with OS trixie
- 05:36 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcontrol1011.eqiad.wmnet with OS trixie
- 05:29 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P86341 and previous config saved to /var/cache/conftool/dbconfig/20251203-052906-marostegui.json
- 05:27 marostegui: Drop sockpuppet database T411527
- 05:13 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P86340 and previous config saved to /var/cache/conftool/dbconfig/20251203-051359-marostegui.json
- 04:59 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcontrol1011.eqiad.wmnet with reason: host reimage
- 04:58 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T411163 T411164)', diff saved to https://phabricator.wikimedia.org/P86339 and previous config saved to /var/cache/conftool/dbconfig/20251203-045851-marostegui.json
- 04:57 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1223.eqiad.wmnet with reason: Maintenance
- 04:55 andrew@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcontrol1011.eqiad.wmnet with reason: host reimage
- 04:34 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcontrol1011.eqiad.wmnet with OS trixie
- 04:26 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcontrol1007.eqiad.wmnet with OS trixie
- 03:50 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcontrol1007.eqiad.wmnet with reason: host reimage
- 03:46 andrew@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcontrol1007.eqiad.wmnet with reason: host reimage
- 03:30 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcontrol1007.eqiad.wmnet with OS trixie
- 03:26 krinkle@deploy2002: Finished scap sync-world: Backport for robots.php: Avoid "404 Not Found" for Sitemap rule (T400023) (duration: 11m 08s)
- 03:22 krinkle@deploy2002: krinkle: Continuing with sync
- 03:17 krinkle@deploy2002: krinkle: Backport for robots.php: Avoid "404 Not Found" for Sitemap rule (T400023) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 03:15 krinkle@deploy2002: Started scap sync-world: Backport for robots.php: Avoid "404 Not Found" for Sitemap rule (T400023)
- 03:08 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcontrol1006.eqiad.wmnet with OS trixie
- 03:08 krinkle@deploy2002: Finished scap sync-world: Backport for robots.php: Clean up unused site, lang, and x-subdomain (T407122), Submit Commons sitemap to Bing/DuckDuckGo and remaining wikis to Google (T400023), robots.txt: Clean up inline comments, robots.txt: Remove redundant "/wiki/Fundraising_2007/comments" disallow (duration: 08m 26s)
- 03:03 krinkle@deploy2002: krinkle: Continuing with sync
- 03:02 krinkle@deploy2002: krinkle: Backport for robots.php: Clean up unused site, lang, and x-subdomain (T407122), Submit Commons sitemap to Bing/DuckDuckGo and remaining wikis to Google (T400023), robots.txt: Clean up inline comments, robots.txt: Remove redundant "/wiki/Fundraising_2007/comments" disallow synced to the testservers (see https://wiki
- 02:59 krinkle@deploy2002: Started scap sync-world: Backport for robots.php: Clean up unused site, lang, and x-subdomain (T407122), Submit Commons sitemap to Bing/DuckDuckGo and remaining wikis to Google (T400023), robots.txt: Clean up inline comments, robots.txt: Remove redundant "/wiki/Fundraising_2007/comments" disallow
- 02:34 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcontrol1006.eqiad.wmnet with reason: host reimage
- 02:27 andrew@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcontrol1006.eqiad.wmnet with reason: host reimage
- 02:13 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcontrol1006.eqiad.wmnet with OS trixie
- 02:05 andrew@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcontrol1006.eqiad.wmnet with OS trixie
- 01:50 eileen: civicrm upgraded from ef0b2676 to c6d1f24b
- 01:23 ryankemper@cumin2002: START - Cookbook sre.hadoop.reboot-workers for Hadoop analytics cluster
- 01:21 ryankemper@cumin2002: END (FAIL) - Cookbook sre.hadoop.reboot-workers (exit_code=99) for Hadoop analytics cluster
- 01:18 ryankemper@cumin2002: START - Cookbook sre.hadoop.reboot-workers for Hadoop analytics cluster
- 01:14 mwpresync@deploy2002: Finished scap build-images: Publishing wmf/next image (duration: 13m 30s)
- 01:01 mwpresync@deploy2002: Started scap build-images: Publishing wmf/next image
- 00:50 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcontrol1006.eqiad.wmnet with OS trixie
- 00:33 zabe@deploy2002: rebuilt and synchronized wikiversions files: group0 to 1.46.0-wmf.5 refs T408275
- 00:24 zabe@deploy2002: Finished scap sync-world: Backport for Close klwiki (T411501) (duration: 07m 29s)
- 00:20 zabe@deploy2002: zabe: Continuing with sync
- 00:19 zabe@deploy2002: zabe: Backport for Close klwiki (T411501) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 00:17 zabe@deploy2002: Started scap sync-world: Backport for Close klwiki (T411501)
- 00:09 zabe@deploy2002: Finished scap sync-world: Backport for Close crwiki (T411501) (duration: 07m 59s)
- 00:05 zabe@deploy2002: zabe: Continuing with sync
- 00:04 zabe@deploy2002: zabe: Backport for Close crwiki (T411501) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 00:01 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db2217 (T410589)', diff saved to https://phabricator.wikimedia.org/P86338 and previous config saved to /var/cache/conftool/dbconfig/20251203-000140-ladsgroup.json
- 00:01 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db2217.codfw.wmnet with reason: Maintenance
- 00:01 zabe@deploy2002: Started scap sync-world: Backport for Close crwiki (T411501)
2025-12-02
- 23:43 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db2177 (T411163 T411164)', diff saved to https://phabricator.wikimedia.org/P86337 and previous config saved to /var/cache/conftool/dbconfig/20251202-234356-marostegui.json
- 23:43 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2177.codfw.wmnet with reason: Maintenance
- 23:43 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T411163 T411164)', diff saved to https://phabricator.wikimedia.org/P86336 and previous config saved to /var/cache/conftool/dbconfig/20251202-234332-marostegui.json
- 23:41 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudrabbit1002.eqiad.wmnet with OS trixie
- 23:28 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P86335 and previous config saved to /var/cache/conftool/dbconfig/20251202-232824-marostegui.json
- 23:23 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudrabbit1002.eqiad.wmnet with reason: host reimage
- 23:23 dzahn@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 23:23 dzahn@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: move IPv6 gerrit-lb to IPs ending in ::2 T365259 - dzahn@cumin2002"
- 23:22 dzahn@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: move IPv6 gerrit-lb to IPs ending in ::2 T365259 - dzahn@cumin2002"
- 23:17 andrew@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudrabbit1002.eqiad.wmnet with reason: host reimage
- 23:13 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P86334 and previous config saved to /var/cache/conftool/dbconfig/20251202-231317-marostegui.json
- 23:09 eileen: civicrm upgraded from 8d8400e1 to ef0b2676
- 23:02 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudrabbit1001.eqiad.wmnet with OS trixie
- 23:01 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudrabbit1002.eqiad.wmnet with OS trixie
- 23:00 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudrabbit1003.eqiad.wmnet with OS trixie
- 22:58 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T411163 T411164)', diff saved to https://phabricator.wikimedia.org/P86333 and previous config saved to /var/cache/conftool/dbconfig/20251202-225809-marostegui.json
- 22:51 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1212 (T411163 T411164)', diff saved to https://phabricator.wikimedia.org/P86332 and previous config saved to /var/cache/conftool/dbconfig/20251202-225122-marostegui.json
- 22:45 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudrabbit1001.eqiad.wmnet with reason: host reimage
- 22:42 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudrabbit1003.eqiad.wmnet with reason: host reimage
- 22:41 dzahn@cumin2002: START - Cookbook sre.dns.netbox
- 22:39 andrew@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudrabbit1001.eqiad.wmnet with reason: host reimage
- 22:38 andrew@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudrabbit1003.eqiad.wmnet with reason: host reimage
- 22:36 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1212', diff saved to https://phabricator.wikimedia.org/P86331 and previous config saved to /var/cache/conftool/dbconfig/20251202-223615-marostegui.json
- 22:33 bking@deploy2002: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'.
- 22:32 bking@deploy2002: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'.
- 22:25 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudrabbit1001.eqiad.wmnet with OS trixie
- 22:23 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudrabbit1003.eqiad.wmnet with OS trixie
- 22:21 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1212', diff saved to https://phabricator.wikimedia.org/P86330 and previous config saved to /var/cache/conftool/dbconfig/20251202-222107-marostegui.json
- 22:20 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1061.eqiad.wmnet with OS trixie
- 22:09 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1054.eqiad.wmnet with OS trixie
- 22:09 catrope@deploy2002: Finished scap sync-world: Backport for CentralAuthUser: Add debugging information for T385310 (T385310), CentralAuthUser: Add debugging information for T385310 (T385310) (duration: 07m 29s)
- 22:06 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1212 (T411163 T411164)', diff saved to https://phabricator.wikimedia.org/P86329 and previous config saved to /var/cache/conftool/dbconfig/20251202-220600-marostegui.json
- 22:05 catrope@deploy2002: catrope, matmarex: Continuing with sync
- 22:04 catrope@deploy2002: catrope, matmarex: Backport for CentralAuthUser: Add debugging information for T385310 (T385310), CentralAuthUser: Add debugging information for T385310 (T385310) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 22:01 catrope@deploy2002: Started scap sync-world: Backport for CentralAuthUser: Add debugging information for T385310 (T385310), CentralAuthUser: Add debugging information for T385310 (T385310)
- 21:55 dani@deploy2002: Finished scap sync-world: Backport for [beta] Undeploy experiment for 2025 Global Readers Survey (T410696), Deploy 2025 Global Readers Survey (non-enwiki) (T410918) (duration: 10m 23s)
- 21:51 dani@deploy2002: dani: Continuing with sync
- 21:47 dani@deploy2002: dani: Backport for [beta] Undeploy experiment for 2025 Global Readers Survey (T410696), Deploy 2025 Global Readers Survey (non-enwiki) (T410918) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 21:44 dani@deploy2002: Started scap sync-world: Backport for [beta] Undeploy experiment for 2025 Global Readers Survey (T410696), Deploy 2025 Global Readers Survey (non-enwiki) (T410918)
- 21:43 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-ipoid-test: apply
- 21:43 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-ipoid-test: apply
- 21:42 kgraessle@deploy2002: Finished scap sync-world: Backport for Enable revertrisk filters in thwiki (T409438) (duration: 10m 34s)
- 21:38 jhancock@cumin1003: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['backup2013']
- 21:38 jhancock@cumin1003: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['backup2013']
- 21:38 kgraessle@deploy2002: kgraessle: Continuing with sync
- 21:36 kgraessle@deploy2002: kgraessle: Backport for Enable revertrisk filters in thwiki (T409438) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 21:34 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-ipoid-test: apply
- 21:34 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-ipoid-test: apply
- 21:33 bking@dns1004: END - running authdns-update
- 21:32 bking@dns1004: START - running authdns-update
- 21:31 kgraessle@deploy2002: Started scap sync-world: Backport for Enable revertrisk filters in thwiki (T409438)
- 21:31 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-ipoid-test: apply
- 21:29 kharlan@deploy2002: Finished scap sync-world: Backport for Refactor: Move editing session ID logic into service (T406865), hCaptcha: Log diff when challenge is presented (T406865) (duration: 59m 06s)
- 21:26 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1050.eqiad.wmnet with OS trixie
- 21:20 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1061.eqiad.wmnet with reason: host reimage
- 21:17 kharlan@deploy2002: kharlan: Continuing with sync
- 21:16 andrew@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1061.eqiad.wmnet with reason: host reimage
- 21:15 kharlan@deploy2002: kharlan: Backport for Refactor: Move editing session ID logic into service (T406865), hCaptcha: Log diff when challenge is presented (T406865) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 21:14 dzahn@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 21:14 dzahn@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added gerrit-lb VIP - IPv6 - for ulsfo and magru T365259 - dzahn@cumin2002"
- 21:14 dzahn@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added gerrit-lb VIP - IPv6 - for ulsfo and magru T365259 - dzahn@cumin2002"
- 21:10 dzahn@cumin2002: START - Cookbook sre.dns.netbox
- 21:04 dzahn@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 21:03 dzahn@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added gerrit-lb VIP - IPv6 - for drmrs, eqsin and esams T365259 - dzahn@cumin2002"
- 21:03 dzahn@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added gerrit-lb VIP - IPv6 - for drmrs, eqsin and esams T365259 - dzahn@cumin2002"
- 21:00 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudvirt1061.eqiad.wmnet with OS trixie
- 21:00 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1046.eqiad.wmnet with OS trixie
- 20:58 dzahn@cumin2002: START - Cookbook sre.dns.netbox
- 20:52 dzahn@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 20:52 dzahn@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added gerrit-lb VIP - IPv6 - for codfw and eqiad T365259 - dzahn@cumin2002"
- 20:52 dzahn@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added gerrit-lb VIP - IPv6 - for codfw and eqiad T365259 - dzahn@cumin2002"
- 20:48 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1054.eqiad.wmnet with reason: host reimage
- 20:48 dzahn@cumin2002: START - Cookbook sre.dns.netbox
- 20:44 andrew@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1054.eqiad.wmnet with reason: host reimage
- 20:43 eileen: civicrm upgraded from c90bd037 to 8d8400e1
- 20:38 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1050.eqiad.wmnet with reason: host reimage
- 20:37 dzahn@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 20:37 dzahn@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added gerrit-lb VIP for magru and eqiad T365259 - dzahn@cumin2002"
- 20:37 dzahn@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added gerrit-lb VIP for magru and eqiad T365259 - dzahn@cumin2002"
- 20:34 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1046.eqiad.wmnet with reason: host reimage
- 20:33 dzahn@cumin2002: START - Cookbook sre.dns.netbox
- 20:31 dzahn@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 20:31 dzahn@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added gerrit-lb VIP for drmrs and eqsin T365259 - dzahn@cumin2002"
- 20:31 dzahn@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added gerrit-lb VIP for drmrs and eqsin T365259 - dzahn@cumin2002"
- 20:30 kharlan@deploy2002: Started scap sync-world: Backport for Refactor: Move editing session ID logic into service (T406865), hCaptcha: Log diff when challenge is presented (T406865)
- 20:29 andrew@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1050.eqiad.wmnet with reason: host reimage
- 20:28 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudvirt1054.eqiad.wmnet with OS trixie
- 20:26 dzahn@cumin2002: START - Cookbook sre.dns.netbox
- 20:26 andrew@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1046.eqiad.wmnet with reason: host reimage
- 20:18 jhathaway@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest1005.eqiad.wmnet with OS bookworm
- 20:18 dzahn@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 20:18 dzahn@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added gerrit-lb VIP for esams and ulsfo T365259 - dzahn@cumin2002"
- 20:18 dzahn@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added gerrit-lb VIP for esams and ulsfo T365259 - dzahn@cumin2002"
- 20:13 dzahn@cumin2002: START - Cookbook sre.dns.netbox
- 20:12 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudvirt1050.eqiad.wmnet with OS trixie
- 20:09 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudvirt1046.eqiad.wmnet with OS trixie
- 19:58 jhathaway@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1005.eqiad.wmnet with reason: host reimage
- 19:53 jhathaway@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1005.eqiad.wmnet with reason: host reimage
- 19:52 dzahn@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 19:49 dzahn@cumin2002: START - Cookbook sre.dns.netbox
- 19:48 dzahn@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 19:48 dzahn@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added gerrit-lb.codfw.wikimedia.org T365259 - dzahn@cumin2002"
- 19:46 dzahn@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added gerrit-lb.codfw.wikimedia.org T365259 - dzahn@cumin2002"
- 19:43 cstone: payments-wiki upgraded from 6d39e545 to eeadc2d8
- 19:42 dzahn@cumin2002: START - Cookbook sre.dns.netbox
- 19:34 jhathaway@cumin1003: START - Cookbook sre.hosts.reimage for host sretest1005.eqiad.wmnet with OS bookworm
- 19:15 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/analytics-test: apply
- 19:15 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/analytics-test: apply
- 19:07 cdobbins@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P{lvs3010*} and A:liberica
- 19:03 cdobbins@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P{lvs3010*} and A:liberica
- 18:53 swfrench@cumin2002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-high-traffic1-eqiad (T352245)
- 18:53 swfrench@cumin2002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-high-traffic1-eqiad (T352245)
- 18:52 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudbackup1001-dev.eqiad.wmnet with OS trixie
- 18:47 swfrench@cumin2002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-high-traffic2-eqiad (T352245)
- 18:47 swfrench@cumin2002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-high-traffic2-eqiad (T352245)
- 18:41 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/analytics-test: apply
- 18:41 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/analytics-test: apply
- 18:41 swfrench@cumin2002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-low-traffic-eqiad (T352245)
- 18:40 swfrench@cumin2002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-low-traffic-eqiad (T352245)
- 18:36 swfrench@cumin2002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-secondary-eqiad (T352245)
- 18:36 swfrench@cumin2002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-secondary-eqiad (T352245)
- 18:32 Emperor: repool ms-fe2014 T410959
- 18:27 swfrench@deploy2002: Unlocked for deployment [MediaWiki]: Hold deployments during etcd certificate change - T352245 (duration: 17m 35s)
- 18:26 swfrench-wmf: restarted navtiming on webperf1003 - T352245
- 18:23 swfrench-wmf: begin rolling restarts of eqiad-associated confds - T352245
- 18:22 swfrench-wmf: migrating etcd to PKI certs on conf1007 - T352245
- 18:19 swfrench-wmf: deleted EtcdReplicationDown silence (42a82757-2075-44fd-b057-ec9ed2afeb90) - T352245
- 18:16 swfrench-wmf: manually transferred etcd replication source back to conf1009 - T352245
- 18:15 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/analytics-test: apply
- 18:15 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/analytics-test: apply
- 18:12 swfrench-wmf: migrating etcd to PKI certs on conf1009 - T352245
- 18:10 swfrench@deploy2002: Locking from deployment [MediaWiki]: Hold deployments during etcd certificate change - T352245
- 18:08 jhathaway@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest1005.eqiad.wmnet with OS bookworm
- 18:06 rzl@deploy2002: Finished scap sync-world: https://gerrit.wikimedia.org/r/1208442 T407553 (duration: 06m 36s)
- 18:04 swfrench-wmf: manually transferred codfw etcd replication source to conf1008 - T352245
- 18:02 rzl@deploy2002: rzl: Continuing with sync
- 18:01 rzl@deploy2002: rzl: https://gerrit.wikimedia.org/r/1208442 T407553 synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 18:01 swfrench-wmf: silenced EtcdReplicationDown (42a82757-2075-44fd-b057-ec9ed2afeb90) - T352245
- 18:00 rzl@deploy2002: Started scap sync-world: https://gerrit.wikimedia.org/r/1208442 T407553
- 17:48 jhathaway@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1005.eqiad.wmnet with reason: host reimage
- 17:47 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db1212 (T411163 T411164)', diff saved to https://phabricator.wikimedia.org/P86328 and previous config saved to /var/cache/conftool/dbconfig/20251202-174732-marostegui.json
- 17:47 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 6 hosts with reason: Maintenance
- 17:47 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1212.eqiad.wmnet with reason: Maintenance
- 17:44 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db1251.eqiad.wmnet onto db1169.eqiad.wmnet
- 17:43 jhathaway@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1005.eqiad.wmnet with reason: host reimage
- 17:42 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db2156 (T411163 T411164)', diff saved to https://phabricator.wikimedia.org/P86327 and previous config saved to /var/cache/conftool/dbconfig/20251202-174249-marostegui.json
- 17:42 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2156.codfw.wmnet with reason: Maintenance
- 17:42 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T411163 T411164)', diff saved to https://phabricator.wikimedia.org/P86326 and previous config saved to /var/cache/conftool/dbconfig/20251202-174225-marostegui.json
- 17:29 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudbackup1001-dev.eqiad.wmnet with reason: host reimage
- 17:27 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P86325 and previous config saved to /var/cache/conftool/dbconfig/20251202-172717-marostegui.json
- 17:24 jhathaway@cumin1003: START - Cookbook sre.hosts.reimage for host sretest1005.eqiad.wmnet with OS bookworm
- 17:22 andrew@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudbackup1001-dev.eqiad.wmnet with reason: host reimage
- 17:21 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db2197.codfw.wmnet with reason: Maintenance
- 17:21 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2193 (T410589)', diff saved to https://phabricator.wikimedia.org/P86324 and previous config saved to /var/cache/conftool/dbconfig/20251202-172134-ladsgroup.json
- 17:12 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P86323 and previous config saved to /var/cache/conftool/dbconfig/20251202-171210-marostegui.json
- 17:10 jhathaway@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1005.eqiad.wmnet with OS bookworm
- 17:09 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudbackup1001-dev.eqiad.wmnet with OS trixie
- 17:06 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2193', diff saved to https://phabricator.wikimedia.org/P86322 and previous config saved to /var/cache/conftool/dbconfig/20251202-170627-ladsgroup.json
- 17:06 swfrench@cumin2002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-high-traffic1-eqiad (T352245)
- 17:05 swfrench@cumin2002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-high-traffic1-eqiad (T352245)
- 17:03 brett: import varnish-modules 0.20.0-2~deb13+wmf1 into trixie-wikimedia - T401832
- 17:02 swfrench@cumin2002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-high-traffic2-eqiad (T352245)
- 17:01 swfrench@cumin2002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-high-traffic2-eqiad (T352245)
- 16:59 swfrench@cumin2002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-low-traffic-eqiad (T352245)
- 16:58 swfrench@cumin2002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-low-traffic-eqiad (T352245)
- 16:57 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T411163 T411164)', diff saved to https://phabricator.wikimedia.org/P86321 and previous config saved to /var/cache/conftool/dbconfig/20251202-165702-marostegui.json
- 16:54 jhathaway@cumin1003: START - Cookbook sre.hosts.reimage for host sretest1005.eqiad.wmnet with OS bookworm
- 16:53 swfrench@cumin2002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-secondary-eqiad (T352245)
- 16:53 swfrench@cumin2002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-secondary-eqiad (T352245)
- 16:51 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2193', diff saved to https://phabricator.wikimedia.org/P86320 and previous config saved to /var/cache/conftool/dbconfig/20251202-165119-ladsgroup.json
- 16:51 jhathaway@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest1005.eqiad.wmnet with OS bookworm
- 16:44 ihurbain@deploy2002: Finished scap sync-world: Backport for Bump parsoid to v0.23.0-a7.1 on wmf.4 (T411238 T410960), Bump parsoid to v0.23.0-a7.1 on wmf.4 (T411238 T410960) (duration: 09m 21s)
- 16:43 jhathaway@cumin1003: START - Cookbook sre.hosts.reimage for host sretest1005.eqiad.wmnet with OS bookworm
- 16:43 inflatador: bking@wmf3062 restart WDQS codfw to resolve lag/possible deadlocks
- 16:39 ihurbain@deploy2002: ihurbain: Continuing with sync
- 16:39 jhathaway@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1005.eqiad.wmnet with OS bookworm
- 16:38 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/analytics-test: apply
- 16:38 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/analytics-test: apply
- 16:37 ihurbain@deploy2002: ihurbain: Backport for Bump parsoid to v0.23.0-a7.1 on wmf.4 (T411238 T410960), Bump parsoid to v0.23.0-a7.1 on wmf.4 (T411238 T410960) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 16:36 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2193 (T410589)', diff saved to https://phabricator.wikimedia.org/P86319 and previous config saved to /var/cache/conftool/dbconfig/20251202-163612-ladsgroup.json
- 16:36 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db1251 gradually with 4 steps - Pool db1251.eqiad.wmnet in after cloning
- 16:35 ihurbain@deploy2002: Started scap sync-world: Backport for Bump parsoid to v0.23.0-a7.1 on wmf.4 (T411238 T410960), Bump parsoid to v0.23.0-a7.1 on wmf.4 (T411238 T410960)
- 16:30 jhathaway@cumin1003: START - Cookbook sre.hosts.reimage for host sretest1005.eqiad.wmnet with OS bookworm
- 16:27 brett: import varnish 7.1.1-2~bpo13+wmf2 into trixie-wikimedia - T401832
- 16:24 jhathaway@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest1005.eqiad.wmnet with OS bookworm
- 16:23 jhathaway@cumin1003: START - Cookbook sre.hosts.reimage for host sretest1005.eqiad.wmnet with OS bookworm
- 16:20 jhathaway@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest1005.eqiad.wmnet with OS bookworm
- 16:19 jhathaway@cumin1003: START - Cookbook sre.hosts.reimage for host sretest1005.eqiad.wmnet with OS bookworm
- 16:18 swfrench-wmf: restarted navtiming on webperf1003 - T352245
- 16:14 swfrench-wmf: begin rolling restarts of eqiad-associated confds - T352245
- 16:12 moritzm: installing nodejs security updates
- 16:12 swfrench@deploy2002: Unlocked for deployment [MediaWiki]: Hold deployments during etcd certificate change - T352245 (duration: 03m 45s)
- 16:12 jhathaway@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest1005.eqiad.wmnet with OS bookworm
- 16:10 jhathaway@cumin1003: START - Cookbook sre.hosts.reimage for host sretest1005.eqiad.wmnet with OS bookworm
- 16:08 swfrench@deploy2002: Locking from deployment [MediaWiki]: Hold deployments during etcd certificate change - T352245
- 16:08 swfrench-wmf: migrating etcd to PKI certs on conf1008 - T352245
- 16:08 jhathaway@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest1005.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
- 16:02 moritzm: installing libsndfile security updates
- 16:01 jhathaway@cumin1003: START - Cookbook sre.hosts.provision for host sretest1005.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
- 16:00 gehel: restarting wdqs@codfw - system overloaded
- 15:58 jhathaway@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on sretest1005.eqiad.wmnet with reason: ipxe
- 15:50 marostegui@cumin1003: START - Cookbook sre.mysql.pool db1251 gradually with 4 steps - Pool db1251.eqiad.wmnet in after cloning
- 15:48 moritzm: upgrade Envoy on Yarn T405808
- 15:45 mvernon@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be1088.eqiad.wmnet with OS bullseye
- 15:29 mvernon@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be1088.eqiad.wmnet with reason: host reimage
- 15:26 mvernon@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be1088.eqiad.wmnet with reason: host reimage
- 15:13 moritzm: upgrade Envoy on Turnilo T405808
- 15:12 mvernon@cumin1003: START - Cookbook sre.hosts.reimage for host ms-be1088.eqiad.wmnet with OS bullseye
- 14:51 Lucas_WMDE: UTC afternoon backport+config window done
- 14:47 urbanecm@deploy2002: Finished scap sync-world: Backport for [Growth] Enable Add Link for 3 wikis (T407818) (duration: 07m 46s)
- 14:43 urbanecm@deploy2002: urbanecm: Continuing with sync
- 14:41 urbanecm@deploy2002: urbanecm: Backport for [Growth] Enable Add Link for 3 wikis (T407818) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 14:41 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db1198 (T411163 T411164)', diff saved to https://phabricator.wikimedia.org/P86314 and previous config saved to /var/cache/conftool/dbconfig/20251202-144148-marostegui.json
- 14:41 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1198.eqiad.wmnet with reason: Maintenance
- 14:41 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1189 (T411163 T411164)', diff saved to https://phabricator.wikimedia.org/P86313 and previous config saved to /var/cache/conftool/dbconfig/20251202-144123-marostegui.json
- 14:39 urbanecm@deploy2002: Started scap sync-world: Backport for [Growth] Enable Add Link for 3 wikis (T407818)
- 14:35 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti-test2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
- 14:30 derick@deploy2002: Finished scap sync-world: Backport for user: Mark users created with User::addToDatabase() as primary (T410652) (duration: 08m 34s)
- 14:28 ayounsi@cumin1003: START - Cookbook sre.hosts.provision for host ganeti-test2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
- 14:26 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P86312 and previous config saved to /var/cache/conftool/dbconfig/20251202-142616-marostegui.json
- 14:26 derick@deploy2002: d3r1ck01, derick: Continuing with sync
- 14:25 derick@deploy2002: d3r1ck01, derick: Backport for user: Mark users created with User::addToDatabase() as primary (T410652) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 14:21 derick@deploy2002: Started scap sync-world: Backport for user: Mark users created with User::addToDatabase() as primary (T410652)
- 14:21 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti-test2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
- 14:18 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for Growth: Enable Revise Tone feature on pilot wikis (T409606) (duration: 13m 03s)
- 14:14 ayounsi@cumin1003: START - Cookbook sre.hosts.provision for host ganeti-test2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
- 14:13 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, migr: Continuing with sync
- 14:12 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti-test2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
- 14:11 ayounsi@cumin1003: START - Cookbook sre.hosts.provision for host ganeti-test2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
- 14:11 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P86311 and previous config saved to /var/cache/conftool/dbconfig/20251202-141108-marostegui.json
- 14:11 ayounsi@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on ganeti-test2001.codfw.wmnet with reason: test CR1207804
- 14:10 jgleeson: payments-wiki upgraded from b405d6db to 6d39e545
- 14:07 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, migr: Backport for Growth: Enable Revise Tone feature on pilot wikis (T409606) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 14:05 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for Growth: Enable Revise Tone feature on pilot wikis (T409606)
- 13:58 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) db1251 - Depool db1251.eqiad.wmnet to then clone it to db1169.eqiad.wmnet - marostegui@cumin1003
- 13:58 marostegui@cumin1003: START - Cookbook sre.mysql.depool db1251 - Depool db1251.eqiad.wmnet to then clone it to db1169.eqiad.wmnet - marostegui@cumin1003
- 13:58 marostegui@cumin1003: START - Cookbook sre.mysql.clone of db1251.eqiad.wmnet onto db1169.eqiad.wmnet
- 13:57 dbrant@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifeeds: apply
- 13:56 dbrant@deploy2002: helmfile [codfw] START helmfile.d/services/wikifeeds: apply
- 13:56 dbrant@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifeeds: apply
- 13:56 dbrant@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifeeds: apply
- 13:56 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1189 (T411163 T411164)', diff saved to https://phabricator.wikimedia.org/P86309 and previous config saved to /var/cache/conftool/dbconfig/20251202-135600-marostegui.json
- 13:55 dbrant@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifeeds: apply
- 13:54 dbrant@deploy2002: helmfile [staging] START helmfile.d/services/wikifeeds: apply
- 13:25 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1169.eqiad.wmnet with OS bookworm
- 13:07 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/analytics-test: apply
- 13:07 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/analytics-test: apply
- 13:04 brouberol: running rebalancing of kafka-main-codfw with throttle of 30MB/s - T407185
- 13:03 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1169.eqiad.wmnet with reason: host reimage
- 12:59 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1169.eqiad.wmnet with reason: host reimage
- 12:46 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db2193 (T410589)', diff saved to https://phabricator.wikimedia.org/P86308 and previous config saved to /var/cache/conftool/dbconfig/20251202-124632-ladsgroup.json
- 12:46 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db2193.codfw.wmnet with reason: Maintenance
- 12:46 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2180 (T410589)', diff saved to https://phabricator.wikimedia.org/P86307 and previous config saved to /var/cache/conftool/dbconfig/20251202-124609-ladsgroup.json
- 12:43 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host db1169.eqiad.wmnet with OS bookworm
- 12:41 jmm@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host db1169.eqiad.wmnet with OS bookworm
- 12:40 kharlan@deploy2002: Finished scap sync-world: Backport for SI: Skip successfuledit event for null edits (T410280), SI: Skip successfuledit event for null edits (T410280) (duration: 06m 39s)
- 12:36 kharlan@deploy2002: kharlan: Continuing with sync
- 12:35 kharlan@deploy2002: kharlan: Backport for SI: Skip successfuledit event for null edits (T410280), SI: Skip successfuledit event for null edits (T410280) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 12:33 kharlan@deploy2002: Started scap sync-world: Backport for SI: Skip successfuledit event for null edits (T410280), SI: Skip successfuledit event for null edits (T410280)
- 12:31 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P86305 and previous config saved to /var/cache/conftool/dbconfig/20251202-123102-ladsgroup.json
- 12:17 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host db1169.eqiad.wmnet with OS bookworm
- 12:15 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P86304 and previous config saved to /var/cache/conftool/dbconfig/20251202-121554-ladsgroup.json
- 12:04 daniel@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
- 12:04 daniel@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
- 12:00 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2180 (T410589)', diff saved to https://phabricator.wikimedia.org/P86303 and previous config saved to /var/cache/conftool/dbconfig/20251202-120046-ladsgroup.json
- 11:57 daniel@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
- 11:56 daniel@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
- 11:44 daniel@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
- 11:44 daniel@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
- 11:41 kharlan@deploy2002: Finished scap sync-world: Backport for wgAutoConfirmCount: Raise value to 10 for frwiki, idwiki, trwiki (T411263) (duration: 08m 28s)
- 11:37 Emperor: rebuild RAID on ms-fe2014 T410959
- 11:36 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db2149 (T411163 T411164)', diff saved to https://phabricator.wikimedia.org/P86302 and previous config saved to /var/cache/conftool/dbconfig/20251202-113625-marostegui.json
- 11:36 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2149.codfw.wmnet with reason: Maintenance
- 11:35 kharlan@deploy2002: kharlan: Continuing with sync
- 11:34 kharlan@deploy2002: kharlan: Backport for wgAutoConfirmCount: Raise value to 10 for frwiki, idwiki, trwiki (T411263) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 11:32 kharlan@deploy2002: Started scap sync-world: Backport for wgAutoConfirmCount: Raise value to 10 for frwiki, idwiki, trwiki (T411263)
- 11:16 kharlan@deploy2002: Finished scap sync-world: Backport for hCaptcha: Switch frwiki to 99.9% passive mode (T405586), hCaptcha: Enable hCaptcha editing in 100% passive mode on enwiki (T405586) (duration: 08m 55s)
- 11:12 kharlan@deploy2002: kharlan: Continuing with sync
- 11:10 kharlan@deploy2002: kharlan: Backport for hCaptcha: Switch frwiki to 99.9% passive mode (T405586), hCaptcha: Enable hCaptcha editing in 100% passive mode on enwiki (T405586) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 11:07 kharlan@deploy2002: Started scap sync-world: Backport for hCaptcha: Switch frwiki to 99.9% passive mode (T405586), hCaptcha: Enable hCaptcha editing in 100% passive mode on enwiki (T405586)
- 10:51 moritzm: rebuild software raid following disk swap on bast2003 T410195
- 10:41 bwojtowicz@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revise-tone-task-generator' for release 'main' .
- 10:38 elukey: upgrade spicerack to 12.1.0 on all cumin hosts
- 10:36 cmooney@cumin1003: END (FAIL) - Cookbook sre.hosts.dhcp (exit_code=99) for host sretest1005.eqiad.wmnet
- 10:36 kharlan@deploy2002: Finished scap sync-world: Backport for UserInfoCard: Hide activity graph when it's likely to be inaccurate (T400409) (duration: 10m 26s)
- 10:35 ayounsi@cumin1003: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox
- 10:33 ayounsi@cumin1003: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
- 10:33 ayounsi@cumin1003: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary
- 10:32 ayounsi@cumin1003: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary
- 10:32 kharlan@deploy2002: kharlan: Continuing with sync
- 10:31 bwojtowicz@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revise-tone-task-generator' for release 'main' .
- 10:29 bwojtowicz@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revise-tone-task-generator' for release 'main' .
- 10:27 kharlan@deploy2002: kharlan: Backport for UserInfoCard: Hide activity graph when it's likely to be inaccurate (T400409) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 10:25 kharlan@deploy2002: Started scap sync-world: Backport for UserInfoCard: Hide activity graph when it's likely to be inaccurate (T400409)
- 10:23 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db2223 gradually with 4 steps - After switchover
- 10:21 kharlan@deploy2002: Finished scap sync-world: Backport for Allow similar signals to be merged into an existing case (T410303) (duration: 07m 52s)
- 10:17 kharlan@deploy2002: kharlan: Continuing with sync
- 10:15 kharlan@deploy2002: kharlan: Backport for Allow similar signals to be merged into an existing case (T410303) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 10:13 kharlan@deploy2002: Started scap sync-world: Backport for Allow similar signals to be merged into an existing case (T410303)
- 10:05 ayounsi@cumin1003: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary
- 10:04 ayounsi@cumin1003: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary
- 09:55 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2192.codfw.wmnet with reason: Maintenance
- 09:53 marostegui@cumin1003: START - Cookbook sre.mysql.pool db2223 gradually with 4 steps - After switchover
- 09:53 marostegui@cumin1003: END (ERROR) - Cookbook sre.mysql.pool (exit_code=97) db2223 gradually with 4 steps - After switchover
- 09:52 marostegui@cumin1003: START - Cookbook sre.mysql.pool db2223 gradually with 4 steps - After switchover
- 09:50 ayounsi@cumin1003: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary
- 09:50 ayounsi@cumin1003: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary
- 09:49 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2223 (T411163 T411164)', diff saved to https://phabricator.wikimedia.org/P86298 and previous config saved to /var/cache/conftool/dbconfig/20251202-094931-marostegui.json
- 09:46 elukey: uploaded spicerack_12.1.0 to apt.wikimedia.org bullseye-wikimedia,bookworm-wikimedia
- 09:43 klausman@deploy2002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
- 09:43 klausman@deploy2002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
- 09:43 klausman@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
- 09:42 klausman@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
- 09:41 klausman@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
- 09:41 klausman@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
- 09:38 moritzm: upgrade Envoy on parsoidtest/testreduce T405808
- 09:09 hashar@deploy2002: rebuilt and synchronized wikiversions files: group0 to 1.46.0-wmf.5 refs T408275
- 09:09 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db1189 (T411163 T411164)', diff saved to https://phabricator.wikimedia.org/P86297 and previous config saved to /var/cache/conftool/dbconfig/20251202-090932-marostegui.json
- 09:09 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1189.eqiad.wmnet with reason: Maintenance
- 09:09 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T411163 T411164)', diff saved to https://phabricator.wikimedia.org/P86296 and previous config saved to /var/cache/conftool/dbconfig/20251202-090908-marostegui.json
- 09:03 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db2223 (T411163 T411164)', diff saved to https://phabricator.wikimedia.org/P86295 and previous config saved to /var/cache/conftool/dbconfig/20251202-090334-marostegui.json
- 09:03 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2223.codfw.wmnet with reason: Maintenance
- 09:03 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2213 (T411163 T411164)', diff saved to https://phabricator.wikimedia.org/P86294 and previous config saved to /var/cache/conftool/dbconfig/20251202-090321-marostegui.json
- 08:54 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P86293 and previous config saved to /var/cache/conftool/dbconfig/20251202-085401-marostegui.json
- 08:48 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2213', diff saved to https://phabricator.wikimedia.org/P86292 and previous config saved to /var/cache/conftool/dbconfig/20251202-084813-marostegui.json
- 08:40 gehel: restarting wdqs@codfw - system overloaded
- 08:38 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P86291 and previous config saved to /var/cache/conftool/dbconfig/20251202-083853-marostegui.json
- 08:33 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2213', diff saved to https://phabricator.wikimedia.org/P86290 and previous config saved to /var/cache/conftool/dbconfig/20251202-083306-marostegui.json
- 08:23 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T411163 T411164)', diff saved to https://phabricator.wikimedia.org/P86289 and previous config saved to /var/cache/conftool/dbconfig/20251202-082345-marostegui.json
- 08:18 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2213 (T411163 T411164)', diff saved to https://phabricator.wikimedia.org/P86288 and previous config saved to /var/cache/conftool/dbconfig/20251202-081758-marostegui.json
- 08:17 dcausse: closing the utc morning backport window
- 08:14 dcausse@deploy2002: Finished scap sync-world: Backport for cirrus: enable georgian transliteration second try profile (T408737) (duration: 10m 00s)
- 08:09 dcausse@deploy2002: dcausse: Continuing with sync
- 08:06 dcausse@deploy2002: dcausse: Backport for cirrus: enable georgian transliteration second try profile (T408737) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 08:04 dcausse@deploy2002: Started scap sync-world: Backport for cirrus: enable georgian transliteration second try profile (T408737)
- 07:53 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2223.codfw.wmnet with reason: Schema change
- 07:35 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db2180 (T410589)', diff saved to https://phabricator.wikimedia.org/P86287 and previous config saved to /var/cache/conftool/dbconfig/20251202-073553-ladsgroup.json
- 07:35 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db2180.codfw.wmnet with reason: Maintenance
- 07:35 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2169 (T410589)', diff saved to https://phabricator.wikimedia.org/P86286 and previous config saved to /var/cache/conftool/dbconfig/20251202-073530-ladsgroup.json
- 07:20 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2169', diff saved to https://phabricator.wikimedia.org/P86285 and previous config saved to /var/cache/conftool/dbconfig/20251202-072022-ladsgroup.json
- 07:05 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2169', diff saved to https://phabricator.wikimedia.org/P86284 and previous config saved to /var/cache/conftool/dbconfig/20251202-070514-ladsgroup.json
- 06:50 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2169 (T410589)', diff saved to https://phabricator.wikimedia.org/P86283 and previous config saved to /var/cache/conftool/dbconfig/20251202-065007-ladsgroup.json
- 06:32 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2228.codfw.wmnet with reason: Schema change
- 05:59 kart_: Updated cxserver to 2025-12-02-041957-production + Yandex key removal from production config
- 05:59 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
- 05:57 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
- 05:52 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
- 05:52 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/cxserver: apply
- 05:50 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
- 05:49 kartik@deploy2002: helmfile [staging] START helmfile.d/services/cxserver: apply
- 05:20 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db2213 (T411163 T411164)', diff saved to https://phabricator.wikimedia.org/P86282 and previous config saved to /var/cache/conftool/dbconfig/20251202-052010-marostegui.json
- 05:20 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2213.codfw.wmnet with reason: Maintenance
- 05:19 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2211 (T411163 T411164)', diff saved to https://phabricator.wikimedia.org/P86281 and previous config saved to /var/cache/conftool/dbconfig/20251202-051947-marostegui.json
- 05:04 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2211', diff saved to https://phabricator.wikimedia.org/P86280 and previous config saved to /var/cache/conftool/dbconfig/20251202-050439-marostegui.json
- 05:02 mwpresync@deploy2002: Pruned MediaWiki: 1.46.0-wmf.2 (duration: 02m 56s)
- 04:49 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2211', diff saved to https://phabricator.wikimedia.org/P86279 and previous config saved to /var/cache/conftool/dbconfig/20251202-044931-marostegui.json
- 04:48 mwpresync@deploy2002: Finished scap sync-world: testwikis to 1.46.0-wmf.5 refs T408275 (duration: 44m 45s)
- 04:34 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2211 (T411163 T411164)', diff saved to https://phabricator.wikimedia.org/P86278 and previous config saved to /var/cache/conftool/dbconfig/20251202-043424-marostegui.json
- 04:03 mwpresync@deploy2002: Started scap sync-world: testwikis to 1.46.0-wmf.5 refs T408275
- 03:52 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db1175 (T411163 T411164)', diff saved to https://phabricator.wikimedia.org/P86277 and previous config saved to /var/cache/conftool/dbconfig/20251202-035202-marostegui.json
- 03:51 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1175.eqiad.wmnet with reason: Maintenance
- 03:51 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T411163 T411164)', diff saved to https://phabricator.wikimedia.org/P86276 and previous config saved to /var/cache/conftool/dbconfig/20251202-035138-marostegui.json
- 03:43 cstone: payments-wiki upgraded from c1b83aa2 to b405d6db
- 03:36 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P86275 and previous config saved to /var/cache/conftool/dbconfig/20251202-033630-marostegui.json
- 03:21 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P86274 and previous config saved to /var/cache/conftool/dbconfig/20251202-032122-marostegui.json
- 03:15 mutante: vrts1003 - compressed /opt/znuny-6.5.16 and .17 to .tar.gz files - then deleted uncompressed versions - freeing about 700k inodes (T411452)
- 03:14 mutante: vrts1003 - sudo -u otrs ./bin/otrs.Console.pl Maint::Cache::Delete (T411452)
- 03:06 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T411163 T411164)', diff saved to https://phabricator.wikimedia.org/P86273 and previous config saved to /var/cache/conftool/dbconfig/20251202-030615-marostegui.json
- 01:36 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db2211 (T411163 T411164)', diff saved to https://phabricator.wikimedia.org/P86272 and previous config saved to /var/cache/conftool/dbconfig/20251202-013635-marostegui.json
- 01:36 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2211.codfw.wmnet with reason: Maintenance
- 00:05 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db2169 (T410589)', diff saved to https://phabricator.wikimedia.org/P86271 and previous config saved to /var/cache/conftool/dbconfig/20251202-000540-ladsgroup.json
- 00:05 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db2169.codfw.wmnet with reason: Maintenance
- 00:05 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2158 (T410589)', diff saved to Unable to send diff to phaste and previous config saved to /var/cache/conftool/dbconfig/20251202-000512-ladsgroup.json
2025-12-01
- 23:50 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P86269 and previous config saved to /var/cache/conftool/dbconfig/20251201-235004-ladsgroup.json
- 23:45 catrope@deploy2002: Finished scap sync-world: Backport for Make sure WebAuthnKey::$supportsPasswordless is always initialized (T411368) (duration: 07m 36s)
- 23:41 catrope@deploy2002: catrope: Continuing with sync
- 23:39 catrope@deploy2002: catrope: Backport for Make sure WebAuthnKey::$supportsPasswordless is always initialized (T411368) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 23:38 catrope@deploy2002: Started scap sync-world: Backport for Make sure WebAuthnKey::$supportsPasswordless is always initialized (T411368)
- 23:34 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P86268 and previous config saved to /var/cache/conftool/dbconfig/20251201-233456-ladsgroup.json
- 23:19 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2158 (T410589)', diff saved to https://phabricator.wikimedia.org/P86267 and previous config saved to /var/cache/conftool/dbconfig/20251201-231949-ladsgroup.json
- 22:50 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-ipoid-test: apply
- 22:40 logmsgbot: mstyles Deployed security patch for T411144
- 22:28 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2201.codfw.wmnet with reason: Maintenance
- 22:28 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2178 (T411163 T411164)', diff saved to https://phabricator.wikimedia.org/P86266 and previous config saved to /var/cache/conftool/dbconfig/20251201-222810-marostegui.json
- 22:26 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db1166 (T411163 T411164)', diff saved to https://phabricator.wikimedia.org/P86265 and previous config saved to /var/cache/conftool/dbconfig/20251201-222607-marostegui.json
- 22:26 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1166.eqiad.wmnet with reason: Maintenance
- 22:25 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1157 (T411163 T411164)', diff saved to https://phabricator.wikimedia.org/P86264 and previous config saved to /var/cache/conftool/dbconfig/20251201-222544-marostegui.json
- 22:20 dzahn@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on zuul2002.codfw.wmnet with reason: reboot
- 22:13 larssandergreen: civicrm upgraded from ee12d616 to c90bd037
- 22:13 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P86263 and previous config saved to /var/cache/conftool/dbconfig/20251201-221302-marostegui.json
- 22:11 dzahn@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host planet1004.eqiad.wmnet
- 22:11 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host planet1004.eqiad.wmnet with OS trixie
- 22:10 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1157', diff saved to https://phabricator.wikimedia.org/P86262 and previous config saved to /var/cache/conftool/dbconfig/20251201-221036-marostegui.json
- 21:57 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P86261 and previous config saved to /var/cache/conftool/dbconfig/20251201-215754-marostegui.json
- 21:57 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on planet1004.eqiad.wmnet with reason: host reimage
- 21:55 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1157', diff saved to https://phabricator.wikimedia.org/P86260 and previous config saved to /var/cache/conftool/dbconfig/20251201-215529-marostegui.json
- 21:52 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on planet1004.eqiad.wmnet with reason: host reimage
- 21:42 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2178 (T411163 T411164)', diff saved to https://phabricator.wikimedia.org/P86259 and previous config saved to /var/cache/conftool/dbconfig/20251201-214247-marostegui.json
- 21:42 dzahn@cumin2002: START - Cookbook sre.hosts.reimage for host planet1004.eqiad.wmnet with OS trixie
- 21:40 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1157 (T411163 T411164)', diff saved to https://phabricator.wikimedia.org/P86258 and previous config saved to /var/cache/conftool/dbconfig/20251201-214021-marostegui.json
- 21:37 dzahn@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM planet1004.eqiad.wmnet - dzahn@cumin2002"
- 21:37 dzahn@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM planet1004.eqiad.wmnet - dzahn@cumin2002"
- 21:36 dzahn@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) planet1004.eqiad.wmnet on all recursors
- 21:36 dzahn@cumin2002: START - Cookbook sre.dns.wipe-cache planet1004.eqiad.wmnet on all recursors
- 21:36 dzahn@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 21:36 dzahn@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM planet1004.eqiad.wmnet - dzahn@cumin2002"
- 21:36 dzahn@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM planet1004.eqiad.wmnet - dzahn@cumin2002"
- 21:36 bvibber@deploy2002: Finished scap sync-world: Backport for StickyHeaders: fix Minerva list styling for "peeking" bullet points (T409325) (duration: 07m 08s)
- 21:32 bvibber@deploy2002: bvibber: Continuing with sync
- 21:31 eileen: civicrm upgraded from 37ddffc2 to ee12d616
- 21:31 bvibber@deploy2002: bvibber: Backport for StickyHeaders: fix Minerva list styling for "peeking" bullet points (T409325) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 21:29 dzahn@cumin2002: START - Cookbook sre.dns.netbox
- 21:29 dzahn@cumin2002: START - Cookbook sre.ganeti.makevm for new host planet1004.eqiad.wmnet
- 21:29 bvibber@deploy2002: Started scap sync-world: Backport for StickyHeaders: fix Minerva list styling for "peeking" bullet points (T409325)
- 21:25 cscott@deploy2002: Finished scap sync-world: Backport for Deploy Parsoid Read Views to 19 wikis (T411283), Change the README to Markdown, noc: Point links in /conf to Gitiles rather than Differential, REST: enable the site.v1 module (T409516), cirrus: Apply increased near match weight on commonswiki (T408154) (duration: 12m
- 21:21 cscott@deploy2002: cscott, ebernhardson, tgr, arlolra, bpirkle: Continuing with sync
- 21:17 cscott@deploy2002: cscott, ebernhardson, tgr, arlolra, bpirkle: Backport for Deploy Parsoid Read Views to 19 wikis (T411283), Change the README to Markdown, noc: Point links in /conf to Gitiles rather than Differential, REST: enable the site.v1 module (T409516), cirrus: Apply increased near match weight on commonswiki (T408154)
- 21:16 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-main: apply
- 21:16 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-main: apply
- 21:13 cscott@deploy2002: Started scap sync-world: Backport for Deploy Parsoid Read Views to 19 wikis (T411283), Change the README to Markdown, noc: Point links in /conf to Gitiles rather than Differential, REST: enable the site.v1 module (T409516), cirrus: Apply increased near match weight on commonswiki (T408154)
- 21:03 ejegg: payments-wiki upgraded from bb179e9c to c1b83aa2
- 20:57 urbanecm@deploy2002: Finished scap sync-world: Backport for Introduce HTML confirmation email (T396155), ConfirmEmailHooks: Do not run when UserEmailConfirmationUseHTML is true (T396155) (duration: 36m 09s)
- 20:51 herron: prometheus100[78] grow /dev/vg0/prometheus-k8s-dse filesystems
- 20:44 urbanecm@deploy2002: urbanecm: Continuing with sync
- 20:44 urbanecm@deploy2002: urbanecm: Backport for Introduce HTML confirmation email (T396155), ConfirmEmailHooks: Do not run when UserEmailConfirmationUseHTML is true (T396155) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 20:37 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 20:26 jhathaway@cumin2002: START - Cookbook sre.hosts.provision for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
- 20:20 urbanecm@deploy2002: Started scap sync-world: Backport for Introduce HTML confirmation email (T396155), ConfirmEmailHooks: Do not run when UserEmailConfirmationUseHTML is true (T396155)
- 20:13 jhathaway@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on sretest2001.codfw.wmnet with reason: T383173
- 20:10 taavi@cumin1003: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-low-traffic-eqiad
- 20:09 taavi@cumin1003: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-low-traffic-eqiad
- 20:08 taavi@cumin1003: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-secondary-eqiad
- 20:08 mutante: upgrading envoyproxy on contint1002; phab1004; T405808
- 20:04 taavi@cumin1003: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-secondary-eqiad
- 20:04 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db2178 (T411163 T411164)', diff saved to https://phabricator.wikimedia.org/P86256 and previous config saved to /var/cache/conftool/dbconfig/20251201-200359-marostegui.json
- 20:03 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2178.codfw.wmnet with reason: Maintenance
- 20:03 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2171 (T411163 T411164)', diff saved to https://phabricator.wikimedia.org/P86255 and previous config saved to /var/cache/conftool/dbconfig/20251201-200335-marostegui.json
- 20:02 mutante: updating envoyproxy from 1.29.x to 1.32.x on phabricator prod host
- 19:49 cdobbins@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P{lvs6003*} and A:liberica
- 19:48 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2171', diff saved to https://phabricator.wikimedia.org/P86254 and previous config saved to /var/cache/conftool/dbconfig/20251201-194828-marostegui.json
- 19:46 cdobbins@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P{lvs6003*} and A:liberica
- 19:33 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2171', diff saved to https://phabricator.wikimedia.org/P86253 and previous config saved to /var/cache/conftool/dbconfig/20251201-193320-marostegui.json
- 19:28 cdobbins@cumin2002: END (FAIL) - Cookbook sre.loadbalancer.admin (exit_code=1) rebooting P{lvs6003*} and A:liberica
- 19:25 cdobbins@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P{lvs6003*} and A:liberica
- 19:18 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2171 (T411163 T411164)', diff saved to https://phabricator.wikimedia.org/P86252 and previous config saved to /var/cache/conftool/dbconfig/20251201-191812-marostegui.json
- 19:14 cdobbins@cumin2002: END (FAIL) - Cookbook sre.loadbalancer.admin (exit_code=1) rebooting P{lvs6003*} and A:liberica
- 19:11 cdobbins@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P{lvs6003*} and A:liberica
- 19:03 cdobbins@cumin2002: END (FAIL) - Cookbook sre.loadbalancer.admin (exit_code=1) rebooting P{lvs6003*} and A:liberica
- 19:00 cdobbins@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P{lvs6003*} and A:liberica
- 18:44 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudweb1003.wikimedia.org with OS trixie
- 18:24 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudweb1003.wikimedia.org with reason: host reimage
- 18:18 andrew@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudweb1003.wikimedia.org with reason: host reimage
- 18:05 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudweb1003.wikimedia.org with OS trixie
- 18:03 fceratto@deploy2002: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 18:02 taavi@cumin1003: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-secondary-eqiad
- 18:01 taavi@cumin1003: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-secondary-eqiad
- 18:00 taavi@cumin1003: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-low-traffic-eqiad
- 17:59 taavi@cumin1003: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-low-traffic-eqiad
- 17:56 fceratto@deploy2002: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 17:45 taavi@cumin1003: conftool action : set/pooled=no; selector: cluster=cloudweb,name=cloudweb1003.wikimedia.org
- 17:43 taavi@cumin1003: conftool action : set/pooled=inactive; selector: cluster=cloudweb,name=cloudweb1003.wikimedia.org
- 17:39 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudweb1003.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
- 17:39 bd808@deploy2002: Finished scap sync-world: Backport for labswiki: Enable sitenotice on mobile (T410702) (duration: 06m 49s)
- 17:39 tappof: "thanos-store: set cutoff days to 1" reverted on titan2001 (4/4) T410152
- 17:35 bd808@deploy2002: bd808: Continuing with sync
- 17:34 bd808@deploy2002: bd808: Backport for labswiki: Enable sitenotice on mobile (T410702) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 17:32 bd808@deploy2002: Started scap sync-world: Backport for labswiki: Enable sitenotice on mobile (T410702)
- 17:32 andrew@cumin2002: START - Cookbook sre.hosts.provision for host cloudweb1003.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
- 17:31 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudweb1004.wikimedia.org with OS trixie
- 17:17 tappof: "thanos-store: set cutoff days to 1" reverted on titan2002 (3/4) T410152
- 17:08 hnowlan@deploy2002: Finished deploy [restbase/deploy@19cb647]: Add new wikis to restbase T408352 T408344 (duration: 16m 16s)
- 16:59 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db1157 (T411163 T411164)', diff saved to https://phabricator.wikimedia.org/P86251 and previous config saved to /var/cache/conftool/dbconfig/20251201-165902-marostegui.json
- 16:58 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1157.eqiad.wmnet with reason: Maintenance
- 16:58 cdobbins@cumin2002: END (FAIL) - Cookbook sre.loadbalancer.admin (exit_code=1) rebooting P{lvs6003*} and A:liberica
- 16:55 cdobbins@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P{lvs6003*} and A:liberica
- 16:52 hnowlan@deploy2002: Started deploy [restbase/deploy@19cb647]: Add new wikis to restbase T408352 T408344
- 16:48 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudweb1004.wikimedia.org with reason: host reimage
- 16:43 andrew@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudweb1004.wikimedia.org with reason: host reimage
- 16:31 Emperor: depool ms-fe2014 for disk swap T410959
- 16:31 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudweb1004.wikimedia.org with OS trixie
- 16:30 andrew@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudweb1004.wikimedia.org with OS trixie
- 16:29 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db2158 (T410589)', diff saved to https://phabricator.wikimedia.org/P86250 and previous config saved to /var/cache/conftool/dbconfig/20251201-162923-ladsgroup.json
- 16:29 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db2158.codfw.wmnet with reason: Maintenance
- 16:29 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2151 (T410589)', diff saved to https://phabricator.wikimedia.org/P86249 and previous config saved to /var/cache/conftool/dbconfig/20251201-162900-ladsgroup.json
- 16:28 tappof: "thanos-store: set cutoff days to 1" reverted on titan1002 (2/4) T410152
- 16:20 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db1187 gradually with 4 steps - After schema change
- 16:13 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2151', diff saved to https://phabricator.wikimedia.org/P86247 and previous config saved to /var/cache/conftool/dbconfig/20251201-161352-ladsgroup.json
- 16:00 taavi@dns1004: END - running authdns-update
- 15:59 taavi@dns1004: START - running authdns-update
- 15:58 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2151', diff saved to https://phabricator.wikimedia.org/P86245 and previous config saved to /var/cache/conftool/dbconfig/20251201-155844-ladsgroup.json
- 15:56 tappof: "thanos-store: set cutoff days to 1" reverted on titan1001 (1/4) T410152
- 15:56 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudweb1004.wikimedia.org with OS trixie
- 15:56 tappof: "thanos-store: set cutoff days to 1" reverted on titan1001 (1/4)
- 15:56 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db2171 (T411163 T411164)', diff saved to https://phabricator.wikimedia.org/P86244 and previous config saved to /var/cache/conftool/dbconfig/20251201-155606-marostegui.json
- 15:55 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2171.codfw.wmnet with reason: Maintenance
- 15:55 andrew@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudweb1004.wikimedia.org with OS trixie
- 15:55 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2157 (T411163 T411164)', diff saved to https://phabricator.wikimedia.org/P86243 and previous config saved to /var/cache/conftool/dbconfig/20251201-155542-marostegui.json
- 15:50 inflatador: bking@wmf3062 restart wdqs codfw for high lag https://docs.google.com/spreadsheets/d/1UaabYlqj37EEaLAkrRArn4yNuNviGObgsGTfquIIHAQ/edit?gid=0#gid=0
- 15:50 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudweb1004.wikimedia.org with OS trixie
- 15:43 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs1032.eqiad.wmnet with OS bookworm
- 15:43 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2151 (T410589)', diff saved to https://phabricator.wikimedia.org/P86241 and previous config saved to /var/cache/conftool/dbconfig/20251201-154337-ladsgroup.json
- 15:40 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2157', diff saved to https://phabricator.wikimedia.org/P86240 and previous config saved to /var/cache/conftool/dbconfig/20251201-154035-marostegui.json
- 15:34 marostegui@cumin1003: START - Cookbook sre.mysql.pool db1187 gradually with 4 steps - After schema change
- 15:25 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2157', diff saved to https://phabricator.wikimedia.org/P86238 and previous config saved to /var/cache/conftool/dbconfig/20251201-152527-marostegui.json
- 15:24 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1032.eqiad.wmnet with reason: host reimage
- 15:23 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply
- 15:22 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply
- 15:19 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1032.eqiad.wmnet with reason: host reimage
- 15:19 sukhe@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp2043.codfw.wmnet with OS trixie
- 15:15 Lucas_WMDE: UTC afternoon backport+config window done
- 15:12 kharlan@deploy2002: Finished scap sync-world: Backport for EventLogging: Register mediawiki.hcaptcha.edit stream (T406865), Set new $wgRateLimits config for edit attempt log (T406865) (duration: 11m 03s)
- 15:10 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2157 (T411163 T411164)', diff saved to https://phabricator.wikimedia.org/P86237 and previous config saved to /var/cache/conftool/dbconfig/20251201-151019-marostegui.json
- 15:07 kharlan@deploy2002: kharlan, sguebo: Continuing with sync
- 15:03 kharlan@deploy2002: kharlan, sguebo: Backport for EventLogging: Register mediawiki.hcaptcha.edit stream (T406865), Set new $wgRateLimits config for edit attempt log (T406865) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 15:03 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudweb1004.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
- 15:01 kharlan@deploy2002: Started scap sync-world: Backport for EventLogging: Register mediawiki.hcaptcha.edit stream (T406865), Set new $wgRateLimits config for edit attempt log (T406865)
- 14:59 bking@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs1032.eqiad.wmnet with OS bookworm
- 14:55 andrew@cumin2002: START - Cookbook sre.hosts.provision for host cloudweb1004.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
- 14:54 esanders@deploy2002: Finished scap sync-world: Backport for FlowMoveBoardsToSubpages: Add 'title' option for moving a specific board (T402552) (duration: 06m 31s)
- 14:50 esanders@deploy2002: esanders: Continuing with sync
- 14:49 esanders@deploy2002: esanders: Backport for FlowMoveBoardsToSubpages: Add 'title' option for moving a specific board (T402552) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 14:47 esanders@deploy2002: Started scap sync-world: Backport for FlowMoveBoardsToSubpages: Add 'title' option for moving a specific board (T402552)
- 14:46 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for CentralAuthUser: Cache getLocalGroups() (T410878) (duration: 14m 51s)
- 14:42 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, matmarex: Continuing with sync
- 14:37 slyngshede@dns1004: END - running authdns-update
- 14:36 slyngshede@dns1004: START - running authdns-update
- 14:33 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, matmarex: Backport for CentralAuthUser: Cache getLocalGroups() (T410878) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 14:31 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for CentralAuthUser: Cache getLocalGroups() (T410878)
- 14:30 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for Api: Initialise reference variable (T411075) (duration: 07m 04s)
- 14:28 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host cp2043.codfw.wmnet with OS trixie
- 14:26 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, matmarex: Continuing with sync
- 14:25 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, matmarex: Backport for Api: Initialise reference variable (T411075) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 14:23 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for Api: Initialise reference variable (T411075)
- 14:17 mfossati@deploy2002: Finished scap sync-world: Backport for ReaderExperiments' StickyHeaders stream configuration (T410533) (duration: 11m 51s)
- 14:11 mfossati@deploy2002: mfossati: Continuing with sync
- 14:09 mfossati@deploy2002: mfossati: Backport for ReaderExperiments' StickyHeaders stream configuration (T410533) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 14:05 mfossati@deploy2002: Started scap sync-world: Backport for ReaderExperiments' StickyHeaders stream configuration (T410533)
- 13:43 dcausse: T408431: reindexing all wikis in codfw
- 13:42 moritzm: upgrade Envoy on deployment servers T405808
- 13:16 moritzm: imported rancid 3.13-2+wmf12u1 for bookworm-wikimedia and 3.14-1+wmf13u1 for trixie-wikimedia T410606
- 12:58 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.powercycle (exit_code=0) for host ml-serve1013
- 12:53 elukey@cumin2002: START - Cookbook sre.hosts.powercycle for host ml-serve1013
- 12:47 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-serve1013.eqiad.wmnet with OS trixie
- 11:54 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1150.eqiad.wmnet with reason: Maintenance
- 11:53 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve1013.eqiad.wmnet with reason: host reimage
- 11:49 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db2157 (T411163 T411164)', diff saved to https://phabricator.wikimedia.org/P86236 and previous config saved to /var/cache/conftool/dbconfig/20251201-114902-marostegui.json
- 11:48 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2157.codfw.wmnet with reason: Maintenance
- 11:47 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve1013.eqiad.wmnet with reason: host reimage
- 11:36 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host ml-serve1013.eqiad.wmnet with OS trixie
- 11:29 btullis: restarting envoyproxy process on cephosd100[1-5] for T405808
- 11:28 elukey@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ml-serve1013.eqiad.wmnet with OS trixie
- 11:09 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dse-k8s-worker1010.eqiad.wmnet
- 11:03 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host dse-k8s-worker1010.eqiad.wmnet
- 11:02 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host ml-serve1013.eqiad.wmnet with OS trixie
- 10:52 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.powercycle (exit_code=0) for host ml-serve1013
- 10:51 JavierMonton: Deployed refinery using scap, then deployed onto hdfs
- 10:47 moritzm: upgrade Envoy on matomo1001 T405808
- 10:47 elukey@cumin2002: START - Cookbook sre.hosts.powercycle for host ml-serve1013
- 10:46 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ml-serve1013.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 10:46 elukey@cumin1003: START - Cookbook sre.hosts.provision for host ml-serve1013.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
- 10:42 elukey@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ml-serve1013.eqiad.wmnet with OS trixie
- 10:40 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host ml-serve1013.eqiad.wmnet with OS trixie
- 10:39 elukey@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ml-serve1013.eqiad.wmnet with OS trixie
- 10:23 javiermonton@deploy2002: Finished deploy [analytics/refinery@fa63f82]: Regular analytics train [analytics/refinery@fa63f82e] (duration: 00m 28s)
- 10:23 javiermonton@deploy2002: Started deploy [analytics/refinery@fa63f82]: Regular analytics train [analytics/refinery@fa63f82e]
- 10:20 a-pizzata@deploy2002: Finished deploy [analytics/refinery@fa63f82]: Regular analytics train [analytics/refinery@fa63f82e] (duration: 02m 54s)
- 10:17 a-pizzata@deploy2002: Started deploy [analytics/refinery@fa63f82]: Regular analytics train [analytics/refinery@fa63f82e]
- 10:16 a-pizzata@deploy2002: Finished deploy [analytics/refinery@fa63f82] (hadoop-test): Analytics train TEST [analytics/refinery@fa63f82e] (duration: 01m 08s)
- 10:15 a-pizzata@deploy2002: Started deploy [analytics/refinery@fa63f82] (hadoop-test): Analytics train TEST [analytics/refinery@fa63f82e]
- 10:14 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host ml-serve1013.eqiad.wmnet with OS trixie
- 10:13 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/postgresql-airflow-main: apply
- 10:13 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/postgresql-airflow-main: apply
- 10:12 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/postgresql-airflow-main: apply
- 10:11 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/postgresql-airflow-main: apply
- 10:11 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 10:11 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Change ml-serve1013 vlan - ayounsi@cumin1003"
- 10:11 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Change ml-serve1013 vlan - ayounsi@cumin1003"
- 10:04 ayounsi@cumin1003: START - Cookbook sre.dns.netbox
- 10:00 jmm@cumin2002: END (PASS) - Cookbook sre.o11y.roll-restart-reboot-logstash-collectors (exit_code=0) rolling restart_daemons on A:logstash-collector
- 09:53 taavi@dns1004: END - running authdns-update
- 09:53 jmm@cumin2002: START - Cookbook sre.o11y.roll-restart-reboot-logstash-collectors rolling restart_daemons on A:logstash-collector
- 09:52 taavi@dns1004: START - running authdns-update
- 09:39 moritzm: installing expat security updates
- 09:15 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 09:14 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
- 08:58 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db2151 (T410589)', diff saved to https://phabricator.wikimedia.org/P86235 and previous config saved to /var/cache/conftool/dbconfig/20251201-085828-ladsgroup.json
- 08:58 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db2151.codfw.wmnet with reason: Maintenance
- 08:50 moritzm: upgrade Envoy on config-master* T405808
- 08:33 mszwarc@deploy2002: Finished scap sync-world: Backport for Fix mw-userlink class being added too broadly (T392775) (duration: 38m 35s)
- 08:29 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 08:29 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
- 08:19 mszwarc@deploy2002: mszwarc: Continuing with sync
- 08:19 brouberol@dns1004: END - running authdns-update
- 08:18 mszwarc@deploy2002: mszwarc: Backport for Fix mw-userlink class being added too broadly (T392775) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 08:18 brouberol@dns1004: START - running authdns-update
- 07:55 mszwarc@deploy2002: Started scap sync-world: Backport for Fix mw-userlink class being added too broadly (T392775)
- 06:47 eileen: civicrm upgraded from 1fc76c13 to 37ddffc2
- 06:45 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance
- 05:42 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2212.codfw.wmnet with reason: Maintenance
- 05:36 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1163.eqiad.wmnet with reason: Maintenance
- 04:53 eileen: civicrm upgraded from 6c200f91 to 1fc76c13
- 03:39 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1245.eqiad.wmnet with reason: Maintenance
- 03:39 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1230 (T411163 T411164)', diff saved to https://phabricator.wikimedia.org/P86234 and previous config saved to /var/cache/conftool/dbconfig/20251201-033910-marostegui.json
- 03:24 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1230', diff saved to https://phabricator.wikimedia.org/P86233 and previous config saved to /var/cache/conftool/dbconfig/20251201-032402-marostegui.json
- 03:08 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1230', diff saved to https://phabricator.wikimedia.org/P86232 and previous config saved to /var/cache/conftool/dbconfig/20251201-030855-marostegui.json
- 02:53 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1230 (T411163 T411164)', diff saved to https://phabricator.wikimedia.org/P86231 and previous config saved to /var/cache/conftool/dbconfig/20251201-025347-marostegui.json
- 01:27 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db1230 (T411163 T411164)', diff saved to https://phabricator.wikimedia.org/P86230 and previous config saved to /var/cache/conftool/dbconfig/20251201-012716-marostegui.json
- 01:27 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1230.eqiad.wmnet with reason: Maintenance
- 01:13 mwpresync@deploy2002: Finished scap build-images: Publishing wmf/next image (duration: 12m 34s)
- 01:00 mwpresync@deploy2002: Started scap build-images: Publishing wmf/next image
- 00:22 eileen: civicrm upgraded from 4437a5ef to 6c200f91